Commit Graph

966 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
79ba395267 Replace the last susers calls in netinet6/ with privilege checks.
Introduce a new privilege that allows setting certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by:	rwatson (older version, before addressing his comments)
2008-01-24 08:25:59 +00:00
Bjoern A. Zeeb
ab569b9c05 Correct the commented out debugging printf()s in REPLACE and NEXT macros.
ip6_sprintf() needs a buffer as first argument these days.

MFC after:	2 weeks
2008-01-20 10:08:15 +00:00
David E. O'Brien
9233d8f3ad un-__P() 2008-01-08 19:08:58 +00:00
Robert Watson
8b953b3f9d Fix leaking MAC labels for IPv6 inpcbs by adding missing MAC label
destroy call; this transpired because the inpcb alloc path for IPv4/IPv6
is the same code, but IPv6 has a separate free path.  The result was
that as new IPv6 TCP connections were created, kernel memory would
gradually leak.

MFC after:	3 days
Reported by:	tanyong <tanyong at ercist dot iscas dot ac dot cn>,
		zhouzhouyi
2007-12-17 17:20:57 +00:00
David E. O'Brien
b48287a32a Clean up VCS Ids. 2007-12-10 16:03:40 +00:00
Julian Elischer
dbec798a76 Remove more dup'd code
MFC After: 1 week
2007-12-06 22:48:24 +00:00
Julian Elischer
90b3552e6e remove duped code
Reviewed By: gnn
MFC after: 1 week
2007-12-06 22:44:24 +00:00
Mike Makonnen
016fb9d9c7 Instead of manually freeing the packet options structure (and not even doing
a good job of it) in the copypktopts() function, just call ip6_clearpktopts()
directly. Otherwise, the callers of this function would end up freeing the
memory twice.

Reviewed by: jinmei
PR:	     kern/116360
2007-11-21 16:01:42 +00:00
Robert Watson
b9b0dac33b Move towards more explicit support for various network protocol stacks
in the TrustedBSD MAC Framework:

- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
  for AARP packet labeling, rather than using a generic link layer
  entry point.

- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
  for ND6 packet labeling, rather than using a generic link layer entry
  point.

- Add explicit entry point mac_netinet_arp_send() for ARP packet
  labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
  rather than using a generic link layer entry point.

- Remove previous generic link layer entry point,
  mac_mbuf_create_linklayer() as it is no longer used.

- Add implementations of new entry points to various policies, largely
  by replicating the existing link layer entry point for them; remove
  old link layer entry point implementation.

- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
  to the MAC Framework rather than static to mac_net.c as it is now
  needed outside of mac_net.c.

Obtained from:	TrustedBSD Project
2007-10-28 15:55:23 +00:00
Robert Watson
8640764682 Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as
we move towards netinet as a pseudo-object for the MAC Framework.

Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-26 13:18:38 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
John Baldwin
21b415b212 Close a race when trying to lookup a gateway route in rt_check().
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route.  The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow.  Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock.  Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread.  In practice this would probably just
fix a lost reference that would result in a route never being freed.

This fixes panics observed in rt_check() and rtexpunge().

MFC after:	1 week
PR:		kern/112490
Insight from:	mehuljv at yahoo.com
Reviewed by:	ru (found the "not-setting it to NULL" part)
Tested by:	several
2007-10-22 19:01:26 +00:00
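The ordering that closes this race can be sketched in userland C. This is a simplified illustration, not the actual rt_check() code. struct route_entry, struct root_route, route_release(), drop_stale_gateway() and install_gateway() are hypothetical names, and a pthread mutex stands in for the route lock. The sketch shows two things. First, the cached gateway pointer is cleared while the lock is still held. Second, installing a new route releases any reference that a concurrent caller installed first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a route and its reference count;
 * these are NOT the real FreeBSD rtentry structures. */
struct route_entry {
    int refcnt;
};

struct root_route {
    pthread_mutex_t lock;
    struct route_entry *gwroute;   /* cached gateway route, holds one reference */
};

static void route_release(struct route_entry *rt)
{
    assert(rt->refcnt > 0);        /* an underflow here corresponds to the panic */
    rt->refcnt--;
}

/* Drop the stale gateway reference and clear the pointer BEFORE unlocking.
 * A second thread that takes the lock afterwards then sees NULL instead of
 * a pointer whose reference has already been given up. */
static void drop_stale_gateway(struct root_route *root)
{
    pthread_mutex_lock(&root->lock);
    if (root->gwroute != NULL) {
        route_release(root->gwroute);
        root->gwroute = NULL;      /* the step the fix adds */
    }
    pthread_mutex_unlock(&root->lock);
    /* ... a replacement gateway route would be looked up here, unlocked ... */
}

/* Install a newly looked-up route (the caller's reference is handed over).
 * If another thread installed one in the meantime, release that reference
 * instead of silently overwriting, and leaking, it. */
static void install_gateway(struct root_route *root, struct route_entry *newrt)
{
    pthread_mutex_lock(&root->lock);
    if (root->gwroute != NULL)
        route_release(root->gwroute);
    root->gwroute = newrt;
    pthread_mutex_unlock(&root->lock);
}
```

Calling drop_stale_gateway() twice releases the reference only once, because the second call finds the pointer already cleared.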
Randall Stewart
04ee05e815 - Incorrect error EAGAIN returned for invalid send on a locked
  stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of
  static (this is needed for Panda).

Approved by:	re@freebsd.org (B Mah)
2007-09-13 10:36:43 +00:00
Randall Stewart
851b7298b3 - send call has a reference to uio->uio_resid in
  the recent send code, but uio may be NULL on sendfile
  calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
  and needs a small tweak to look if the msg would
  ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
  cause a panic. This is a follow on to the codenomicon
  fix.
- PDAPI level 1 and 2 do not work unless the reader
  gets his returned buffer full. Fix so we can break
  out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
  accepted sockets
- Fix sctp_peeloff() system call when no true system call
  exists to screen arguments for errors. In cases where a
  real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
  asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
  list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
  socket is closing.
- When deleting an address verify the interface is correct
  before allowing the delete to process. This protects panda
  and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
  BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
  the assoc has not been marked as about to be freed. If it
  has (in the middle), abort out. Note this especially affects
  MAC, I think, due to the lock/unlock they do (or with
  LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
  destinations. This especially affects CMT but also has an
  impact on regular SCTP.
- During init collision when we detect seq number out
  of sync we need to treat it like Case C and discard
  the cookie (no invariant needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
  into the time wait hash to prevent further use. When
  the tag is put into the assoc hash, we need to remove it
  from the twait hash (where it will surely be). This prevents
  duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
  data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
  architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
  as src addr.
- in icmp handling fixed so we actually look at the icmp codes
  to figure out what to do.
- Modified mobility code.
  Reception of DELETE IP ADDRESS for a primary destination and
  SET PRIMARY for a new primary destination is used for
  retransmission trigger to the new primary destination.
  Also, in this case, destination of chunks in send_queue are
  changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
  mode set upon it.

Approved by:	re@freebsd.org (B Mah)
2007-09-08 17:48:46 +00:00
Randall Stewart
ceaad40ae7 - Locking compatibility changes. This involves adding
  additional flags to many function calls. The flags only
  get used in BSD when we compile with lock testing. These
  flags allow Apple to escape the "giant" lock it holds on
  the socket and have more fine-grained locking in the NKE.
  It also allows us to test (with witness) the locking used
  by Apple via a compile switch (manually applied).

Approved by:	re@freebsd.org(B Mah)
2007-09-08 11:35:11 +00:00
Robert Watson
ce4d8529e3 Continue UDP/UDPv6 synchronization project:
- Fix copyrights, comments in UDPv6.
- Remove macro defines for in6pcb and udp6stat.
- Consistently refer to inpcbs as 'inp' and not also 'in6p'.

Reviewed by:	gnn, jinmei, bz
Approved by:	re (bmah)
2007-09-08 08:18:24 +00:00
Randall Stewart
2afb3e849f - During shutdown pending, when the last sack came in and
  the last message on the send stream was "null" but still
  there, a state we allow, we could get hung and not clean
  it up and wait for the shutdown guard timer to clear the
  association without a graceful close. Fix this so that
  we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
  (so far) accept input of these and cannot yet generate
  a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
  disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
  but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
  ABORT in this case.
- According to the Kyoto summit of socket api developers
  (Solaris, Linux, BSD). We need to have:
   o non-eeor mode messages be atomic - Fixed
   o Allow implicit setup of an assoc in 1-2-1 model if
     using the sctp_**() send calls - Fixed
   o Get rid of HAVE_XXX declarations - Done
   o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
   o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
  when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For ABORT chunks, always compute the checksum, the same as for
  SHUTDOWN-COMPLETE.
- inpcb_free front state had a bug where in queue
  data could wedge an assoc. We need to just abandon
  ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
  assemble a response packet which may be larger than
  64k. This then would be dropped by IP. Instead make
  a "minimum" size for us 64k-2k (we want at least
  2k for our initack). If we receive such an init
  discard it early without all the processing.
- When we peel off we must increment the tcb ref count
  to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
  when given faulty data, fixed so can't happen and we
  also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
  (in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
  to null if the iterator is calling the output routine. This
  means we would possibly crash when we gather the MTU info.
  Fix so we only do the gather where we have a src address
  cached.
- we need to be sure to set abort flag on conn state when
  we receive an abort.
- peeloff could leak a socket. Moved code so the close will
  find the socket if the peeloff fails (uipc_syscalls.c)

Approved by:	re@freebsd.org(Ken Smith)
2007-08-27 05:19:48 +00:00
Randall Stewart
c4739e2f47 - Fix address add handling to clear cached routes and source addresses
  when the peer acks the add, in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
  case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
  So that it does not send the "null" data mbuf out and cause
  it to get freed twice.
- Fix so auto-asconf sysctl actually affects the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed), found and
  reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked at >=, not just >, to conform to the
  socket API draft; this fixes the two places in sctp_indata.c that need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
  that is not in the association with a valid assoc_id. This meant
  you got or set the stcb level values instead of the destination
  you thought you were going to get/set. Now validate if the
  stcb is non-null and the net is NULL that the sa_family is
  set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
  at the exact time it was running. rework the worker thread to
  use the increment/decrement to prevent this and no longer use
  the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
  detected that ifa is already pointing to an ifn.
- Fix it so that if someone is so insane that they drop the
  send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
  had a locking problem where locks were not in place where they
  should have been.
- Free association calls were not testing the return value in
  sctp_inpcb_free() properly; other calls should be cast to void
  where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
  we should not do the actual free.. but instead let the timer
  free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
  flag is set, we must NOT process the packet but handle it like
  ootb. This is because while freeing an assoc we release the
  locks to get all the higher order locks so we can purge all
  the hash tables. This leaves a hole if a packet comes in
  just at that point. Now sctp_common_input_processing() will
  call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
  it so we don't have a conflict (I think this is a Coverity change).
  We made this change AFTER some conversation and looking to make sure
  that M_PROTO5 does not have a problem between SCTP and the 802.11
  stuff (which is the only other place it's used).
- Fixed lock order reversal and missing atomic protection around
  locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
  added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to an ABORT() with a bad checksum with
  a PKT-DROP. This led to a PKT-DROP/ABORT war. Add code to NOT
  send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for Macintosh or when
  you need better performance during debugging). Note that no commands
  are provided to get the log info; you must use kgdb.
- The timer code needs to be aware of if it needs to call
  sctp_sack_check() to slide the maps and adjust the cum-ack.
  This is because it may be out of sync cum-ack wise.
- Added threshold management logging.
- If the user picked just the right size, that just filled the send
  window minus one mtu, we would enter a forever loop not copying and
  at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
  to 1.
- My rwnd control was not being used to control the rwnd properly (we
  did not add and subtract to it :-() this is now fixed so we handle
  small messages (1 byte etc) better to bring our rwnd down more
  slowly.

Approved by:	re@freebsd.org (Bruce Mah)
2007-08-24 00:53:53 +00:00
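The reversed-argument memset() bug listed above is easy to reproduce in any C code. The minimal sketch below shows the correct argument order. zero_buffer() is a hypothetical helper, not SCTP code. With the arguments swapped, `memset(buf, len, 0)` becomes a zero-length fill and leaves the buffer untouched. GCC's `-Wmemset-transposed-args` warning targets this pattern when the third argument is a literal 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. memset(dst, fill_byte, count): the fill value is the
 * SECOND argument and the byte count the THIRD. The swapped form,
 * memset(buf, len, 0), writes nothing at all. */
static void zero_buffer(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    memset(buf, 0, len);
}
```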
Bjoern A. Zeeb
cc977adc71 Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
Also rename the related functions in a similar way.
There are no functional changes.

For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.

With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.

The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.

Discussed at:			BSDCan 2007
Best new name suggested by:	rwatson
Reviewed by:			rwatson
Approved by:			re (bmah)
2007-08-05 16:16:15 +00:00
Robert Watson
9e7a99e592 Continue effort to improve parity between UDPv4 and UDPv6: add a missing
scope security check for the UDPv6 socket credential lookup service,
allowing security policies to bound access to credential information.
While not an immediate issue for Jail, which doesn't allow use of UDPv6,
this may be relevant to other security policies that may wish to control
ident lookups.

While here, eliminate a very unlikely panic case, in which a socket in
the process of being freed is inspected by the sysctl.

Approved by:	re (kensmith)
Reviewed by:	bz
2007-07-27 08:25:02 +00:00
Randall Stewart
1b649582bb - take out a needless panic under invariants for sctp_output.c
- Fix error checking of the addrs argument of sctp_sendx(3) when addrcnt is less than
  SCTP_SMALL_IOVEC_SIZE
- Re-add the inpcb_bind local address check bypass capability
- Fix it so sctp_opt_info is independent of assoc_id position.
- Fix cookie life set to use MSEC_TO_TICKS() macro.
- asconf changes
   o More comment changes/clarifications related to the old local address
    "not" list which is now an explicit restricted list.

   o Rename some functions for clarity:
     - sctp_add/del_local_addr_assoc to xxx_local_addr_restricted()
     - asconf related iterator functions to sctp_asconf_iterator_xxx()

   o Fix bug when the same address is deleted and added (and removed from
     the asconf queue) where the ifa is "freed" twice refcount wise,
     possibly freeing it completely.

   o Fix bug in output where the first ASCONF would not go out after the
     last address is changed (e.g. only goes out when retransmitted).

   o Fix bug where multiple ASCONFs could be bundled in the same packet
     with the same serial number.

   o Fix asconf stcb iterator to not send ASCONF until after all work
     queue entries have been processed.

   o Change behavior so that when the last address is deleted (auto asconf
     on a bound all endpoint) no action is taken until an address is
     added; at that time, an ASCONF add+delete is sent (if the assoc
     is still up).

   o Fix local address counting so that address scoping is taken into
     account.

   o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending
     of ASCONF (after an RTO).  The default now is to send
     ASCONF immediately (except for the case of changing/deleting the
     last usable address).
Approved by:	re(ken smith)@freebsd.org
2007-07-24 20:06:02 +00:00
Robert Watson
8136d21ec0 Continue effort to align UDPv4 and UDPv6 implementations by merging
udp6_output() from udp6_output.c to udp6_usrreq.c, matching the UDPv4
structure, and allowing us to remove udp6_output.c.

Reviewed by:	bz, gnn
Approved by:	re (bmah)
2007-07-23 07:58:58 +00:00
Randall Stewart
52be287ebb - remove duplicate code from sctp_asconf.c
- remove duplicate #include <sys/priv.h> that is not under
   #ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
    o some SA_IGNORE comments
    o some checks for NULL before unlock.
    o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this
   we pass a NULL in to bind on implicit assoc setup and crash  :-(
Approved by:	re@freebsd.org(Ken Smith)
2007-07-21 21:41:32 +00:00
Robert Watson
08af97b790 Attempt to improve feature parity between UDPv4 and UDPv6 by merging
UDPv4 features to UDPv6:

- Add MAC checks on delivery and MAC labeling on transmit.
- Check for (and reject) datagrams with destination port 0.
- For multicast delivery, check the source port only if the socket being
  considered as a destination has been connected.
- Implement UDP blackholing based on net.inet.udp.blackhole.
- Add a new ICMPv6 unreachable reply rate limiting category for failed
  delivery attempts and implement rate limiting for UDPv6 (submitted by
  bz).

Approved by:	re (kensmith)
Reviewed by:	bz
2007-07-19 22:34:25 +00:00
Bjoern A. Zeeb
8accf26fea Restore behavior changed with rev. 1.46 and make
IPV6_IPSEC_POLICY always visible again. This unbreaks some
third party user space applications.

PR:		114491
Reported by:	sumikawa
Reviewed by:	sumikawa
Approved by:	re (hrs)
2007-07-19 09:16:40 +00:00
Randall Stewart
18e198d3a3 - added pre-checks to the bindx call.
- use proper tick gathering macro instead of ticks directly.
- Placed reasonable boundaries on sets that a user can do
  that are converted to ticks from ms.
- Fix CMT_PF to always check to be sure CMT is on.
- Fix ticks use of CMT_PF.
- put back code to allow asconfs to be queued while INITs are in flight
  and before the assoc is established.
- During window probes, an ack'd packet might be left with the window
  probe mark on it causing it to be retransmitted. Change so that
  the flight decrease macro clears the window_probe mark.
- Additional logging flight size/reading and ASOC LOG. This
  is only enabled if you manually insert things into opt_sctp.h
  since it's a set of debug code only.
- Found an interesting SMP race in the way data was appended which
  could cause a reader to lose a part of a message, had to
  reorder when we marked the message was complete to after
  the data was appended.
- bug in ADD-IP for the subset bound socket case when the peer has only
  one address
- fix ASCONF implicit success/error handling case
- proper support of jails in FreeBSD 6 and later.
- copy out the timeval for the 64-bit SPARC world on cookie-echo
  (alignment error crashes without this).
Approved by:	re(Ken Smith)
2007-07-17 20:58:26 +00:00
Randall Stewart
b54d3a6c48 - Modular congestion control, with RFC2581 being the default.
- CMT_PF states added (w/sysctl to turn the PF version on)
- sctp_input.c had a missing incr of cookie case when the
  auth was bad. This meant a free was called without an
  increment to refcnt, added increment like rest of code.
- There was a case, unlikely, when the scope of the destination
  changed (this is a TSNH case). In that case, it would not free
  the alloc'ed asoc (in sctp_input.c).
- When listed addresses found a colliding cookie/Init, then
  the collided upon tcb was not unlocked in sctp_pcb.c
- Add error checking on arguments of sctp_sendx(3) to prevent it from
  referencing a NULL pointer.
- Fix an error return of sctp_sendx(3); it was returning
  ENOMEM instead of -1.
- Get assoc id was changed to use the sanctified socket api
  method for getting a assoc id (PEER_ADDR_INFO instead of
  PEER_ADDR_PARAMS).
- Fix it so a peeled off socket will get a proper error return
  if it tries to send to a different address than the one it is connected to.
- Fix so that select_a_stream can avoid an endless loop that
  could hang a caller.
- time_entered (state set time) was not being set in all cases
  to the time we went established.
Approved by:	re(ken smith)
2007-07-14 09:36:28 +00:00
Robert Watson
542a638396 General style, white space, and comment cleanup; move to ANSI C
prototypes, don't use register, etc.  Synchronize structure and
layout to the IPv4 versions of these functions to a greater extent,
making visual comparison easier.

Remove now stale or incorrect comments.

Enable full lock assertions, and correct one exception handling
case where the wrong label was jumped to.

Tested by:	bz
Approved by:	re (bmah)
2007-07-09 17:47:04 +00:00
Xin LI
2a463222be Space cleanup
Approved by:	re (rwatson)
2007-07-05 16:29:40 +00:00
Xin LI
1272577e22 ANSIfy[1] plus some style cleanup nearby.
Discussed with:	gnn, rwatson
Submitted by:	Karl Sjödahl - dunceor <dunceor gmail com> [1]
Approved by:	re (rwatson)
2007-07-05 16:23:49 +00:00
Peter Wemm
0273079097 Fix a stray splx() that caused a new warning.
Approved by:  re (rwatson)
2007-07-05 06:54:03 +00:00
Peter Wemm
edbb8b4600 Fix 'assignment used as truth value' warning
Approved by: re (rwatson)
2007-07-05 06:27:15 +00:00
George V. Neville-Neil
d8c2182456 Remove a last, dangling, file from the Kame IPsec code.
Approved by: re
Spotted by: rwatson, bz
2007-07-04 01:03:48 +00:00
Max Laier
60ee384760 Link pf 4.1 to the build:
- move ftp-proxy from libexec to usr.sbin
- add tftp-proxy
- new altq mtag link

Approved by:	re (kensmith)
2007-07-03 12:46:08 +00:00
George V. Neville-Neil
b2630c2934 Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing
2007-07-03 12:13:45 +00:00
George V. Neville-Neil
e66ff7fc8e Removing old, dead, KAME IPsec files as part of the move to the
new FAST_IPSEC based IPsec stack.

Approved by: re
Reviewed by: bz
2007-07-02 04:02:21 +00:00
George V. Neville-Neil
adb0e1681f Follow on cleanup and removal of two unnecessary include files.
Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 12:31:01 +00:00
George V. Neville-Neil
2cb64cb272 Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by:    bz
Approved by:    re
Supported by:   Secure Computing
2007-07-01 11:41:27 +00:00
Matt Jacob
0add0b912e gcc4.2 somehow doesn't believe that finaldst can stay stable between
where it's initialized and where it's checked twice such that the
original destination address is saved. Make it happier and trim
things down a bit.
2007-06-17 04:12:21 +00:00
Randall Stewart
e42a0f5e72 - For sctp_input/sctp6_input add announcment when a packet arrives (debug)
- re-factor the packet drop in sctp_output a bit more, we don't need the
   trim after all, but the size calc is now corrected.
 - When a assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user
   closes, it should not matter if data is queued, the assoc should be
   purged.
 - In error leg a missing free_chunk when iph comes in NULL (should not
   happen but just in case).
2007-06-17 01:36:02 +00:00
Matt Jacob
37f878f56c Garbage collect unused variables. 2007-06-15 22:56:12 +00:00
Randall Stewart
80fefe0a08 - Fix so ifn's are properly deleted when the ref count goes to 0.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
  normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
  of the logging #ifdef's with a few exceptions reducing
  the number of config options for SCTP.
2007-06-14 22:59:04 +00:00
Robert Watson
c2259ba44f Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by:	kris
2007-06-13 22:42:43 +00:00
Bruce M Simpson
71498f308b Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
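The default exclusive filter mode described above, together with the inclusive mode defined by RFC 3678, reduces to a simple per-source decision. Below is a minimal sketch, assuming a hypothetical struct src_filter with a flat array of IPv4 sources. The real in_mcast.c code keeps lazily allocated filters in per-inpcb state under the inpcb lock and is organized differently. The enum values are local to this sketch and are not the MCAST_INCLUDE/MCAST_EXCLUDE constants from <netinet/in.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Filter modes as named by RFC 3678; values local to this sketch. */
enum filter_mode { FILTER_EXCLUDE, FILTER_INCLUDE };

/* Hypothetical per-socket, per-group source filter (IPv4 sources as
 * host-order integers). EXCLUDE with an empty list is the default and is
 * equivalent to a legacy any-source join. */
struct src_filter {
    enum filter_mode mode;
    const uint32_t *srcs;
    size_t nsrcs;
};

static bool src_listed(const struct src_filter *f, uint32_t src)
{
    for (size_t i = 0; i < f->nsrcs; i++)
        if (f->srcs[i] == src)
            return true;
    return false;
}

/* EXCLUDE: deliver unless the source is listed (listed sources are blocked).
 * INCLUDE: deliver only if the source is listed. */
static bool src_filter_accept(const struct src_filter *f, uint32_t src)
{
    assert(f != NULL);
    bool listed = src_listed(f, src);
    return f->mode == FILTER_EXCLUDE ? !listed : listed;
}
```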
Randall Stewart
35918f8571 - Restructure so bindx functions are not done inline to socket option
  but are a separate call that can be re-used if needed.
- 64 bit issues
  o re-arrange cookie so it is better 64 bit aligned
  o For wire level things we need the packed attribute.
2007-06-12 11:21:00 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
JINMEI Tatuya
5e9510e3b6 cleanup about the reassembly structures and routine:
- removed unused structure members
  - fixed a minor bug that the ECN code point may not be restored correctly

Approved by:	ume (mentor)
MFC after:	1 week
2007-06-04 06:06:35 +00:00
Randall Stewart
f4c93d2405 - fix initial pcb vrf setting when the initial vrf is not the
  default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
  unique across systems (including different VRFs) this makes it easier
  to do ifn lookups.
2007-06-02 11:05:08 +00:00
JINMEI Tatuya
09a52a5532 fixed memory leak for IPv6 multicast membership information associated
with interface addresses.

Approved by:	gnn (mentor)
MFC after:	1 week
2007-06-02 08:02:36 +00:00
JINMEI Tatuya
99124467fc simplified the fix in rev. 1.69 by replacing RT_REMREF+RT_UNLOCK with
RTFREE_LOCKED.

Approved by:	gnn (mentor)
2007-06-02 07:27:02 +00:00
Randall Stewart
ad21a36485 - Take out the broken table-id concept. Panda Routers have an M-VRF
  concept that is NOT well thought out for a multi-homed transport
  protocol. So the useless table-id entries passed around need to
  be removed.
- Add an event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
  with the largest ssthresh (the compare was wrong).
2007-06-01 11:19:54 +00:00
Randall Stewart
207304d4b7 - Fixes so we won't try to start a timer when we
  hold a wq lock for the iterator. Panda uses a
  silly recursive lock they hold through the timer.
- Add poor man's wireshark compile option.
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.

Reviewed by:	gnn
2007-05-29 09:29:03 +00:00
Randall Stewart
d61a0ae066 - fixed autoclose to not allow setting on 1-2-1 model.
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
  no chance of an endless loop. Only one call to random per bind
  as well.
- fixes so set_peer_primary pre-screens addresses to be
  valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
  to thus track and use an association value instead of an inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on or off, not
  the timer, since the timer may still run if unconf address
  are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
  any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
  taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
  of the non-pad version. This saves about 92 bytes from each struct
  by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
  tcb's value.
- The delayed ack time could be under a tick, this fixes so
  it bounds it to at least 1 tick for platforms whose tick
  is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Deferred stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
  had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
  not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details

Reviewed by:	gnn
2007-05-28 11:17:24 +00:00
JINMEI Tatuya
6abdc89958 do not directly call rtfree() to meet an assumption in the callee.
(this fix suppresses a warning message appearing in the boot time on
IPv6-enabled systems)

Approved by:	gnn (mentor)
2007-05-25 06:44:00 +00:00
Olivier Houchard
d10f3ce07f Force the alignment of the char arrays, as they are cast later to
structs.
gcc 4.2 doesn't do it by default, and that results in unaligned access on
arm.

Reviewed by:	gnn, imp
2007-05-21 14:38:20 +00:00
JINMEI Tatuya
187069853c - Disabled responding to NI queries from a global address by default as
  specified in RFC4620.  A new flag for icmp6_nodeinfo was added to enable the
  feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
  flags is clearer (i.e., defined specific macro names instead of using
  hard-coded values).

Approved by:	gnn (mentor)
MFC after:	1 week
2007-05-17 21:20:24 +00:00
Randall Stewart
3c503c28da - Fixed 1-2-1 model to not worry about associd in sockopts
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
  it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
  checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if it's 0; on the assoc
  level set we changed it to 0 (oops).
- Two more static analysis issues identified by the cisco
  tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
  input argument.
- When sending failed due to a no route to host, we leaked
  the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by:	gnn
2007-05-17 12:16:24 +00:00
JINMEI Tatuya
7eefde2c0c handle IPv6 router alert option contained in an incoming packet per
option value so that unrecognized options are ignored as specified in RFC2711.
(packets containing an MLD router alert option are passed to the upper layer
as before).

Approved by: gnn (mentor), ume (mentor)
2007-05-14 17:56:13 +00:00
Robert Watson
54d642bbe5 Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
2007-05-11 10:20:51 +00:00
Matt Jacob
b065259568 Need sys/cdefs.h for the __FBSDID macro to work. 2007-05-09 23:19:55 +00:00
George V. Neville-Neil
559d3390d0 Integrate the Camellia Block Cipher. For more information see RFC 4132
and its bibliography.

Submitted by:   Tomoyuki Okazaki <okazaki at kick dot gr dot jp>
MFC after:      1 month
2007-05-09 19:37:02 +00:00
Randall Stewart
ad81507eed Two major items here:
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
  a macro that does all of this. This removes all printfs from
  the code and makes the code more portable and easier to
  read.
- Static Analysis (cisco) - found a few bugs, but mostly we
  add checks for NULL pointers and such to make the tool
  happy. We now pass the Cisco SA tools checks except for
  where it does not understand tailq/lists. We still need
  to look at the coverity tools output too (this is like
  the cisco SA tool) and see if it wants us to fix any other
  items. Hopefully this will be the last major churn in the
  code other than bug fixes.
2007-05-09 13:30:06 +00:00
George V. Neville-Neil
62c4e3f043 Reduce the default number of header options that the IPv6 protocol
stack will process from 50 to 15.  As this is a sysctl variable it
can be tuned up or down at the user/administrator's whim.

Submitted by:	itojun
MFC after:	1 day
2007-05-08 20:11:36 +00:00
Randall Stewart
b100636770 - Copyright change, cisco's silly tool wants it to say:
"Copyright (c) 2001-2007, by Cisco Systems,"
   instead of
       "Copyright (c) 2001-2007, Cisco Systems,"

-  Also fix a few stragglers that were still in 2006.
2007-05-08 17:01:12 +00:00
Randall Stewart
b0552ae214 - Get rid of the sctp_inpcb_free() "magic numbers", now they
   are sensible defines that tell what you are directing
   the function to do.
2007-05-08 15:53:03 +00:00
Randall Stewart
6e55db5445 - Static analysis fixes for cisco's commit (this is equivalent
   to the coverity tool.. may even be the same one.. not sure).
-  A bug in the way sctp_abort() and friends were
   setting the IP_CLOSE flag.. and NOT passing the
   last argument as a (,1)... so that things would
   get freed..
2007-05-08 14:32:53 +00:00
Randall Stewart
17205ecc85 - More macros for OS compatibility
-  PR-SCTP would ignore FWD-TSN's above a rwnd's worth
   of TSN's (1 byte msgs).. this left the peer hopelessly
   out of sync.. or an attacker. So now we abort the assoc.
-  New IFN hash, also rename hashes to match addr/ifn now
   that the vrf has multiple.
-  Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
   as defined in the Socket API ID.
-  Export MTU information via sysctl.
-  Vrf's need table id's. This is default for
   BSD, but may be other things later when BSD
   fully supports VRFs.
-  Additional stream reset bug (caught by cisco dev-test).
-  Additional validations for the address in sending a message (socket api).
-------- and -----
-  Fix association notifications not to give the active open
   side false notifications.
-  Fix so sendfile and SENDALL will work properly (missing
   flag to say socket sender is done).
-  Fix Bug that prevented COOKIES from being retransmitted.
-  Break out connectx into helper sub-models so that iox routines can
   reuse the helpers.
-  When an address is added during system init (non-dynamic mode) make
   sure that the "defer use" flag is not set.
** its compiling on XR now :-D **

Reviewed by:	gnn
2007-05-08 00:21:05 +00:00
SUZUKI Shinsuke
8f34a8b84a some minor modification to the previous commit to sys/netinet6/nd6.c and nd6_nbr.c.
- added some clarification comments
- removed some unnecessary code

Obtained from: KAME
MFC after: 1 week
2007-05-05 04:24:01 +00:00
SUZUKI Shinsuke
8d290a593f fixed a memory leak in unresolved ND queue processing
Obtained from: KAME
MFC after: 1 week
2007-05-04 02:34:17 +00:00
Randall Stewart
d06c82f169 - Somehow the disable fragment option got lost. We could
   set/clear it but would not do it. Now we will.
-  Moved to latest socket api for extended sndrcv info struct.
-  Moved to support all new levels of fragment interleave (0-2).
-  Codenomicon security test updates - length checks and such.
-  Bug in stream reset (2 actually).
-  setpeerprimary could unlock a null pointer, fixed.
-  Added a flag in the pcb so netstat can see if we are listening easier.

Obtained from:	(some of the Listen changes from Weongyo Jeong)
2007-05-02 12:50:13 +00:00
Robert Watson
84ca8aa609 Remove unused pcbinfo arguments to in_setsockaddr() and
in_setpeeraddr().
2007-05-01 16:31:02 +00:00
Robert Watson
712fc218a0 Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches.  Clean up and comment structure
fields in structure definition.
2007-04-30 23:12:05 +00:00
George V. Neville-Neil
6486cbd7bb Turn off route header processing for now due to issues pointed out
by Philippe Biondi and Arnaud Ebalard.  This is a temporary fix
until more discussion can be had on the exact risks involved in
allowing source routing in IPv6.

Submitted by:	itojun
Reviewed by:	jinmei
MFC after:	1 day
2007-04-23 09:32:04 +00:00
Robert Watson
fea9ea0005 Teach netinet6 to use PRIV_NETINET_REUSEPORT. 2007-04-21 18:14:04 +00:00
Randall Stewart
c105859eee - fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
  if we failed to get init/or/cookie to bring up an assoc. Change
  so we don't just give a generic "comm lost" but look at actual
  states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
  compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truly portable
  for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
  during window probes on the sent queue. This would then
  cause an out-of-order problem and assure that the flight
  size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
  overruns, flight size off as well as false retransmissions.
- Macroize the up and down of flight size.
- Fix an ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
  was active, fix to make strict sacks a bit smarter about what
  the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- Ifdef'd out Andre's copy routines pending his
  commit of at least m_last(); need to adjust for 6.2 as
  well, since m_last won't exist there.
Reviewed by:	gnn
2007-04-14 09:44:09 +00:00
Robert Watson
949da0d8f8 Remove obsolete comment about privileges: SUSER_ALLOWJAIL is no longer set
in this code.
2007-04-11 16:31:02 +00:00
Randall Stewart
bff64a4db3 - fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
  packet (1000 addresses).
- flight size correcting updated to include one message only
  and to handle case where the peer does not cumack the
  next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
  since we tried to unlock the destroyed mutex. Fixes
  so we properly exit when we need to destroy an assoc.
  (Found by Cisco DevTest team :D)
- Renamed pass to sifa in src-addr-selection.
- route structure typedef'd to allow different platforms
  and updated into sctp_os_bsd file.
- Added a maximum number of retransmissions for a chunk.
Reviewed by:	gnn
2007-04-03 11:15:32 +00:00
John Baldwin
4e7f640dfb Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to those of rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
Randall Stewart
5e54f665f0 - Found bug in min split point bundling which caused
  incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
  both how big a msg must be as well as how much needs
  to be left over.
- With our new algo in place, we need to implicitly
  set "end of msg" on the sp-> structure otherwise we
  end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
  header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
  will kill the assoc via the kill timer and send an
  abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
  us from fragmenting a small complete message when
  we needed to.
- Window probes were not marked back to unsent and
  flight adjusted when a sack came in with no
  window change or accepting of the probe data.
  We now fix this with having a mark on the net and
  the chunk so we can clear it out when the sack arrives
  forcing it to retran just like it was "new" this
  improves the handling of window probes, which were
  dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
2007-03-31 11:47:30 +00:00
Bruce M Simpson
ec002fee99 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
Randall Stewart
132dea7d5a - errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
  pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
2007-03-19 06:53:02 +00:00
Randall Stewart
42551e993f - Sysctls move to a separate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
  built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
  made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
  well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and losing potential to
  send when we inflate the flight size. We now inflate
  the cwnd too and deflate it later when the revoked
  chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-VRF
  via compile switch that is scrubbed from BSD but we won't
  need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
  a #define (30).

Reviewed by:	gnn
2007-03-15 11:27:14 +00:00
Bruce M Simpson
00cf3f55fb Add comments about common idioms for cleanup pass at a later date. 2007-02-28 21:58:37 +00:00
Bruce M Simpson
cd88c37218 Remove code which would never be used, vis-à-vis Quality-of-Service;
the token bucket filter got killed in netinet, so it gets killed here
too. Correct comments.
2007-02-28 20:32:25 +00:00
Bruce M Simpson
430fc8f211 Add a comment about a struct which needs to be global.
Remove an unused global variable.
Staticize variables which do not need to be global.
2007-02-28 20:29:20 +00:00
Bruce M Simpson
1291e2a0eb Fix tinderbox. ip6_mrouter should be defined in raw_ip6.c as it is
tested to determine if the userland socket is open; this, in turn, is
used to determine if the module has been loaded.

Tested with:	LINT
2007-02-24 21:09:35 +00:00
Bruce M Simpson
6be2e366d6 Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2007-02-24 11:38:47 +00:00
Robert Watson
afdb42748d Rename two identically named log_in_vain variables: tcp_input.c's static
log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to
udp_log_in_vain.

MFC after:	1 week
2007-02-20 10:20:03 +00:00
Randall Stewart
f42a358a6f - Copyright updates (aka 2007)
- ZONE get now also takes a type cast so it does the
  cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just
  LIST_EMPTY
- Removal of const in some of the static hmac functions
  (not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the
  RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered
  count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear
  after my testing completes. It allows us to keep a
  small log on each assoc of the last 40 TSN's in/out and
  stream assignment. It is NOT in options and so is only
  good for private builds.
- Some CMT changes in prep for Jana fixing his problem
  with reneging when CMT is enabled (Concurrent Multipath
  Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code
  with the old format (in case the peer does not yet support
  the latest auth code).
- Nonce sum was incorrectly being set when ecn_nonce was
  NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using
  just data pointers. The mbufs were used to harmonize
  NetBSD code since both Net and Open used this method. We
  have decided to move away from that and more conform to
  FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The
  cookie_how collision case tracking had an endless loop in
  it if you got a second retransmission of a cookie collision
  case. This would lock up  a CPU .. ugly..
- auth function goes to using size_t instead of int which
  conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you
  get the data chunk, deliver it and due to the reference to a ch->
  that every now and then has been deleted (depending on the position
  in the mbuf) you have an invalid ch->ch.flags.. and thus you don't
  advance the stream sequence number.. so you block the stream
  permanently. The fix is to make local variables of these guys
  and set them up before you have any chance of trimming the
  mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in
  the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where
  we would get an error on receiving with this. That's because
  it was NOT padded to the same size as the snd_rcv info. We
  increase (add the pad) so the two structs are the same size
  in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for
  socket options was to cast the pointer and validate the size.
  This has been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing
  flag type (the next msg is a notification) and a missing
  scope recovery was also fixed.

Reviewed by:	gnn
2007-02-12 23:24:31 +00:00
Bruce M Simpson
31a9460383 In the ICMP6 path to handle FQDN 'who-are-you' queries, check that the
packet header mbuf is non-NULL before trying to create a duplicate of it.

PR:		95957
Reviewed by:	ume
MFC after:	3 days
2007-02-10 12:25:19 +00:00
Bruce M Simpson
6ede684320 MFC after: 3 days 2007-02-05 11:05:41 +00:00
Hajimu UMEMOTO
c57086ced7 ng_iface requires neighbor cache as well.
MFC after:	3 days
2007-02-03 09:34:36 +00:00
Bruce A. Mah
f234bea7d7 Revert nd6.c revs. 1.67, 1.68, 1.69, 1.70 in an attempt to unbreak
IPv6 over point-to-point gif(4) tunnels.

These revisions caused a host route to the destination of a
point-to-point gif(4) interface to not get installed when the interface
and destination addresses were assigned.  This caused
"no route to host" errors when trying to send traffic over the
interface.  The first packet arriving inbound over the tunnel,
however, would cause the correct route to get installed, allowing
subsequent outbound traffic to be routed correctly.

gif(4) interfaces with prefix lengths of less than 128 bits
(i.e. no explicit destination address assigned) were not affected
by this bug.

This bug fix is a possible candidate for a 6.2-RELEASE errata note.

Approved by:	jhay (original committer)
Discussed with:	jhay, JINMEI Tatuya
MFC after:	3 days
2007-01-26 23:22:58 +00:00
Randall Stewart
93164cf98c - most all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
 - Count error fixed in mapping array which might
   cause a wrong cumack generation.
 - Invariants around panic for case D + printf when no invariants.
 - one-to-one model race condition fixed by using
   a pre-formed connection and then completing the
   work so accept won't happen on a non-formed
   association.
 - Some additional paranoia checks in sctp_output.
 - Locks that were missing in the accept code.

Approved by:	gnn
2007-01-18 09:58:43 +00:00
Hajimu UMEMOTO
6a550ab34b Avoid infinite loop if nicmp6 and nip6 are not on the same mbuf.
NetBSD PR 34994+35333

MFC after:	3 days
2007-01-16 15:55:29 +00:00
Randall Stewart
44b7479ba2 - Macroizes the V6ONLY flag check.
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
  unsigned long and really is a uint32_t).
- Got rid of the use of MHeaders until they
  are truly needed by lower layers.
- Fixed an initialization problem in the readq structure
  (ordering was off).
- Found yet another collision bug when the random number
  generator returns two numbers on one side (during a collision)
  that are the same. Also added some tracking of cookies
  that will go away when we know that we have the last collision
  bug gone.
- Fixed an init bug for book_size_scale, that was causing
  Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
  Early FR but due to above bug also affected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
  static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
  has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
  shutting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
  did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
  not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was a
  ordering of locks problem in the way we did timers. It
  now conforms to the timeout(9) manual (except for the
  _drain part, we had to do this a different way due
  to locks).
- Fixed error return code so we get either ECONNREFUSED or ECONNRESET
  depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read error code issue where we were NOT getting the proper
  error when the connection was reset.
Approved by:	gnn
2007-01-15 15:12:10 +00:00
Warner Losh
1c0ee39e74 Marked these as packed correctly 2007-01-12 07:20:25 +00:00
Randall Stewart
139bc87fda a) macro-ization of all mbuf and random number
   access plus timers. This makes the code
   more portable and able to change out the
   mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only
   the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to <b> aka
   worrying about pkthdr's :-)
d) There was one last reorder problem it looks where
   if a restart occurs and we release and relock (at
   the point where we setup our alias vtag) we would
   end up possibly getting the wrong TSN in place. The
   code that fixed the TSN's just needed to be shifted
   around BEFORE the release of the lock.. also code that
   set the state (since this also could contribute).
Approved by:	gnn
2006-12-29 20:21:42 +00:00
Bjoern A. Zeeb
e521ae0c64 In ip6_sprintf print the addresses in a more common/readable
format eliminating leading zeros like in :0001 -> :1.

Reviewed by:	mlaier
2006-12-16 14:15:31 +00:00
Randall Stewart
a5d547add3 1) Fixes on a number of different collision case LOR's.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to
   the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by:	gnn
2006-12-14 17:02:55 +00:00
Bjoern A. Zeeb
1d54aa3ba9 MFp4: 92972, 98913 + one more change
In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
2006-12-12 12:17:58 +00:00
Ruslan Ermilov
f9a047a1b7 - In nd6_rtrequest(), when caching an rtentry, don't forget
  to add a reference to it; otherwise, we could later access
  a freed memory.  This is believed to fix panics some users
  were observing when running route6d(8), and is similar to
  the fix in sys/netinet/if_ether.c,v 1.139 by glebius@.

PR:		kern/93910, kern/105437
Testing by:	Wojciech Puchar (still ongoing)

- Add rtentry locking to nd6_output() similar to rt_check().

MFC after:	4 days
2006-11-25 20:38:56 +00:00
Randall Stewart
03b0b02163 -Fixes first of all the getcred on IPv6 and V4. The
 copies were incorrect and so was the locking.
-A bug was also found that would create a race and
 panic when an abort arrived on a socket being read
 from.
-Also fix the reader to get MSG_TRUNC when a partial
 delivery is aborted.
-Also addresses a couple of coverity caught error path
 memory leaks and a couple of other valid complaints
Approved by:	gnn
2006-11-08 00:21:13 +00:00
Robert Watson
b96fbb37da Convert three new suser(9) calls introduced between when the priv(9)
patch was prepared and committed to priv(9) calls.  Add XXX comments
as, in each case, the semantics appear to differ from the TCP/UDP
versions of the calls with respect to jail, and because cr_canseecred()
is not used to validate the query.

Obtained from:	TrustedBSD Project
2006-11-06 14:54:06 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Randall Stewart
50cec91936 Tons of fixes to get all the 64bit issues removed.
This also moves two 16 bit int's to become 32 bit
values so we do not have to use atomic_add_16.
Most of the changes are %p, casts and other various
nasties that were in the original code base. With this
commit my machine will now do a build universe.. however
I as yet have not tested on a 64bit machine .. it may not work :-(
2006-11-05 13:25:18 +00:00
Randall Stewart
73932c69b6 Oops... in my fix-up of all the $FreeBSD:$ -> $FreeBSD$ I
inserted a few into the new files, but I failed to
add the #include <sys/cdefs.h>

Which causes a compile error. Sorry about that; got it
now :-)

Approved by:	gnn
2006-11-03 17:21:53 +00:00
Randall Stewart
f8829a4a40 Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They are my two other key developers
working on the project, and they deserve attaboys too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent, which updated the
syscalls and sysproto. I hope that is correct; without
it you don't build, since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about committing some test tools I have after I
figure out where I should place them. I also have a
library (libsctp.a) that adds some of the missing socket API
functions that I need to put into the libraries. I will talk
to George about this :-)

There may still be some 64-bit issues in here; none of
us has a 64-bit processor to test with yet. Michael
may have a Mac, but that's another beast too.

If you have a Mac and want to use SCTP, contact Michael;
he maintains a web site with a loadable module with
this code :-)

Reviewed by:	gnn
Approved by:	gnn
2006-11-03 15:23:16 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Hajimu UMEMOTO
e7e51bc3e1 Make net.inet6.ip6.auto_linklocal tunable. Someone may want to
enable/disable auto_linklocal even in single user mode.

Discussed with:	re@, gnn@
MFC after:	3 days
2006-10-13 12:45:53 +00:00
Hajimu UMEMOTO
f5c04409eb Revert the default value of net.inet6.ip6.auto_linklocal to 1.
If ipv6_enable is not set to "YES", net.inet6.ip6.auto_linklocal
is turned to 0 at boot.

Discussed with:	re@, gnn@
MFC after:	3 days
2006-10-13 12:41:36 +00:00
John Hay
ae0ddac700 Hopefully the last tweak in trying to make it possible to add ipv6 direct
host routes without side effects.

Submitted by:	JINMEI Tatuya
MFC after:	4 days
2006-10-02 19:15:10 +00:00
George V. Neville-Neil
90ce6fa1c8 Turn off automatic link local address if ipv6_enable is not set to YES
in rc.conf

Reviewed by:    KAME core team, cperciva
MFC after:      3 days
2006-10-02 10:13:30 +00:00
John Hay
584b68e792 A better fix is to check if it is a host route.
Submitted by:	ume
MFC after:	5 days
2006-09-30 20:25:33 +00:00
John Hay
c482f11edb My previous commit broke "route add -inet6 <network_addr> -interface gif0".
Fix that by excluding point-to-point interfaces.

MFC after:	5 days
2006-09-30 14:08:57 +00:00
Bruce M Simpson
910e1364b6 Nits.
Submitted by:	ru
2006-09-29 16:16:41 +00:00
Bruce M Simpson
2d20d32344 Push removal of mrouted down to the rest of the tree. 2006-09-29 15:45:11 +00:00
SUZUKI Shinsuke
831c32014e fixed a bug where IPv6 packets arriving on stf were not accepted
(a regression introduced in in6.c rev 1.61).

PR: kern/103415
Submitted by: JINMEI Tatuya
MFC after: 1 week
2006-09-22 01:42:22 +00:00
John Hay
f129892448 Make it possible to add an IPv6 host route to a host directly connected.
Use something like this:
route add -inet6 <dest_addr> <my_addr_on_that_interface> -interface -llinfo

This is useful for wireless ad hoc mesh networks.

MFC after:	5 days
2006-09-16 06:24:28 +00:00
John Hay
1fcae350ae All multicast listeners on a port should get one copy of the packet. This
was broken during the locking changes.
2006-09-07 18:44:54 +00:00
Andre Oppermann
233dcce118 First step of TSO (TCP segmentation offload) support in our network stack.
 o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
 o add CSUM_TSO flag to mbuf pkthdr csum_flags field
 o add tso_segsz field to mbuf pkthdr
 o enhance ip_output() packet length check to allow for large TSO packets
 o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
 o adjust all callers of tcp_maxmtu[46]() accordingly

Discussed on:	-current, -net
Sponsored by:	TCP/IP Optimization Fundraise 2005
2006-09-06 21:51:59 +00:00
John Hay
80a684e083 Use net.inet6.ip6.redirect / ip6_sendredirects as part of the decision
to generate icmp6 redirects. Now it is possible to switch redirects off.

MFC after:	1 week
2006-09-05 19:20:42 +00:00
Brooks Davis
43bc7a9c62 With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entirely, since all modern
BSDs have if_xname variables, rendering it unnecessary.
2006-08-04 21:27:40 +00:00
Robert Watson
c9db0fad09 Align IPv6 socket locking with IPv4 locking: lock socket buffer explicitly
and use _locked variants to avoid extra lock and unlock operations.

Reviewed by:	gnn
MFC after:	1 week
2006-07-23 12:24:22 +00:00
George V. Neville-Neil
c6af35ee0e The KAME project ceased work on IPv6 and IPSec in March of 2006.
Remove the README file which warns against cosmetic or local only
changes.  FreeBSD committers should now feel free to work on the
IPv6 and IPSec code without fetters.  The KAME mailing lists still
exist and it is always a good idea to ask questions about this code
on the snap-users@kame.net mailing list.

Reviewed by:	rwatson, brooks
2006-07-22 02:32:32 +00:00
Robert Watson
a152f8a361 Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach if the protocol retained a reference.

This facilitates sharing mutexes between layers of the network stack, as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
Stephan Uphoff
d915b28015 Fix race conditions on enumerating pcb lists by moving the initialization
(and, where appropriate, the destruction) of the pcb mutex to the init/fini
functions of the pcb zones.
This allows locking of the pcb entries and race-condition-free comparison
of the generation count.
Rearrange locking a bit to avoid an extra locking operation to update the generation
count in in_pcballoc() (in_pcballoc() now returns the pcb locked).

I am planning to convert pcb list handling from a type-safe to a reference-count
model soon, as this allows really freeing the PCBs.

Reviewed by:	rwatson@, mohans@
MFC after:	1 week
2006-07-18 22:34:27 +00:00
Oleg Bulyzhin
6372145725 Complete timebase (time_second -> time_uptime) conversion.
PR:		kern/94249
Reviewed by:	andre (few months ago)
Approved by:	glebius (mentor)
2006-07-05 23:37:21 +00:00
Yaroslav Tykhiy
4e6098c6a4 We needn't check "m" for NULL here because "off" should be within
the mbuf chain.  If we ever get a buggy caller, a bogus "off" should
be caught by the sanity check at the function entry.  Null "m" here
means a very unusual condition of a totally broken mbuf chain (wrong
m_pkthdr.len or whatever), so we can just page fault later.

Found by:	Coverity Prevent(tm)
CID:		825
2006-06-30 18:25:07 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and be prepared for
the system to crash honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy
40e4360c10 Use queue(3) macros instead of accessing list/queue internals directly. 2006-06-29 16:56:07 +00:00
Bjoern A. Zeeb
421d8aa603 Use INPLOOKUP_WILDCARD instead of just 1 more consistently.
OKed by: rwatson (some weeks ago)
2006-06-29 10:49:49 +00:00
Pawel Jakub Dawidek
5279398812 - Use suser_cred(9) instead of directly comparing cr_uid.
- Compare pointer with NULL, instead of 0.

Reviewed by:	rwatson
2006-06-27 11:40:05 +00:00
Pawel Jakub Dawidek
835d4b8924 - Use suser_cred(9) instead of directly checking cr_uid.
- Change the order of conditions to first verify that we actually need
  to check for privileges and then eventually check them.

Reviewed by:	rwatson
2006-06-27 11:35:53 +00:00
Robert Watson
1e0acb6801 Use suser_cred() instead of a direct comparison of cr_uid with 0 in
rip6_output().

MFC after:	1 week
2006-06-25 13:54:59 +00:00
George V. Neville-Neil
a59af512d4 Fix spurious warnings from neighbor discovery when working with IPv6 over
point to point tunnels (gif).

PR:		93220
Submitted by:	Jinmei Tatuya
MFC after:	1 week
2006-06-08 00:31:17 +00:00
Seigo Tanimura
f8366b0334 Avoid spurious release of an rtentry. 2006-05-23 00:32:22 +00:00
Bjoern A. Zeeb
93e4f81d9f In IN6_IS_ADDR_V4MAPPED case instead of returning directly set error and
goto out so that locks will be dropped.

Reviewed by: rwatson, gnn
2006-05-20 13:26:08 +00:00
Max Laier
656faadcb8 Remove ip6fw. Since ipfw has fully functional IPv6 support now and, in
contrast to ip6fw, is properly locked, it is time to retire ip6fw.
2006-05-12 20:39:23 +00:00
Bjoern A. Zeeb
1b34a059cb Assert that ip6_forward_rt is protected by Giant by adding GIANT_REQUIRED to
functions not yet asserting it but working on the global ip6_forward_rt
route cache, which is not locked and perhaps should go away in the
future, though the cache hit/miss ratio wasn't bad.

It's #if 0ed in frag6 because the code working on ip6_forward_rt is.
2006-05-04 18:41:08 +00:00
Robert Watson
20e3d71cdd Break out socket access control and delivery logic from udp6_input()
into its own function, udp6_append().  This mirrors a similar structure
in udp_input() and udp_append(), and makes the whole thing a lot more
readable.

While here, add missing inpcb locking in UDP6 input path.

Reviewed by:	bz
MFC after:	3 months
2006-05-01 21:39:48 +00:00
Robert Watson
8deea4a8f3 Move lock assertions to top of in6_pcbladdr(): we still want them to run
even if we're going to return an argument-based error.

Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since
they walk pcbinfo inpcb lists.

Assert inpcb and pcbinfo locks in in6_pcbsetport(), since
port reservations are changing.

MFC after:	3 months
2006-04-25 12:09:58 +00:00
Robert Watson
04f2073775 Modify in6_pcbpurgeif0() to accept a pcbinfo structure rather than a pcb
list head structure; this improves congruence to IPv4, and also allows
in6_pcbpurgeif0() to lock the pcbinfo.  Modify in6_pcbpurgeif0() to lock
the pcbinfo before iterating the pcb list, use queue(9)'s LIST_FOREACH()
for the iteration, and to lock individual inpcb's while manipulating
them.

MFC after:	3 months
2006-04-23 15:06:16 +00:00
Paul Saab
4f590175b7 Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
Robert Watson
086dafc15b Mirror IPv4 pcb locking into in6_setsockaddr() and in6_setpeeraddr():
acquire inpcb lock when reading inpcb port+address in order to prevent
races with other threads that may be changing them.

MFC after:	3 months
2006-04-15 05:24:23 +00:00
Robert Watson
8511b981f6 Assert the inpcb lock in udp6_output(), as we dereference various
fields.

MFC after:	3 months
2006-04-12 03:34:22 +00:00
Robert Watson
dec8026073 Add comment to udp6_input() that locking is missing from multicast
UDPv6 delivery.

Lock the inpcb of the UDP connection being delivered to before
processing IPSEC policy and other delivery activities.

MFC after:	3 months
2006-04-12 03:32:54 +00:00
Robert Watson
5383103aa0 Add udbinfo locking in udp6_input() to protect lookups of the inpcb
lists during UDPv6 receipt.

MFC after:	3 months
2006-04-12 03:23:56 +00:00
Robert Watson
ff7425ced0 Don't use spl around call to in_pcballoc() in IPv6 raw socket support;
all necessary synchronization appears present.

MFC after:	3 months
2006-04-12 03:07:22 +00:00
Robert Watson
41ba156433 Remove one remaining use of spl in the IPv6 fragmentation code, as
this code appears properly locked.

MFC after:	3 months
2006-04-12 03:06:20 +00:00
Robert Watson
e3beea90c7 Add missing locking to udp6_getcred(), remove spl use.
MFC after:	3 months
2006-04-12 03:03:47 +00:00
Robert Watson
4847772314 Remove spl use from IPv6 inpcb code.
In various inpcb methods for IPv6 sockets, don't check if so_pcb is NULL;
assert it isn't.

MFC after:	3 months
2006-04-12 02:52:14 +00:00
SUZUKI Shinsuke
8447156ce0 ip6_mrouter_done(): use if_allmulti(0) for disabling the multicast promiscuous mode
Obtained from: KAME
MFC after: 2 days
2006-04-10 14:33:22 +00:00
Robert Watson
c60afb3f55 Fix assertion description: !=, not ==.
Submitted by:	pjd
MFC after:	3 months
2006-04-09 16:33:41 +00:00
Robert Watson
14ba8add01 Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, in protocol
  shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
  detached in_pcb structure for a socket.

MFC after:	3 months
2006-04-01 16:20:54 +00:00
Robert Watson
4c7c478d0f Break out in_pcbdetach() into two functions:
- in_pcbdetach(), which removes the link between an inpcb and its
  socket.

- in_pcbfree(), which frees a detached pcb.

Unlike the previous in_pcbdetach(), neither of these functions will
attempt to conditionally free the socket, as they are responsible only
for managing in_pcb memory.  Mirror these changes into in6_pcbdetach()
by breaking it into in6_pcbdetach() and in6_pcbfree().

While here, eliminate undesired checks for NULL inpcb pointers in
sockets, as we will now have as an invariant that sockets will always
have valid so_pcb pointers.

MFC after:	3 months
2006-04-01 16:04:42 +00:00
Robert Watson
bc725eafc7 Change protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail"; they either occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
Robert Watson
ac45e92ff2 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00
David Malone
fe12457335 This comment on various IPPORT_ defines was copied from in.h and
probably never fully applied to IPv6. Over time it has become more
stale, so replace it with something more up to date.

Reviewed by:	ume
MFC after:	1 month
2006-03-28 12:51:22 +00:00
Robert Watson
85f1f481ab Remove manual assignment of m_pkthdr from one mbuf to another in
ipsec_copypkt(), as this is already handled by the call to M_MOVE_PKTHDR(),
which also knows how to correctly handle MAC m_tags.  This corrects a panic
when running with MAC and KAME IPSEC.

PR:		kern/94599
Submitted by:	zhouyi zhou <zhouyi04 at ios dot cn>
Reviewed by:	bz
MFC after:	3 days
2006-03-28 10:16:38 +00:00
SUZUKI Shinsuke
31d4137bf3 fixed a memory leak when net.inet6.icmp6.nd6_maxqueuelen is greater than 1
Obtained from: KAME
MFC after: 3 days
2006-03-24 16:20:12 +00:00
David Malone
fcd1001c63 Make net.inet.ip.portrange.reservedhigh and
net.inet.ip.portrange.reservedlow apply to IPv6 as well as IPv4.

We could have made new sysctls for IPv6, but that potentially makes
things complicated for mapped addresses. This seems like the least
confusing option and least likely to cause obscure problems in the
future.

This change makes the mac_portacl module useful with IPv6 apps.

Reviewed by:	ume
MFC after:	1 month
2006-03-19 11:48:48 +00:00
SUZUKI Shinsuke
d3693a631e implements section 2.2 of RFC4191, regarding the reserved preference value (10)
Obtained from: KAME
MFC after: 1 day
2006-03-19 06:38:39 +00:00
SUZUKI Shinsuke
e381ac4daa updates net.inet6.ip6.kame_version as the proof of the latest KAME merge
Reviewed by: KAME
MFC after: 2 days
2006-03-19 02:11:42 +00:00
SUZUKI Shinsuke
2c112cdc6d fixed a bug where an MLD report was not advertised when a group-specific MLD query was received.
PR:	kern/93526
Obtained from:	KAME
MFC after:	1 day
2006-03-04 09:17:11 +00:00
Hajimu UMEMOTO
430683286b avoided the use of purged address structure when an address became
invalid in nd6_timer().

PR:		kern/93170
Reported by:	kris
Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Confirmed by:	kris
Obtained from:	KAME
MFC after:	2 days
2006-02-12 15:37:08 +00:00
George V. Neville-Neil
f2b1bd14dc Fix for an inappropriate bzero of the ICMPv6 stats. The code was zeroing the wrong structure member but setting the correct one.
Submitted by:	James dot Juran at baesystems dot com
Reviewed by:	gnn
MFC after:	1 week
2006-02-08 07:16:46 +00:00
Hajimu UMEMOTO
8c76311215 shut up strict-aliasing rules warning. 2006-02-05 09:52:40 +00:00
Hajimu UMEMOTO
92cb1c3210 make IPV6_V6ONLY socket option work for UDP as well.
PR:		ports/92620
Reported by:	Kurt Miller <kurt__at__intricatesoftware.com>
MFC after:	1 week
2006-02-02 11:46:05 +00:00
Christian S.J. Peron
604afec496 Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:

- Drop the variables stored in the pfil_head structure which were specific to
  conditions and the home-rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
  macros into pfil.h
- Move pfil list locking macros into pfil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
  of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
  hooks to be run by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
  PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
  hooks to be run, and returns if not. This check is already performed by the
  IP stacks when they call:

        if (!PFIL_HOOKED(ph))
                goto skip_hooks;

- Drop in an assertion which makes sure that the number of hooks never drops
  below 0, for good measure. This in theory should never happen, and if it
  does, then there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
  the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
  API instead of our home rolled version
- Convert the inlined functions to macros

Reviewed by:	mlaier, andre, glebius
Thanks to:	jhb for the new locking API
2006-02-02 03:13:16 +00:00
Gleb Smirnoff
25af0bb50e Add some initial locking to gif(4). It doesn't cover the whole driver;
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:

- Add per-softc mutex.
- Hold the mutex on output.

The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.

Reported by:	many
Tested by:	Alexander Shiryaev <aixp mail.ru>
2006-01-30 08:39:09 +00:00
Hajimu UMEMOTO
411babc618 don't embed scope id before running packet filters.
Reported by:	YAMAMOTO Takashi <yamt__at__mwd.biglobe.ne.jp>
Obtained from:	NetBSD
MFC after:	1 week
2006-01-25 08:17:02 +00:00
Robert Watson
9f8a02f168 Convert in6_cksum() to ANSI C function declaration.
MFC after:	1 week
2006-01-22 01:17:57 +00:00
Robert Watson
fc4c825847 When storing the results of malloc() in a pointer to a pointer, check
the pointer to a pointer for NULL, not the pointer for NULL.
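
The bug class fixed here can be illustrated with a small userland C sketch. The helper name alloc_buf is hypothetical and not taken from the kernel change; it only shows which pointer has to be tested after malloc() stores its result through a pointer to a pointer.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a buffer and hand it back through an
 * out-parameter.  Testing `bufp` itself would check the caller's
 * pointer-to-pointer, which is never NULL here, so an allocation
 * failure would go unnoticed.
 */
static int
alloc_buf(char **bufp, size_t len)
{
	*bufp = malloc(len);
	/* Correct: test the value malloc() stored, not bufp. */
	if (*bufp == NULL)
		return (-1);
	return (0);
}
```

A caller passes the address of its own pointer, e.g. `char *p; if (alloc_buf(&p, 16) == 0) { /* use p */ free(p); }`.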

Noticed by:	Coverity Prevent analysis tool
MFC after:	3 days
2006-01-14 00:09:41 +00:00
Robert Watson
2ab392c630 In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.

Noticed by:	Coverity Prevent analysis tool
MFC after:	3 days
2006-01-13 23:53:23 +00:00
SUZUKI Shinsuke
02ff33e2d0 added a note about the assumption for m->m_pkthdr.rcvif
Obtained from: KAME
MFC After: 1 day
2006-01-09 09:08:43 +00:00
Andrew Thompson
73ff045c57 Add RFC 3378 EtherIP support. This change makes it possible to add gif
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.

Obtained from:	NetBSD
MFC after:	2 weeks
2005-12-21 21:29:45 +00:00
SUZUKI Shinsuke
7014e0eb11 fixed a kernel crash at the initialization time of PIM-SM register interface
MFC after: 2 days
2005-12-09 04:42:19 +00:00
Hajimu UMEMOTO
4a3df7fe7b the response NS to a DAD NS was not sent correctly due to the
invalid destination address.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
MFC after:	1 day
2005-12-08 06:43:39 +00:00
SUZUKI Shinsuke
a829cf5765 fixed a kernel crash due to an improper removal of callout-timer
(ToDo: similar fix is necessary for other NDP-related callout-timers
 in netinet6/nd6*.c)

PR: kern/88725
MFC after: 1 month
2005-11-16 12:36:08 +00:00
Ruslan Ermilov
303989a2f3 Use sparse initializers for "struct domain" and "struct protosw",
so they are easier to follow for the human being.
2005-11-09 13:29:16 +00:00
SUZUKI Shinsuke
797df30d75 statically configured IPv6 address is properly added/deleted now
Obtained from: KAME
Reported in: freebsd-net@freebsd
MFC after: 1 day
2005-10-31 23:06:04 +00:00
SUZUKI Shinsuke
36dc24e61e fixed a compilation failure on amd64/sparc64/ia64
Submitted by: max
MFC after: 2 months
2005-10-22 05:07:16 +00:00
SUZUKI Shinsuke
200caaf0c0 nuked non-existing commands 2005-10-21 16:31:39 +00:00
SUZUKI Shinsuke
743eee666f sync with KAME regarding NDP
- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-conformant DAD advertisement
- updated RFC/internet-draft revisions

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 months
2005-10-21 16:23:01 +00:00
SUZUKI Shinsuke
9c8aab3e0b perform NUD on an IPv6-aware point-to-point interface
Obtained from: KAME
MFC after: 1 week
2005-10-21 15:59:00 +00:00
SUZUKI Shinsuke
4ecbe3316a sync with KAME (renamed a macro IPV6_DADOUTPUT to IPV6_UNSPECSRC)
Obtained from: KAME
2005-10-21 15:45:13 +00:00
SUZUKI Shinsuke
7aa5949375 sync with KAME (nuked unused code, use NULL to denote a NULL pointer)
Obtained from: KAME
Reviewed by: ume, gnn
2005-10-19 17:18:49 +00:00
SUZUKI Shinsuke
c1a049ac20 sync with KAME (removed an unnecessary non-standard macro)
Obtained from: KAME
Reviewed by: ume, gnn
2005-10-19 16:53:24 +00:00
SUZUKI Shinsuke
d28bde669a sync with KAME regarding the following clarification in RFC3542:
- disable IPv6 operation if DAD fails for some EUI-64 link-local addresses.
 - export get_hw_ifid() (and rename it) as a subroutine for this process.

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 weeks
2005-10-19 16:43:57 +00:00
SUZUKI Shinsuke
a22adbc68c sync with KAME (don't respond to NI_QTYPE_IPV4ADDR)
Obtained from: KAME
Reviewed by: ume, gnn
2005-10-19 16:27:33 +00:00
SUZUKI Shinsuke
5b27b04579 supported an ndp command suboption to disable IPv6 in the given interface
Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 weeks
2005-10-19 16:20:18 +00:00
SUZUKI Shinsuke
b9204379a1 added an ioctl option in kernel so that ndp/rtadvd can change some NDP-related kernel variables based on their configurations (RFC2461 p.43 6.2.1 mandates this for IPv6 routers)
Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 weeks
2005-10-19 15:05:42 +00:00
SUZUKI Shinsuke
2ce62dce17 sync with KAME in the following points:
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessary white space

Obtained from: KAME
MFC after:  1 day
2005-10-19 10:09:19 +00:00
SUZUKI Shinsuke
4350fcab1b Raw IPv6 checksum must use the protocol number of the last header, instead of the first next-header value.
Obtained from: KAME
MFC after: 1 day
2005-10-19 01:21:49 +00:00
SUZUKI Shinsuke
2d70ebe43d fixed a kernel crash when IPv6 PIM-SM routing is enabled and a PIM register message is received
Obtained from: KAME
MFC After: 3 days
2005-10-17 13:47:31 +00:00
SUZUKI Shinsuke
971b154cd3 added a missing unlock
Submitted by: JINMEI Tatuya
MFC After: 1 day
2005-10-15 08:49:49 +00:00
Hajimu UMEMOTO
9129d539e2 AES counter mode uses 8byte IV, not 16 bytes.
Obtained from:	NetBSD
2005-10-12 09:13:48 +00:00
Andre Oppermann
fe53256dc2 Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.
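
As a rough userland analogue of this change, assuming CLOCK_MONOTONIC stands in for the kernel's time_uptime: an expiry stored against a monotonic clock is not disturbed when the wall clock (time_second) is stepped. Both helper names are hypothetical, and treating 0 as "never expires" is an assumption modeled on the usual rmx_expire convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Monotonic seconds counter, playing the role of time_uptime. */
static time_t
uptime_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec);
}

/*
 * Hypothetical check: has an entry whose expiry was recorded against
 * the monotonic clock lapsed?  An expiry of 0 is treated as permanent.
 */
static bool
entry_expired(time_t expire)
{
	return (expire != 0 && expire <= uptime_seconds());
}
```

An entry meant to live for an hour would store `uptime_seconds() + 3600`; stepping the wall clock with settimeofday() does not change when it expires.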
2005-09-19 22:54:55 +00:00
SUZUKI Shinsuke
9689258fb5 plugged a possible memory leak
Obtained from: KAME
MFC after: 1 day
2005-09-16 01:42:50 +00:00
David E. O'Brien
7ba26d99d8 IPv6 was improperly defining its malloc type the same as IPv4 (M_IPMADDR,
M_IPMOPTS, M_MRTABLE).  Thus we had conflicting instantiations.
Create an IPv6-specific type to overcome this.
2005-09-07 10:11:49 +00:00
Andrew Thompson
59280079d3 Add support for multicast to the bridge and allow inet6 addresses to be
assigned to the interface.

IPv6 auto-configuration is disabled. An IPv6 link-local address has
link-local scope within one link; the spec is unclear for the bridge case, and
it may cause a scope violation.

An address can be assigned in the usual way:
  ifconfig bridge0 inet6 xxxx:...

Tested by:	bmah
Reviewed by:	ume (netinet6)
Approved by:	mlaier (mentor)
MFC after:	1 week
2005-09-06 21:11:59 +00:00
Andre Oppermann
e0aec68255 Use the correct mbuf type for MGET(). 2005-08-30 16:35:27 +00:00
SUZUKI Shinsuke
2af9b91993 added a missing unlock (just do the same thing as in netinet/raw_ip.c)
Obtained from: KAME
MFC after: 3 days
2005-08-18 11:11:27 +00:00
Hajimu UMEMOTO
5d52565396 - fix race condition using sx lock.
- use TAILQ_FOREACH() for readability.

Suggested by:	jhb
2005-08-17 16:46:55 +00:00
Hajimu UMEMOTO
1c44678637 avoid exclusive sleep mutex. 2005-08-16 19:49:10 +00:00
Hajimu UMEMOTO
5af09736a8 added a knob to enable path MTU discovery for multicast packets.
(by default, it is disabled)

Submitted by:	suz
Obtained from:	KAME
2005-08-13 19:55:06 +00:00
Hajimu UMEMOTO
cd0fdcf7a7 - fix typo in comment.
- nuke unused code.

Submitted by:	suz
Obtained from:	KAME
2005-08-12 15:27:25 +00:00
Gleb Smirnoff
530f95fc08 o Make rt_check() function more strict:
  - rt0 passed to rt_check() must not be NULL; assert this.
  - rt returned by rt_check() must be a valid locked rtentry,
    if no error occurred.
o Modify callers so that they never pass NULL rt0
  to rt_check().

Reviewed by:	sam, ume (nd6.c)
2005-08-11 08:14:53 +00:00
Hajimu UMEMOTO
ae12c6579e create sysctl tree dynamically. it is required to share
net.inet6.ip6.fw with the upcoming ipfw2 improvement for IPv6.

Requested by:	bz
2005-08-11 07:28:01 +00:00
Hajimu UMEMOTO
31c8e3fbec removed RFC1885-related code. it was obsoleted by RFC2463, and the
code was #ifdef'ed out for a long time.

Submitted by:	suz
Obtained from:	KAME
2005-08-10 17:30:10 +00:00
SUZUKI Shinsuke
f8a8f9ca5e supports stealth forwarding in IPv6, as well as in IPv4
PR: kern/54625
MFC after: 1 week
2005-08-10 09:13:35 +00:00
David E. O'Brien
c11ba30c9a Remove public declarations of variables that were forgotten when they were
made static.
2005-08-10 07:10:02 +00:00
David E. O'Brien
6ca6f60b07 Style nit. 2005-08-10 06:38:46 +00:00
SUZUKI Shinsuke
05b697ddcb fixed a kernel crash at the start-up time of IPv6 multicast daemons
(e.g. pim6dd, pim6sd)

MFC after: 3 days
2005-08-10 05:28:11 +00:00
Hajimu UMEMOTO
c66b5fea43 corrected the fourth argument to ni6_addrs(). 2005-08-09 12:24:11 +00:00
Robert Watson
13f4c340ae Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
Gleb Smirnoff
9bd8ca3014 In preparation for fixing races in ARP (and probably in other
L2/L3 mappings) make rt_check() return a locked rtentry.
2005-08-09 08:39:56 +00:00
Gleb Smirnoff
401df2f296 - Use 'error' variable to store error value, instead of 'i'.
- Push 'i' into the only block where it is used.
- Remove redundant check for rt being NULL. If rt_check() hasn't
  returned an error, then rt is valid.

Reviewed by:	gnn
2005-08-09 08:37:28 +00:00
Robert Watson
bccb41014a Modify network protocol consumers of the ifnet multicast address lists
to lock if_addr_mtx.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		1 week
2005-08-02 23:51:22 +00:00
Hajimu UMEMOTO
e770771a78 simplified the fix to FreeBSD-SA-04:06.ipv6. The previous one worried
too much even though we actually validate the parameters.  This code
also is more compatible with other *BSDs, which do copyin within
setsockopt().

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Reviewed by:	security-officer (nectar)
Obtained from:	KAME
2005-07-28 18:07:07 +00:00
Colin Percival
1fcc990954 Correct a buffer overflow which can occur when decompressing a
carefully crafted deflated data stream. [1]

Correct problems in the AES-XCBC-MAC IPsec authentication algorithm. [2]

Submitted by:	suz [2]
Security:	FreeBSD-SA-05:18.zlib [1], FreeBSD-SA-05:19.ipsec [2]
2005-07-27 08:41:17 +00:00
Hajimu UMEMOTO
d6bb0cb7eb nuke duplicate inclusion of scope6_var.h. 2005-07-26 11:46:15 +00:00
Hajimu UMEMOTO
336a1a7b37 oops, make it compilable. i need sleep. X-( 2005-07-25 17:28:39 +00:00
Hajimu UMEMOTO
a7734b4bfd restore locks which disappeared wrongly by my previous commit. 2005-07-25 17:05:37 +00:00
Hajimu UMEMOTO
a1f7e5f8ee scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application does:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.
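  The stricter boundary check can be pictured with a minimal userland sketch.
  It assumes the only rule being enforced is "::1 never leaves the node", and
  the helper name src_scope_ok() and its flag argument are hypothetical, not
  the KAME/FreeBSD kernel code (which also handles link-local zones).

```c
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical helper: may a packet with source address 'src' be sent
 * out through an interface?  ::1 is only meaningful within a single node,
 * so it must never appear on a non-loopback link.
 */
static bool
src_scope_ok(const struct in6_addr *src, bool out_if_is_loopback)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !out_if_is_loopback)
		return (false);	/* would leak a node-local address */
	return (true);
}
```

  With this rule, the bind(s, "::1") / sendto(global address) sequence above
  is refused as soon as the route points at a real interface.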

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
Hajimu UMEMOTO
885adbfa81 always copy ip6_pktopt. remove needcopy and needfree
argument/structure member accordingly.

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 16:39:23 +00:00
Hajimu UMEMOTO
e07db7aa57 simplified udp6_output() and rip6_output(): do not override
in6p_outputopts at the entrance of the functions.  this trick was
necessary when we passed an in6 pcb to in6_embedscope(), within which
the in6p_outputopts member was used, but we do not use this kind of
interface any more.

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 16:32:50 +00:00
Hajimu UMEMOTO
d5e3406d06 be consistent on naming advanced API functions; use ip6_XXXpktopt(s).
Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 15:06:32 +00:00
Hajimu UMEMOTO
8507acb169 NULL is not zero.
Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-21 14:57:53 +00:00
Hajimu UMEMOTO
9727df0c09 do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling.  makes it possible to take advantage of 9k ether
frames.

Obtained from:	NetBSD
2005-07-20 20:02:28 +00:00
Hajimu UMEMOTO
18b35df8fe update comments:
- RFC2292bis -> RFC3542
- typo fixes

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-20 08:59:45 +00:00
Andrew Thompson
2fcb030ad5 Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT, which is set for i386 and amd64 where
alignment isn't needed, so the cost is avoided there.

 IP_HDR_ALIGNED_P()
 IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.
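A sketch of what such an alignment macro can look like, assuming the IP
header must start on a 32-bit boundary. HDR_ALIGNED_P is a hypothetical name
used for illustration; the real IP_HDR_ALIGNED_P()/IP6_HDR_ALIGNED_P()
definitions may differ in detail.

```c
#include <stdint.h>

/*
 * On platforms that tolerate unaligned access the test collapses to a
 * constant, so the compiler can drop the check entirely.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P(p)	1
#else
#define HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif
```

A caller that finds the header misaligned would copy the packet into an
aligned buffer (or drop it) before handing it to the filter.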

PR:		ia64/81284
Obtained from:	NetBSD
Approved by:	re (dwhite), mlaier (mentor)
2005-07-02 23:13:31 +00:00
Hajimu UMEMOTO
d098c2c166 fix IP(v4) over IPv6 tunneling most likely broken with ifnet changes.
Submitted by:	bz
Approved by:	re (dwhite)
2005-06-20 20:17:00 +00:00
Brooks Davis
be4889bb80 Fix IPv6 neighbor discovery by using IF_LLADDR to get the mac address
instead of a particularly ugly cast + pointer math hack.

Reported by:	kuriyama, kris
2005-06-12 00:45:24 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Ian Dowse
ba5da2a06f Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.
2005-06-02 00:04:08 +00:00
Robert Watson
8a2aa63d7e Lock udbinfo and inp before calling in6_pcbdetach() from udp6_abort().
MFC after:	1 week
2005-06-01 11:38:19 +00:00
George V. Neville-Neil
403cbcf59f Fixes for various nits found by the Coverity tool.
In particular 2 missed return values and an inappropriate bcopy from
a possibly NULL pointer.

Reviewed by:	jake
Approved by:	rwatson
MFC after:	1 week
2005-05-15 02:28:30 +00:00
Brooks Davis
8195404bed Add IPv6 support to IPFW and Dummynet.
Submitted by:	Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
2005-04-18 18:35:05 +00:00
George V. Neville-Neil
c543ec4e34 Remove dead code which would never execute,
i.e. checking to see if a cluster was ever less than 48 bytes,
a rather unlikely case.

Check return value of m_dup_pkthdr() calls.

Found by: Coverity
Reviewed by: rwatson (mentor), Keiichi Shima (for Kame)
Approved by: rwatson (mentor)
2005-04-14 11:41:23 +00:00
Sam Leffler
8a9d54df38 check for malloc failure (also move malloc up to simplify error recovery)
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	gnn
2005-03-29 01:26:27 +00:00
Gleb Smirnoff
d4d2297060 ifma_protospec is a pointer. Use NULL when assigning or comparing it. 2005-03-20 14:31:45 +00:00
Sam Leffler
6c011e4dc3 correct bounds check
Noticed by:	Coverity Prevent analysis tool
2005-03-16 05:11:11 +00:00
Hajimu UMEMOTO
9f65b10b0f refer opencrypto/cast.h directly. 2005-03-11 12:37:07 +00:00
Hajimu UMEMOTO
d34fd3c7e0 reported from VANHULLEBUS Yvan [remote kernel crash may result]
Submitted by:	itojun
Obtained from:	KAME
MFC after:	1 day
2005-03-09 14:39:48 +00:00
SUZUKI Shinsuke
da57b1caf8 ignores ICMPv6 code field in case of ICMPv6 Packet-Too-Big (as specified in RFC2463 and draft-ietf-ipngwg-icmp-v3-06.txt)
Obtained from: KAME
MFC after: 1 day
2005-03-02 05:14:15 +00:00
Hajimu UMEMOTO
9c0fda722d icmp6_notify_error uses IP6_EXTHDR_CHECK, which in turn calls
m_pullup.  icmp6_notify_error continued to use the old pointer,
which after the m_pullup is not suitable as a packet header any
longer (see m_move_pkthdr), and this is what causes the kernel panic
in sbappendaddr later on.

PR:		kern/77934
Submitted by:	Gerd Rausch <gerd@juniper.net>
MFC after:	2 days
2005-02-27 18:57:10 +00:00
Hajimu UMEMOTO
bee48028f0 fix typo.
MFC after:	2 days
2005-02-27 18:23:29 +00:00
Hajimu UMEMOTO
283f9f8a3c initialized the last arg to ip6_process_hopopts(), because the recent
code requires it to be 0 when a jumbo payload option is contained.

PR:		kern/77934
Submitted by:	Gerd Rausch <gerd@juniper.net>
Obtained from:	KAME
MFC after:	2 days
2005-02-27 18:07:18 +00:00
Sam Leffler
ba1a42195c remove dead code
Noticed by:	Coverity Prevent analysis tool
2005-02-25 22:58:25 +00:00
Sam Leffler
7f560471fe eliminate dead code
Noticed by:	Coverity Prevent analysis tool
2005-02-23 22:53:04 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Robert Watson
da2ecc1aa6 Add missed merge of ripcbinfo extern. Given how widely used
ripcbinfo is, we should probably add it to an include file.

Spotted by:	mux
2005-02-09 01:12:43 +00:00
Robert Watson
62a2c81733 Lock raw IP socket pcb list and PCBs when processing input via
icmp6_rip6_input().

Reviewed by:	gnn
MFC after:	1 week
2005-02-08 22:16:26 +00:00
Robert Watson
8760934124 Remove a comment from the raw IPv6 output function regarding
M_TRYWAIT allocations: M_PREPEND() now uses M_DONTWAIT.

MFC after:	3 days
2005-02-06 21:43:55 +00:00
Hajimu UMEMOTO
5b8c5ac438 we don't need to make fake sockaddr_in6 to compare subject address.
MFC after:	1 week
2005-01-21 18:12:46 +00:00
Warner Losh
caf43b0208 /* -> /*- for license, minor formatting changes, separate for KAME 2005-01-07 02:30:35 +00:00
Gleb Smirnoff
5e5da86597 In certain cases ip_output() can free our route, so check
for its presence before RTFREE().

Noticed by:	ru
2004-12-10 07:51:14 +00:00
Gleb Smirnoff
34291a9efc style the last change 2004-12-09 09:52:58 +00:00
Gleb Smirnoff
39817106d4 MFinet4:
- Make route caching optional, configurable via IFF_LINK0 flag.
- Turn it off by default.

Reminded by:	suz
2004-12-09 09:48:47 +00:00
George V. Neville-Neil
026e67b69b Reviewed by: SUZUKI Shinsuke <suz@kame.net>
Approved by:  Robert Watson <rwatson@freebsd.org>

Add locking to the IPv6 scoping code.

All spl() like calls have also been removed.

Cleaning up the handling of ifnet data will happen at a later date.
2004-11-29 03:10:35 +00:00
SUZUKI Shinsuke
3d54848fc2 support TCP-MD5(IPv4) in KAME-IPSEC, too.
MFC after: 3 weeks
2004-11-08 18:49:51 +00:00
Poul-Henning Kamp
756d52a195 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
SUZUKI Shinsuke
b3fe9bc483 fixed a bug that incorrect IPsec request level may be returned for proto AH
Obtained from: KAME
2004-10-28 09:24:45 +00:00
Andre Oppermann
f45cd79a03 Be more careful to only index valid IP protocols and be more verbose with
comments.
2004-10-19 14:26:44 +00:00
Robert Watson
81158452be Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
SUZUKI Shinsuke
6f9e3ebf47 fixed too delayed routing cache expiry. (tvtohz() converts a time interval to ticks, whereas hzto() converts an absolute time to ticks)
Obtained from: KAME
2004-10-06 03:32:26 +00:00
Brian Feldman
77b691e0ad Prevent reentrancy of the IPv6 routing code (leading to crash with
INVARIANTS on, who knows what with it off).
2004-10-03 00:49:33 +00:00
Doug White
763f534e3c Disable MTU feedback in IPv6 if the sender writes data that must be fragmented.
Discussed extensively with KAME.  The API author's intent isn't clear at this
point, so rather than remove the code entirely, #if 0 out and put a big
comment in for now. The IPV6_RECVPATHMTU sockopt is available if the
application wants to be notified of the path MTU to optimize packet sizes.

Thanks to JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> for putting up
with my incessant badgering on this issue, and fenner for pointing out
the API issue and suggesting solutions.
2004-10-02 23:45:02 +00:00
Max Laier
d6a8d58875 Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easily reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a separate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
the foreseeable future.

Suggested by:		rwatson
A lot of work by:	csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by:		rwatson, csjp
Tested by:		-pf, -ipfw, LINT, csjp and myself
MFC after:		3 days

LOR IDs:		14 - 17 (not fixed yet)
2004-09-29 04:54:33 +00:00
Stefan Farfeleder
e7b80a8e24 Prefer C99's __func__ over GCC's __FUNCTION__. 2004-09-22 17:16:04 +00:00
Robert Watson
690be704f3 Call callout_init() on nd6_slowtimo_ch before setting it going; otherwise,
the flags field will be improperly initialized resulting in inconsistent
operation (sometimes with Giant, sometimes without, et al).

RELENG_5 candidate.
2004-09-05 17:27:54 +00:00
Robert Watson
0b7851fa03 Unlock rather than lock the ripcbinfo lock at the end of rip6_input().
RELENG_5 candidate.

Foot provided by:	Patrick Guelat <pg at imp dot ch>
2004-09-02 20:18:02 +00:00
Robert Watson
98f6a62499 Mark Netgraph TTY, KAME IPSEC, and IPX/SPX as requiring Giant for correct
operation using NET_NEEDS_GIANT().  This will result in a boot-time
restoration of Giant-enabled network operation, or run-time warning on
dynamic load (applicable only to the Netgraph component).  Additional
components will likely need to be marked with this in the future.
2004-08-28 15:24:53 +00:00
Andre Oppermann
3161f583ca Apply error and success logic consistently to the function netisr_queue() and
its users.

netisr_queue() now returns (0) on success and ERRNO on failure.  At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.

Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success.  Due to this
schednetisr() was never called to kick the scheduling of the isr.  However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.
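The corrected convention (0 on success, an errno value on failure, and
scheduling tied to success) can be sketched with a toy bounded queue. struct
pktq and pktq_enqueue() are hypothetical stand-ins, not the netisr code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded queue standing in for a netisr queue. */
struct pktq {
	bool	functional;	/* false: no handler registered */
	size_t	len;		/* packets currently queued */
	size_t	maxlen;		/* capacity */
	bool	scheduled;	/* set when the consumer has been kicked */
};

/* Returns 0 on success or an errno value on failure. */
static int
pktq_enqueue(struct pktq *q)
{
	if (!q->functional)
		return (ENXIO);		/* queue not functional */
	if (q->len >= q->maxlen)
		return (ENOBUFS);	/* queue full */
	q->len++;
	q->scheduled = true;		/* analogous to schednetisr() */
	return (0);
}
```

Because success is signalled by 0, a caller testing "if (error == 0)" can no
longer skip the scheduling step the way the old inverted check did.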

PR:		kern/70988
Found by:	MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after:	3 days
2004-08-27 18:33:08 +00:00
Andre Oppermann
c21fd23260 Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option.  All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over.  This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
2004-08-27 15:16:24 +00:00
Robert Watson
c415679d71 Remove in6_prefix.[ch] and the contained router renumbering capability.
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files.  Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.

This functionality has also been removed from NetBSD and OpenBSD.

Submitted by:	George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by:	suz, keiichi at kame.net, core at kame.net
2004-08-23 03:00:27 +00:00
Robert Watson
5a0192650e When notifying protocol components of an event on an in6pcb, use the
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking.  Otherwise, we may unlock
an already unlocked in6pcb.

Reported by:	kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by:	kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with:	mdodd
2004-08-21 17:38:48 +00:00
David Malone
1f44b0a1b5 Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

        1) introduce a ip_newid() static inline function that checks
        the sysctl and then decides if it should return a sequential
        or random IP ID.

        2) named the sysctl net.inet.ip.random_id

        3) IPv6 flow IDs and fragment IDs are now always random.
        Flow IDs and frag IDs are significantly less common in the
        IPv6 world (ie. rarely generated per-packet), so there should
        be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).
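A minimal sketch of the ip_newid() idea: consult the knob on every call and
pick sequential or random IDs. The variable and function names here are
hypothetical, byte order is ignored, and rand() is only a placeholder; a real
implementation needs a generator that resists prediction and avoids quick
ID reuse.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the net.inet.ip.random_id knob and the ID counter. */
static int	ip_random_id = 0;	/* 0 = sequential (the default) */
static uint16_t	ip_id_counter;

static inline uint16_t
ip_newid_sketch(void)
{
	if (ip_random_id)
		return ((uint16_t)rand());	/* placeholder randomness */
	return (ip_id_counter++);		/* sequential IDs */
}
```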

Reviewed by:	andre, silby, mlaier, ume
Based on:	NetBSD
MFC after:	2 months
2004-08-14 15:32:40 +00:00
Robert Watson
8a0c4da871 When allocating the IPv6 header to stick in front of a raw packet being
sent via a raw IPv6 socket, use M_DONTWAIT not M_TRYWAIT, as we're
holding the raw pcb mutex.

Reported, tested by:	kuriyama
2004-08-12 18:31:36 +00:00
Robert Watson
f31f65a708 Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events.  This prevents inpcb
assertions from firing when IPv6 generates and delivers event
notifications for inpcbs.

Reported by:	kuriyama
Tested by:	kuriyama
2004-08-06 03:45:45 +00:00
Yaroslav Tykhiy
a4eb4405e3 Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent
	manner.  Therefore she knocks up the server using PF_INET6 only
	and allows the IPv6 socket to accept mapped IPv4 as well.  An evil
	hacker known on IRC as cheshire_cat has an account in the same
	system.  He starts a process listening on the same port as used
	by Alice's server, but in PF_INET.  As a consequence, cheshire_cat
	will divert all IPv4 traffic intended for Alice's server.

Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections.  After this
change, the above scenario will be impossible.  In the same setting,
the user who attempts to start his server last will get EADDRINUSE.

Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.

This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code.  It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.
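The rule can be illustrated with a simplified conflict check that covers only
the cross-family case described above. struct bound_sock and
check_bind_conflict() are hypothetical; the real logic lives in the in_pcb
bind path and considers more state.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical, simplified view of a bound socket. */
struct bound_sock {
	int		family;			/* AF_INET or AF_INET6 */
	unsigned short	port;
	uid_t		owner;
	bool		accepts_v4mapped;	/* AF_INET6 taking IPv4 too */
};

static int
check_bind_conflict(const struct bound_sock *existing,
    const struct bound_sock *newcomer)
{
	bool overlaps_v4;

	if (existing->port != newcomer->port)
		return (0);
	/* An AF_INET6 socket accepting mapped IPv4 competes for IPv4. */
	overlaps_v4 =
	    (existing->family == AF_INET && newcomer->family == AF_INET6 &&
	     newcomer->accepts_v4mapped) ||
	    (existing->family == AF_INET6 && existing->accepts_v4mapped &&
	     newcomer->family == AF_INET);
	/* Same owner keeps the RFC 2553 behaviour; strangers lose. */
	if (overlaps_v4 && existing->owner != newcomer->owner)
		return (EADDRINUSE);
	return (0);
}
```

Whichever of Alice or cheshire_cat binds last gets EADDRINUSE.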

MFC after:	1 month
2004-07-28 13:03:07 +00:00
Robert Watson
07385abd73 Commit a first pass at in6pcb and pcbinfo locking for IPv6,
synchronizing IPv6 protocol control blocks and lists.  These changes
are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang,
and committed by Jeffrey Hsu.  With these locking changes, IPv6 use of
inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking
assertions to be run in the presence of IPv6 compiled into the kernel.
2004-07-27 23:44:03 +00:00
Yaroslav Tykhiy
f66145c6bd Don't consider TCP connections beyond LISTEN state
(i.e. with the foreign address being not wildcard) when checking
for possible port theft since such connections cannot be stolen.

The port theft check is FreeBSD-specific and isn't in the KAME tree.

PR:		bin/65928 (in the audit trail)
Reviewed by:	-net, -hackers (silence)
Tested by:	Nick Leuta <skynick at mail.sc.ru>
MFC after:	1 month
2004-07-27 16:35:09 +00:00
Colin Percival
56f21b9d74 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Max Laier
02b199f158 Link ALTQ to the build and break the ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
separate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Robert Watson
359fdba7a7 Missed directory in previous commit; need to hold SOCK_LOCK(so)
before calling sotryfree().

-- Body of earlier bulk commit this belonged with --

  Log:
  Extend coverage of SOCK_LOCK(so) to include so_count, the socket
  reference count:

  - Assert SOCK_LOCK(so) in macros that directly manipulate so_count:
    soref(), sorele().

  - Assert SOCK_LOCK(so) in macros/functions that rely on the state of
    so_count: sofree(), sotryfree().

  - Acquire SOCK_LOCK(so) before calling these functions or macros in
    various contexts in the stack, both at the socket and protocol
    layers.

  - In some cases, perform soisdisconnected() before sotryfree(), as
    this could result in frobbing of a non-present socket if
    sotryfree() actually frees the socket.

  - Note that sofree()/sotryfree() will release the socket lock even if
    they don't free the socket.

  Submitted by:   sam
  Sponsored by:   FreeBSD Foundation
  Obtained from:  BSD/OS
2004-06-12 20:59:48 +00:00
Hajimu UMEMOTO
3c751c1b6c do not check super user privilege in ip6_savecontrol. It is
meaningless and can even be harmful.

Obtained from:	KAME
MFC after:	3 days
2004-06-02 15:41:18 +00:00
Poul-Henning Kamp
5dba30f15a add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
Bill Paul
6f8aee2268 Fix a bug which I discovered recently while doing IPv6 testing at
Wind River. In the IPv4 output path, one of the tests in ip_output()
checks how many slots are actually available in the interface output
queue before attempting to send a packet. If, for example, we need
to transmit a packet of 32K bytes over an interface with an MTU of
1500, we know it's going to take about 21 fragments to do it. If
there are fewer than 21 slots left in the output queue, there's no point
in transmitting anything at all: IP does not do retransmission, so
sending only some of the fragments would just be a waste of bandwidth.
(In an extreme case, if you're sending a heavy stream of fragmented
packets, you might find yourself sending nothing but the first fragment
of all your packets.) So if ip_output() notices there's not enough
room in the output queue to send the frame, it just dumps the packet
and returns ENOBUFS to the app.

It turns out ip6_output() lacks this code. Consequently, this caused
the netperf UDPIPV6_STREAM test to produce very poor results with large
write sizes. This commit adds code to check the remaining space in the
output queue and junk fragmented packets if they're too big to be
sent, just like with IPv4. (I can't imagine anyone's running an NFS
server using UDP over IPv6, but if they are, this will likely make them
a lot happier. :)
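A sketch of the up-front check, assuming no extension headers other than the
fragment header. With a 40-byte IPv6 header and an 8-byte fragment header,
each fragment over a 1500-byte MTU carries 1448 bytes (a multiple of 8), so
32768 bytes of payload need 23 fragments. The helper names are hypothetical,
and the MTU is assumed to be at least the IPv6 minimum of 1280.

```c
#include <errno.h>
#include <stddef.h>

#define IP6_HDR_LEN	40	/* fixed IPv6 header */
#define IP6_FRAG_LEN	8	/* fragment extension header */

/* Fragments needed for 'payload' bytes of fragmentable data. */
static size_t
ip6_frags_needed(size_t payload, size_t mtu)
{
	/* Fragment data must be a multiple of 8 octets. */
	size_t per_frag = (mtu - IP6_HDR_LEN - IP6_FRAG_LEN) & ~(size_t)7;

	return ((payload + per_frag - 1) / per_frag);
}

/* Refuse the whole packet if the queue cannot take every fragment. */
static int
check_ifq_room(size_t payload, size_t mtu, size_t qlen, size_t qmaxlen)
{
	size_t slots = qmaxlen > qlen ? qmaxlen - qlen : 0;

	if (ip6_frags_needed(payload, mtu) > slots)
		return (ENOBUFS);
	return (0);
}
```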
2004-05-14 03:57:17 +00:00
Luigi Rizzo
354c3d34d2 fix the change of interface in nd6_storelladdr for multicast
addresses too.

Reported by: Jun Kuriyama
2004-04-26 20:31:46 +00:00
Luigi Rizzo
cd46a114fc This commit does two things:
1. rt_check() cleanup:
    rt_check() is only necessary for some address families to gain access
    to the corresponding arp entry, so call it only in/near the *resolve()
    routines where it is actually used -- at the moment this is
    arpresolve(), nd6_storelladdr() (the call is embedded here),
    and atmresolve() (the call is just before atmresolve to reduce
    the number of changes).
    This change will make it a lot easier to decouple the arp table
    from the routing table.

    There is an extra call to rt_check() in if_iso88025subr.c to
    determine the routing info length. I have left it alone for
    the time being.

    The interface of arpresolve() and nd6_storelladdr() now changes slightly:
     + the 'rtentry' parameter (really a hint from the upper level layer)
       is now passed unchanged from *_output(), so it becomes the route
       to the final destination and not to the gateway.
     + the routines will return 0 if resolution is possible, non-zero
       otherwise.
     + arpresolve() returns EWOULDBLOCK in case the mbuf is being held
       waiting for an arp reply -- in this case the error code is masked
       in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
    Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
    and use the IFP2AC macro to access arpcom fields.
    This mostly affects the netatalk code.
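The new resolver contract (0 when the link-layer address is available,
non-zero otherwise, with EWOULDBLOCK masked by the caller while the mbuf is
held) can be sketched as follows. The enum, both functions and the choice of
EHOSTUNREACH for the failure case are hypothetical illustrations, not
arpresolve() itself.

```c
#include <errno.h>

/* Hypothetical resolver outcome, standing in for arpresolve(). */
enum lladdr_state { LL_RESOLVED, LL_PENDING, LL_FAILED };

static int
resolve_sketch(enum lladdr_state st)
{
	switch (st) {
	case LL_RESOLVED:
		return (0);		/* link-layer address available */
	case LL_PENDING:
		return (EWOULDBLOCK);	/* packet held for an ARP reply */
	default:
		return (EHOSTUNREACH);	/* illustrative failure code */
	}
}

/* Caller in the *_output() style: a held packet is not a failure. */
static int
output_sketch(enum lladdr_state st, int *transmitted)
{
	int error = resolve_sketch(st);

	*transmitted = 0;
	if (error == EWOULDBLOCK)
		return (0);		/* masked: mbuf queued, not lost */
	if (error != 0)
		return (error);
	*transmitted = 1;		/* would hand the frame to the driver */
	return (0);
}
```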

=== Detailed changes: ===
net/if_arcsubr.c
   rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
   rt_check() cleanup

net/if_ethersubr.c
   rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
   rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
   rt_check() cleanup

netatalk/aarp.c
   arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
   arpcom untangling

netinet/if_ether.c
   rt_check() cleanup (change arpresolve)

netinet6/nd6.c
   rt_check() cleanup (change nd6_storelladdr)
2004-04-25 09:24:52 +00:00
Luigi Rizzo
60348b56fd ifp has the same value as rt->rt_ifp so remove the dependency
on the route entry to locate the necessary information.
2004-04-19 08:02:52 +00:00
Luigi Rizzo
3240408870 Remove a tail-recursive call in nd6_output.
This change is functionally identical to the original code, though
I have no idea if that was correct in the first place (see comment
in the commit).
2004-04-19 07:48:48 +00:00
Luigi Rizzo
056c7327e4 Replace Bcopy/Bzero with 'the real thing' as in the rest of the file. 2004-04-18 11:45:28 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
SUZUKI Shinsuke
b5676acff4 UDP checksum is mandatory in IPv6 (RFC2460 p.28)
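For reference, a sketch of the checksum rules involved: the RFC 1071
one's-complement sum, the transmit rule that a computed 0 goes out as 0xffff,
and the IPv6 receive rule that a zero checksum field is invalid. The
pseudo-header is omitted, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over big-endian 16-bit words. */
static uint16_t
inet_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	return ((uint16_t)~sum);
}

/* Transmit: 0 would mean "no checksum", so send all ones instead. */
static uint16_t
udp6_wire_cksum(uint16_t computed)
{
	return (computed == 0 ? 0xffff : computed);
}

/* Receive: in IPv6 a zero checksum field means the packet is dropped. */
static bool
udp6_cksum_field_acceptable(uint16_t field)
{
	return (field != 0);
}
```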
Obtained from: KAME
2004-04-01 13:48:23 +00:00
Pawel Jakub Dawidek
b0330ed929 Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
	- in_pcbbind(),
	- in_pcbbind_setup(),
	- in_pcbconnect(),
	- in_pcbconnect_setup(),
	- in6_pcbbind(),
	- in6_pcbconnect(),
	- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by:	rwatson
Reviewed by:	rwatson, ume
2004-03-27 21:05:46 +00:00