Commit Graph

209 Commits

Author SHA1 Message Date
mjacob
cd743f496a Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0)
codepath.
2007-06-17 04:07:11 +00:00
bms
ffd77d9ba5 Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
rwatson
00b02345d4 Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
dwmalone
7ff161914e When verifying the IPv4 UDP checksum, don't overwrite the checksum
value in the mbuf with the result of the calculation. Previously,
if we chose to return an ICMP message, the quoted UDP checksum bytes
would be different to what was sent.

PR:		112471
Submitted by:	Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after:	3 weeks
2007-05-16 09:12:16 +00:00
rwatson
47d37a80be Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
2007-05-11 10:20:51 +00:00
rwatson
d225f06201 Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr()
and in_setsockaddr(), containing only stale comments on why they
exist, remove them and initialize the protosw for UDP to directly
reference in_setpeeraddr() and in_setsockaddr().
2007-05-07 13:51:24 +00:00
rwatson
f91ad81f4a Minor style tweaks. 2007-05-07 13:47:39 +00:00
rwatson
a9656f2df2 Remove unused pcbinfo arguments to in_setsockaddr() and
in_setpeeraddr().
2007-05-01 16:31:02 +00:00
rwatson
c27ef03414 Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches.  Clean up and comment structure
fields in structure definition.
2007-04-30 23:12:05 +00:00
bms
428a375b20 Fix IP_SENDSRCADDR semantics.
* To use this option with a UDP socket, it must be bound to a local port,
   and INADDR_ANY, to disallow possible collisions with existing udp inpcbs
   bound to the same port on other interfaces at send time.

 * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with
   INADDR_ANY will be rejected as it is ambiguous.

 * If the socket is bound to an address other than INADDR_ANY, specifying
   IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup().

Reviewed by:	silence on -net
Tested with:	src/tools/regression/netinet/ipbroadcast
MFC after:	4 days
2007-03-08 15:26:54 +00:00
rwatson
b659d84f71 Rename two identically named log_in_vain variables: tcp_input.c's static
log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to
udp_log_in_vain.

MFC after:	1 week
2007-02-20 10:20:03 +00:00
rwatson
5c6ebcbcaf Gratuitous UDP restyling toward style(9) in 7.x. 2007-02-20 10:13:11 +00:00
maxim
4a9a2f072d o One more typo in the comment.
PR:		kern/107609
Submitted by:	Dr. Markus Waldeck
2007-01-06 13:12:24 +00:00
imp
51b8a53652 Fix typo in comment.
Submitted by: remko
2007-01-01 00:35:34 +00:00
imp
426d73c762 Add comment about udp checksums being off in BSD 4.2 compatibility mode.
Submitted by: Dr. Markus Waldeck
PR: kern/106657
2006-12-31 21:34:53 +00:00
jhb
9adb288460 Some whitespace nits and remove a few casts. 2006-12-29 14:58:18 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
rwatson
7beaaf5cd2 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
andre
b095262f18 Check inp_flags instead of inp_vflag for INP_ONESBCAST flag.
PR:		kern/99558
Tested by:	Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-09-06 19:04:36 +00:00
thomas
b6505e0884 Fix typo in comment. 2006-09-04 08:32:17 +00:00
rwatson
720efebbba Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn
2006-07-21 17:11:15 +00:00
ups
ee0a5eb928 Fix race conditions on enumerating pcb lists by moving the initialization
( and where appropriate the destruction) of the pcb mutex to the init/finit
functions of the pcb zones.
This allows locking of the pcb entries and race condition free comparison
of the generation count.
Rearrange locking a bit to avoid extra locking operation to update the generation
count in in_pcballoc(). (in_pcballoc now returns the pcb locked)

I am planning to convert pcb list handling from a type safe to a reference count
model soon. ( As this allows really freeing the PCBs)

Reviewed by:	rwatson@, mohans@
MFC after:	1 week
2006-07-18 22:34:27 +00:00
rwatson
133fd236d1 Acquire udbinfo lock after call to soreserve() rather than before, as it
is not required.  This simplifies error-handling, and reduces the time
that this lock is held.

MFC after:	1 month
2006-06-03 19:29:26 +00:00
maxim
fa6b08051c o In udp|rip_disconnect() acquire a socket lock before the socket
state modification.  To prevent races do that while holding inpcb
lock.

Reviewed by:	rwatson
2006-05-21 19:28:46 +00:00
rwatson
b4334eda5c Modify UDP to use sosend_dgram() instead of sosend(). This allows
for signicantly optimized UDP socket I/O when using a single UDP
socket from many threads or processes that share it, by avoiding
significant locking and other overhead in the general sosend()
path that isn't necessary for simple datagram sockets.  Specifically,
this change results in a significant performance improvement for
threaded name service in BIND9 under load.

Suggested by:	Jinmei_Tatsuya at isc dot org
2006-05-06 11:24:59 +00:00
rwatson
01b24f64ce Rename 'last' to 'inp' in udp_append(): the name 'last' is due to
the fact that the loop through inpcb's in udp_input() tracks the
last inpcb while looping.  We keep that name in the calling loop
but not in the delivery routine itself.

MFC after:	3 months
2006-04-25 17:38:08 +00:00
ps
10b2fe8dea Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.
2006-04-21 09:25:40 +00:00
rwatson
a7c2bca553 Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, in protocol
  shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
  detached in_pcb structure for a socket.

MFC after:	3 months
2006-04-01 16:20:54 +00:00
rwatson
5479e5d692 Chance protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months
2006-04-01 15:42:02 +00:00
rwatson
8622e776f9 Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months
2006-04-01 15:15:05 +00:00
glebius
d9cbe01472 Implement 'ipfw fwd laddr,port' feature for UDP. According to ipfw(8)
it should work, however it never did. People expect it to work.

PR:		kern/90834
2006-01-24 09:08:54 +00:00
rwatson
01ba13c93e Remove dead code: 'opts' is not used in udp_append(), only in udp_input(),
so no need to assign it to NULL or conditionally free it.

Found with:	Coverity Prevent(tm)
MFC after:	3 days
2006-01-14 11:18:32 +00:00
mux
b29e3549b8 Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to
match the type of the variable they are exporting.

Spotted by:	Thomas Hurst <tom@hur.st>
MFC after:	3 days
2005-12-14 22:27:48 +00:00
andre
a6a209f2cc Consolidate all IP Options handling functions into ip_options.[ch] and
include ip_options.h into all files making use of IP Options functions.

From ip_input.c rev 1.306:
  ip_dooptions(struct mbuf *m, int pass)
  save_rte(m, option, dst)
  ip_srcroute(m0)
  ip_stripoptions(m, mopt)

From ip_output.c rev 1.249:
  ip_insertoptions(m, opt, phlen)
  ip_optcopy(ip, jp)
  ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)

No functional changes in this commit.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-11-18 20:12:40 +00:00
maxim
98442a62dc o INP_ONESBCAST is inpcb.inp_vflag flag not inp_flags. The confusion
with IP_PORTRANGE_HIGH leads to the incorrect checksum calculation.

PR:		kern/87306
Submitted by:	Rickard Lind
Reviewed by:	bms
MFC after:	2 weeks
2005-10-12 18:13:25 +00:00
andre
bedcd4ace8 Implement IP_DONTFRAG IP socket option enabling the Don't Fragment
flag on IP packets.  Currently this option is only repected on udp
and raw ip sockets.  On tcp sockets the DF flag is controlled by the
path MTU discovery option.

Sending a packet larger than the MTU size of the egress interface
returns an EMSGSIZE error.

Discussed with:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-09-26 20:25:16 +00:00
andre
573a9535a8 Add socketoption IP_MINTTL. May be used to set the minimum acceptable
TTL a packet must have when received on a socket.  All packets with a
lower TTL are silently dropped.  Works on already connected/connecting
and listening sockets for RAW/UDP/TCP.

This option is only really useful when set to 255 preventing packets
from outside the directly connected networks reaching local listeners
on sockets.

Allows userland implementation of 'The Generalized TTL Security Mechanism
(GTSM)' according to RFC3682.  Examples of such use include the Cisco IOS
BGP implementation command "neighbor ttl-security".

MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-22 16:13:08 +00:00
rwatson
18e2f22abb De-spl UDP.
MFC after:	3 days
2005-06-01 11:24:00 +00:00
cperciva
e513415af9 If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem
2005-05-06 02:50:00 +00:00
sam
0f999925e8 eliminate extraneous null ptr checks
Noticed by:	Coverity Prevent analysis tool
2005-03-29 01:10:46 +00:00
glebius
38f30cf325 In in_pcbconnect_setup() jailed sockets are treated specially: if local
address is not supplied, then jail IP is choosed and in_pcbbind() is called.
Since udp_output() does not save local addr after call to in_pcbconnect_setup(),
in_pcbbind() is called for each packet, and this is incorrect.

So, we shall treat jailed sockets specially in udp_output(), we will save
their local address.

This fixes a long standing bug with broken sendto() system call in jails.

PR:		kern/26506
Reviewed by:	rwatson
MFC after:	2 weeks
2005-02-22 07:50:02 +00:00
imp
a50ffc2912 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
phk
027fce30f5 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
phk
f4e34013c8 Hide udp_in6 behind #ifdef INET6 2004-11-04 07:14:03 +00:00
rwatson
f00509ea8d Until this change, the UDP input code used global variables udp_in,
udp_in6, and udp_ip6 to pass socket address state between udp_input(),
udp_append(), and soappendaddr_locked().  While file in the default
configuration, when running with multiple netisrs or direct ithread
dispatch, this can result in races wherein user processes using
recvmsg() get back the wrong source IP/port.  To correct this and
related races:

- Eliminate udp_ip6, which is believed to be generated but then never
  used.  Eliminate ip_2_ip6_hdr() as it is now unneeded.

- Eliminate setting, testing, and existence of 'init' status fields
  for the IPv6 structures.  While with multiple UDP delivery this
  could lead to amortization of IPv4 -> IPv6 conversion when
  delivering an IPv4 UDP packet to an IPv6 socket, it added
  substantial complexity and side effects.

- Move global structures into the stack, declaring udp_in in
  udp_input(), and udp_in6 in udp_append() to be used if a conversion
  is required.  Pass &udp_in into udp_append().

- Re-annotate comments to reflect updates.

With this change, UDP appears to operate correctly in the presence of
substantial inbound processing parallelism.  This solution avoids
introducing additional synchronization, but does increase the
potential stack depth.

Discovered by:	kris (Bug Magnet)
MFC after:	3 weeks
2004-11-04 01:25:23 +00:00
rwatson
cf6eacbc1e Don't release the udbinfo lock until after the last use of UDP inpcb
in udp_input(), since the udbinfo lock is used to prevent removal of
the inpcb while in use (i.e., as a form of reference count) in the
in-bound path.

RELENG_5 candidate.
2004-10-12 20:03:56 +00:00
jmg
8e8293b765 fix up socket/ip layer violation... don't assume/know that
SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
2004-09-05 02:34:12 +00:00
rwatson
2989f4181e When sliding the m_data pointer forward, update m_pktrhdr.len as well
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain.  The symptom of this occuring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.

Foot shot:	green
2004-08-22 01:32:48 +00:00
rwatson
51b320a56b When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header.  If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer.  This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
2004-08-21 16:14:04 +00:00
rwatson
eb3ee278bc Push down pcbinfo and inpcb locking from udp_send() into udp_output().
This provides greater context for the locking and allows us to avoid
locking the pcbinfo structure if not binding operations will take
place (i.e., already bound, connected, and no expliti sendto()
address).
2004-08-19 01:13:10 +00:00