Commit Graph

158 Commits

Author SHA1 Message Date
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Robert Watson
1eb7608e00 Mark netatm and netnatm explicitly as requiring Giant, as they still do.
MFC after:	3 days
2005-02-17 14:21:22 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Poul-Henning Kamp
756d52a195 Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
Robert Watson
81158452be Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
Alexander Kabaev
445e045b0d Avoid casts as lvalues. 2004-07-28 06:59:55 +00:00
Hartmut Brandt
305627df81 Fix a typo that could provoke a panic or access to random memory.
PR:		kern/67012
Submitted by:	Zhenmin <zli4@cs.uiuc.edu>
2004-07-19 12:54:00 +00:00
Robert Watson
c0b99ffa02 The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
Robert Watson
395a08c904 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
Stefan Farfeleder
9b93f39503 Remove an #if section originally written for Sun compilers. 2004-06-08 13:46:31 +00:00
Tom Rhodes
a122cca953 These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
Hartmut Brandt
8f52a59171 Don't remove the first mbuf in the chain if it got empty.
This removes the packet header in certain cases which later on
will give panic. Clarify what the atm_intr expects in the comment
and de-obscurify the code a little bit by replacing the portability
macros with the BSD names. The code isn't maintained externally anymore
so there's no point in keeping the extra level of obscurity.
2004-02-21 12:55:07 +00:00
Robert Watson
a557af222b Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
Bruce Evans
f9d801d6f7 Include <sys/malloc.h> for the declaration of malloc(), etc. instead
of depending on namespace pollution 2 layers deep in <vm/uma.h>.  Fixed
most nearby include messes (another like this, several the opposite of
this, and some formatting).
2003-11-14 21:02:10 +00:00
Sam Leffler
7902224c6b o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Hartmut Brandt
f345f5e020 The number of prefixes can never be negative so use an u_int for this. 2003-07-29 13:46:43 +00:00
Hartmut Brandt
dd937e32bd Make the ioctl() interface cleaner with regard to types: use size_t
instead of int where the variable has to hold buffer lengths,
use u_int for things like number of network interfaces which
in principle can never be negative.
2003-07-29 13:32:10 +00:00
Hartmut Brandt
cd7a4fa6eb Silence a gcc-warning. Do this by inlining the macro-call. This is
not very nice - the compiler should just silently optimize away the
unused else clause.
2003-07-26 14:20:37 +00:00
Hartmut Brandt
a327640a9e Print the offending SPANS message only if printing is enabled. 2003-07-25 12:32:08 +00:00
Hartmut Brandt
ebcdc0a12e Add support for VBR and CBR PVCs for IP over ATM.
Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-25 08:35:26 +00:00
Hartmut Brandt
69ba39416a Set the interface type of the network interfaces to IFT_IPOVERATM(114).
This is specified by RFC2320.
2003-07-25 07:16:28 +00:00
Hartmut Brandt
9b18b1f47d Hand the packet to bpf not only in the LLC/SNAP case, but for all
connections. While this confuses tcpdump, it enables other applications
to see and analyze non-IP traffic (signalling, for example).

Pointed out by:	Vincent Jardin <vjardin@wanadoo.fr>
2003-07-25 06:43:41 +00:00
Hartmut Brandt
6c373d607e Make the debugging variable that controls printing of UNI messages
accessible as a sysctl and move the debugging stuff out of DIAGNOSTICS.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-25 06:39:46 +00:00
Hartmut Brandt
80366b6d6a Make the debugging variable that controls dumping of IP over ATM packets
accessible as a sysctl.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 15:25:17 +00:00
Hartmut Brandt
5d53a37cb4 Create a sysctl that allows to enable/disable printing of SPANS messages.
While here delete to sys/types.h includes when sys/param.h is also included.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 14:37:01 +00:00
Hartmut Brandt
892e9c9b57 Free the UNI vcc to the same zone from where it was allocated from.
This resulted in a panic when detaching the uni31 signalling manager.
2003-07-24 12:24:41 +00:00
Hartmut Brandt
fb4304eca0 Now that we have if_detach() don't try to get rid of all the interface
stuff (routes, ...) by hand - simply use if_detach().

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 week
2003-07-24 11:17:36 +00:00
Hartmut Brandt
ca4125f7b3 Create a subtree 'harp' of the net sysctl tree. This uses a fixed
OID as the other protocol family sub-trees do, that is equal to the
protocol family identifier. Make the ATM layer debugging flags
available under this tree.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 10:33:01 +00:00
Hartmut Brandt
56acf6178a Constify the arguments to several pdu_print functions. 2003-07-24 09:13:03 +00:00
Hartmut Brandt
5be9a825e2 Add BPF support to HARP network interfaces. This allows one to see
the traffic on LLC multiplexed connections (like CLIP).

PR:		kern/51831
Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 08:15:20 +00:00
Hartmut Brandt
3a1646de2a Handle the new MEDIA definitions. 2003-07-23 15:04:31 +00:00
Hartmut Brandt
06055f52f3 Convert a lot of uma_zalloc() calls to be NOWAIT instead of WAITOK. All
these may be called from contexts where we cannot sleep (callout handlers
for example).
2003-07-23 14:28:57 +00:00
Hartmut Brandt
e717cfbc40 Get rid of the zone for network interfaces. We have converted this to
use malloc(9).
2003-07-23 14:25:53 +00:00
Hartmut Brandt
05ab0ba3b5 Allocate network interfaces from malloc() instead of using a zone.
Usually one needs only a couple of them so using a zone is waste
of memory (esp. on multi-cpu systems).
2003-07-22 15:11:08 +00:00
Hartmut Brandt
b92ba02261 Remove the zone limits for all the zones used in the ATM code.
These were a left over from when the private memory pools were
converted to use uma zones. The limit of UMA zones, however,
works differently. When a zone is limited to only one or two pages
than, on multi-cpu systems, processes can get stuck on the zonelimit,
because all remaining free items are in caches of other CPUs.

Also add rudimentary error handling in some places (panic) when a zone
cannot be created.
2003-07-22 12:46:30 +00:00
Hartmut Brandt
3a6052bb2a Add several vendor, API and media definitions. This has been
forgotten in the previous commit to harp and should unbreak world.
2003-07-22 06:31:13 +00:00
Hartmut Brandt
084fb28576 Fix a number of occurences of calling uma_zalloc() with neither
M_WAITOK nor M_NOWAIT.
2003-07-18 16:36:41 +00:00
David E. O'Brien
81a6b595de Use __FBSDID(). 2003-06-11 07:22:30 +00:00
David E. O'Brien
f25de95508 Use __FBSDID(). 2003-06-11 07:11:35 +00:00
David E. O'Brien
f98c8ea46c Use __FBSDID(). 2003-06-11 07:06:31 +00:00
David E. O'Brien
050ae80c6f Use __FBSDID(). 2003-06-11 07:00:30 +00:00
David E. O'Brien
8368cf8f75 Use __FBSDID rather than rcsid[]. 2003-04-03 21:36:33 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
David E. O'Brien
1e44b62e0d There is no reason to be cute with ntohl(). Just call it directly rather
than use a macro that tries to do conversions in place.

Compile tested on:	sparc64
2003-02-23 22:26:39 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Poul-Henning Kamp
7baee2b7cf Band-XXX-aid an easy to provoke panic.
MFC:	2 weeks
2003-01-28 12:10:11 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Jens Schweikhardt
d64ada501a Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00