Commit Graph

158 Commits

Author SHA1 Message Date
rwatson
a584d6bac9 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
rwatson
7fc8e1f5e1 Mark netatm and netnatm explicitly as requiring Giant, as they still do.
MFC after:	3 days
2005-02-17 14:21:22 +00:00
imp
3618e2b1b6 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
phk
fb234edeec Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.
2004-11-08 14:44:54 +00:00
rwatson
189bd3e18c Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
kan
f2f082edda Avoid casts as lvalues. 2004-07-28 06:59:55 +00:00
harti
35291d5f2c Fix a typo that could provoke a panic or access to random memory.
PR:		kern/67012
Submitted by:	Zhenmin <zli4@cs.uiuc.edu>
2004-07-19 12:54:00 +00:00
rwatson
f467792e5a The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
2004-06-14 18:16:22 +00:00
rwatson
75ff0eae83 Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  this could result in frobbing of a non-present socket if
  sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
2004-06-12 20:47:32 +00:00
stefanf
1d1f53a52d Remove an #if section originally written for Sun compilers. 2004-06-08 13:46:31 +00:00
trhodes
f6877ab8d5 These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
harti
e4202e0024 Don't remove the first mbuf in the chain if it got empty.
This removes the packet header in certain cases which later on
will give panic. Clarify what the atm_intr expects in the comment
and de-obscurify the code a little bit by replacing the portability
macros with the BSD names. The code isn't maintained externally anymore
so there's no point in keeping the extra level of obscurity.
2004-02-21 12:55:07 +00:00
rwatson
58c71ea6dd Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
bde
c62355cf68 Include <sys/malloc.h> for the declaration of malloc(), etc. instead
of depending on namespace pollution 2 layers deep in <vm/uma.h>.  Fixed
most nearby include messes (another like this, several the opposite of
this, and some formatting).
2003-11-14 21:02:10 +00:00
sam
0a78164dde o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
  operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
  calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
  Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
  have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by:	FreeBSD Foundation
2003-11-08 22:28:40 +00:00
brooks
4290fbacd1 Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
harti
a1507cf661 The number of prefixes can never be negative so use an u_int for this. 2003-07-29 13:46:43 +00:00
harti
da263ada5e Make the ioctl() interface cleaner with regard to types: use size_t
instead of int where the variable has to hold buffer lengths,
use u_int for things like number of network interfaces which
in principle can never be negative.
2003-07-29 13:32:10 +00:00
harti
bbd8b93e9c Silence a gcc-warning. Do this by inlining the macro-call. This is
not very nice - the compiler should just silently optimize away the
unused else clause.
2003-07-26 14:20:37 +00:00
harti
2f2cdfb150 Print the offending SPANS message only if printing is enabled. 2003-07-25 12:32:08 +00:00
harti
a727c2e451 Add support for VBR and CBR PVCs for IP over ATM.
Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-25 08:35:26 +00:00
harti
eaf68d7639 Set the interface type of the network interfaces to IFT_IPOVERATM(114).
This is specified by RFC2320.
2003-07-25 07:16:28 +00:00
harti
7ec7cd78cc Hand the packet to bpf not only in the LLC/SNAP case, but for all
connections. While this confuses tcpdump, it enables other applications
to see and analyze non-IP traffic (signalling, for example).

Pointed out by:	Vincent Jardin <vjardin@wanadoo.fr>
2003-07-25 06:43:41 +00:00
harti
ae7cc1c97f Make the debugging variable that controls printing of UNI messages
accessible as a sysctl and move the debugging stuff out of DIAGNOSTICS.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-25 06:39:46 +00:00
harti
782af2095e Make the debugging variable that controls dumping of IP over ATM packets
accessible as a sysctl.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 15:25:17 +00:00
harti
0c5fc17ba8 Create a sysctl that allows to enable/disable printing of SPANS messages.
While here delete to sys/types.h includes when sys/param.h is also included.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 14:37:01 +00:00
harti
36158ad3d8 Free the UNI vcc to the same zone from where it was allocated from.
This resulted in a panic when detaching the uni31 signalling manager.
2003-07-24 12:24:41 +00:00
harti
b1b0ed2a64 Now that we have if_detach() don't try to get rid of all the interface
stuff (routes, ...) by hand - simply use if_detach().

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 week
2003-07-24 11:17:36 +00:00
harti
1fd91a3dc1 Create a subtree 'harp' of the net sysctl tree. This uses a fixed
OID as the other protocol family sub-trees do, that is equal to the
protocol family identifier. Make the ATM layer debugging flags
available under this tree.

Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 10:33:01 +00:00
harti
7397e4d132 Constify the arguments to several pdu_print functions. 2003-07-24 09:13:03 +00:00
harti
79f1e880dd Add BPF support to HARP network interfaces. This allows one to see
the traffic on LLC multiplexed connections (like CLIP).

PR:		kern/51831
Submitted by:	Vincent Jardin <vjardin@wanadoo.fr>
MFC after:	2 weeks
2003-07-24 08:15:20 +00:00
harti
4a9393f207 Handle the new MEDIA definitions. 2003-07-23 15:04:31 +00:00
harti
ca7251f9ff Convert a lot of uma_zalloc() calls to be NOWAIT instead of WAITOK. All
these may be called from contexts where we cannot sleep (callout handlers
for example).
2003-07-23 14:28:57 +00:00
harti
8eae9989bc Get rid of the zone for network interfaces. We have converted this to
use malloc(9).
2003-07-23 14:25:53 +00:00
harti
9b6a4217c3 Allocate network interfaces from malloc() instead of using a zone.
Usually one needs only a couple of them so using a zone is waste
of memory (esp. on multi-cpu systems).
2003-07-22 15:11:08 +00:00
harti
6669cd1fe5 Remove the zone limits for all the zones used in the ATM code.
These were a left over from when the private memory pools were
converted to use uma zones. The limit of UMA zones, however,
works differently. When a zone is limited to only one or two pages
than, on multi-cpu systems, processes can get stuck on the zonelimit,
because all remaining free items are in caches of other CPUs.

Also add rudimentary error handling in some places (panic) when a zone
cannot be created.
2003-07-22 12:46:30 +00:00
harti
933c46faee Add several vendor, API and media definitions. This has been
forgotten in the previous commit to harp and should unbreak world.
2003-07-22 06:31:13 +00:00
harti
8376c648e1 Fix a number of occurences of calling uma_zalloc() with neither
M_WAITOK nor M_NOWAIT.
2003-07-18 16:36:41 +00:00
obrien
dad2a041ad Use __FBSDID(). 2003-06-11 07:22:30 +00:00
obrien
9acfe16945 Use __FBSDID(). 2003-06-11 07:11:35 +00:00
obrien
490ccf4a4b Use __FBSDID(). 2003-06-11 07:06:31 +00:00
obrien
18e346a162 Use __FBSDID(). 2003-06-11 07:00:30 +00:00
obrien
8d3944c4fc Use __FBSDID rather than rcsid[]. 2003-04-03 21:36:33 +00:00
jlemon
8d19d664ac Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
obrien
c163670ac7 There is no reason to be cute with ntohl(). Just call it directly rather
than use a macro that tries to do conversions in place.

Compile tested on:	sparc64
2003-02-23 22:26:39 +00:00
imp
1493fd6e76 Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
phk
81dfd6e5c2 Band-XXX-aid an easy to provoke panic.
MFC:	2 weeks
2003-01-28 12:10:11 +00:00
alfred
8f5153c3ea Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
schweikh
c353aec149 Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
schweikh
28d78933e7 Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00