Commit Graph

104 Commits

Author SHA1 Message Date
csjp
2c4f67981e Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
ru
dcace5669d Use sparse initializers for "struct domain" and "struct protosw",
so they are easier to follow for the human being.
2005-11-09 13:29:16 +00:00
thompsa
48c0bcb5c2 Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by:	brooks
Reviewed by:	brooks
2005-11-08 20:08:34 +00:00
thompsa
d6130a4703 Change the reference counting to count the number of cloned interfaces for each
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.

Assert that all cloners have been destroyed when freeing the memory.

Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.

Discussed with:	brooks, pjd, -current
Reviewed by:	brooks
2005-10-12 19:52:16 +00:00
dwmalone
f1f0123e88 Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

        1) Consistently use type u_int32_t for the header of a
           DLT_NULL device - it continues to represent the address
           family as always.
        2) In the DLT_NULL case get bpf_movein to store the u_int32_t
           in a sockaddr rather than in the mbuf, to be consistent
           with all the DLT types.
        3) Consequently fix a bug in bpf_movein/bpfwrite which
           only permitted packets up to 4 bytes less than the MTU
           to be written.
        4) Fix all DLT_NULL devices to have the code required to
           allow writing to their bpf devices.
        5) Move the code to allow writing to if_lo from if_simloop
           to looutput, because it only applies to DLT_NULL devices
           but was being applied to other devices that use if_simloop
           possibly incorrectly.

PR:		82157
Submitted by:	Matthew Luckie <mjl@luckie.org.nz>
Approved by:	re (scottl)
2005-06-26 18:11:11 +00:00
brooks
44ce56b077 Initialze ifp->if_softc.
Submitted by:	ume
2005-06-13 17:17:07 +00:00
brooks
567ba9b00a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
sam
b76bf2322a the rt parameter to ifa_rtrequest callbacks should always be non-null;
eliminate grauitous ptr checks that follow ptr deref's

Noticed by:	Coverity Prevent analysis tool
2005-02-24 01:34:01 +00:00
ume
23e96af981 don't see NBPFILTER. 2005-01-11 07:17:33 +00:00
ume
28e58e1cdb remove HAVE_OLD_BPF part. 2005-01-11 07:14:37 +00:00
ume
ddb6478aa3 fix typo. 2005-01-11 07:05:56 +00:00
imp
a50ffc2912 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
phk
5c95d686a1 Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
brooks
e1dd867b55 Major overhaul of pseudo-interface cloning. Highlights include:
- Split the code out into if_clone.[ch].
 - Locked struct if_clone. [1]
 - Add a per-cloner match function rather then simply matching names of
   the form <name><unit> and <name>.
 - Use the match function to allow creation of <interface>.<tag>
   vlan interfaces.  The old way is preserved unchanged!
 - Also the match function to allow creation of stf(4) interfaces named
   stf0, stf, or 6to4.  This is the only major user visible change in
   that "ifconfig stf" creates the interface stf rather then stf0 and
   does not print "stf0" to stdout.
 - Allow destroy functions to fail so they can refuse to delete
   interfaces.  Currently, we forbid the deletion of interfaces which
   were created in the init function, particularly lo0, pflog0, and
   pfsync0.  In the case of lo0 this was a panic implementation so it
   does not count as a user visiable change. :-)
 - Since most interfaces do not need the new functionality, an family of
   wrapper functions, ifc_simple_*(), were created to wrap old style
   cloner functions.
 - The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
   IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
   instead.

Submitted by:   Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by:    andre, mlaier
Discussed on:	net
2004-06-22 20:13:25 +00:00
phk
f43aa0c4bc add missing #include <sys/module.h> 2004-05-30 20:27:19 +00:00
brooks
3d1134df29 Use an tempory struct ifnet *ifp instead of sc->sc_if to access the
ifnet in stf_clone_create.  Also use if_printf() instead of printf().
2004-04-19 05:06:27 +00:00
brooks
6a86b01672 Staticize <if>_clone_{create,destroy} functions.
Reviewed by:	mlaier
2004-04-14 00:57:49 +00:00
rwatson
19c49ad2c7 Introduce stf_mtx to protect global softc list in if_stf. Add
stf_destroy() to handle the common softc destruction path for the
two destruction sources: interface cloning destroy, and module
unload.

NOTE: sc_ro, the cached route for stf conversion, is not synchronized
against concurrent access in this change, that will follow in a future
change.

Reviewed by:	pjd
2004-03-09 20:29:19 +00:00
rwatson
a99c4f7687 Const-poison ip_stf_ttl to make it clear that the variable is not
modified at run-time.
2004-03-07 05:15:42 +00:00
sam
c165a87f8d o eliminate widespread on-stack mbuf use for bpf by introducing
a new bpf_mtap2 routine that does the right thing for an mbuf
  and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
  when prepending the address family (several places were assuming
  sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
  mbufs have been eliminated; this may better be moved to the bpf
  routines

Reviewed by:	arch@ and several others
2003-12-28 03:56:00 +00:00
brooks
f1e94c6f29 Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
sam
9d93fce265 Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
jlemon
04e28d5a81 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
suz
c5f56c61a9 sync with KAME to simplify rev 1.28's patch (no functional changes)
Obtained from: KAME
Reviewd by: fenner
Approved by: re (jhb)
2003-01-15 20:09:52 +00:00
fenner
24aad6f8c0 Fix alignment problems -- the embedded v4 address is guaranteed to
be only 16-bit aligned, so only do byte operations to compare with it.
2003-01-05 14:03:26 +00:00
sam
6a05792540 network interface and link layer changes:
o on input don't strip the Ethernet header from packets
o input packet handling is now done with if_input
o track changes to ether_ifattach/ether_ifdetach API
o track changes to bpf tapping
o call ether_ioctl for default handling of ioctl's
o use constants from net/ethernet.h where possible

Reviewed by:	many
Approved by:	re
2002-11-15 00:00:15 +00:00
rwatson
3937ec0aed When packets pass in and out of six-to-four (STF) tunnels, perform
labeling checks and operations as with other network interfaces.
Eventually, if it proves desirable, we might want to offer special
casing of this or other tunnel interfaces where we have an existing
label of interest, rather than treating it as though it's an
entirely fresh mbuf in the incoming/outgoing encapsulation directions.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-10-20 22:39:55 +00:00
sam
2a86be217a Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
ume
3596f40025 - increment interface output counter. sync w/ netbsd-current
- increase if_oerrors.  sync w/netbsd

Obtained from:	KAME
2002-09-17 14:25:19 +00:00
ume
e26d348adb - reject SIOCSIFADDR if embedded address is in private address range
- reject packets from private address range.  from hitachi

Obtained from:	KAME
2002-09-17 10:45:51 +00:00
brooks
6cfd5a5a1d Move all unit number management cloned interfaces into the cloning
code.  The reverts the API change which made the <if>_clone_destory()
functions return an int instead of void bringing us into closer
alignment with NetBSD.

Reviewed by:	net (a long time ago)
2002-05-25 20:17:04 +00:00
suz
553226e8e1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
alfred
c9985516e4 Remove __P. 2002-03-19 21:54:18 +00:00
mux
9a5e4c88a3 Simplify the interface cloning framework by handling unit
unit allocation with a bitmap in the generic layer.  This
allows us to get rid of the duplicated rman code in every
clonable interface.

Reviewed by:	brooks
Approved by:	phk
2002-03-11 09:26:07 +00:00
brooks
50d3be4c82 Change the network interface cloning API so the destroy function returns
an int errorcode instead of void in preperation for merging cloning of
the loopback device.

Submitted by:	mux
MFC after:	2 weeks
2002-03-04 21:43:49 +00:00
peter
d31ac9bebb Fix warnings. 2002-02-28 00:09:17 +00:00
msmith
e814937e0b Staticise private interface lists. 2002-01-08 10:30:09 +00:00
arr
d502ffe4f0 - malloc should be passed M_WAITOK, not M_WAIT (a mbuf flag)
- make use of M_ZERO to remove a call to bzero()
2001-12-07 01:32:40 +00:00
ru
ecb4d3d05f Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument.  Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest.  3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works.  Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''.  It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from:	BSD/OS, NetBSD
MFC after:	1 month
PR:		kern/28360
2001-10-17 18:07:05 +00:00
jlemon
e2b58d95e0 Use in_ifaddrhashtbl instead of in_ifaddrhead to look up IP address. 2001-09-29 05:02:36 +00:00
brooks
3e9d16ac4c Make stf a clonable device.
Yes this really is rather silly and the implementation is overkill given
that you are only allowed one of them, but NetBSD implements cloning on
this device and it's a less cluttered example of cloning then most.
2001-09-19 00:13:00 +00:00
julian
3cc9960fd1 Patches from KAME to remove usage of Varargs in existing
IPV4 code. For now they will still have some in the developing stuff (IPv6)

Submitted by:	Keiichi SHIMA / <keiichi@iij.ad.jp>
Obtained from:	KAME
2001-09-07 07:19:12 +00:00
julian
071f86f9f1 Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.
2001-09-03 20:03:55 +00:00
brooks
e7b9bc714f gif(4) and stf(4) modernization:
- Remove gif dependencies from stf.
 - Make gif and stf into modules
 - Make gif cloneable.

PR:		kern/27983
Reviewed by:	ru, ume
Obtained from:	NetBSD
MFC after:	1 week
2001-07-02 21:02:09 +00:00
ume
04561e1934 inject outbound packet to BPF.
Submitted by:	itojun
Obtained from:	KAME
MFC after:	10 days
2001-06-24 14:52:55 +00:00
ume
832f8d2249 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
phk
2ef21ddcb9 Use <sys/queue.h> macro api rather than fondle its implementation detals.
Created with:	/usr/bin/sed
Reviewed by:	/sbin/md5
2001-02-03 11:46:35 +00:00
bmilekic
4b6a7bddad * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have became a big problem.
2000-12-21 21:44:31 +00:00
jlemon
954e1d2ccd Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue.  Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue.  An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
2000-11-25 07:35:38 +00:00
phk
54ca48450c Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning.  The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by:   various.
Significant brucifications by:  bde
2000-10-27 11:45:49 +00:00
itojun
7059b1cfa8 repair endianness issue in IN_MULTICAST().
again, *BSD difference...

From: Nick Sayer <nsayer@quack.kfu.com>
2000-08-15 07:34:08 +00:00
itojun
5f4e854de1 sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
  (also syntax change)
2000-07-04 16:35:15 +00:00