Commit Graph

267 Commits

Author SHA1 Message Date
Max Laier
02b199f158 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Maxim Konovalov
a49b21371a o Calculate a number of bytes to copy (cnt) correctly:
+----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
  |    | |C| | |    |    |                          |    |
  | IP |N|O|L|P|    | IP |                          | IP |
  | #1 |O|D|E|T|    | #2 |                          | #n |
  |    |P|E|N|R|    |    |                          |    |
  +----+-+-+-+-+----+----+- - - - - - - - - - - -  -+----+
               ^    ^<---- cnt - (IPOPT_MINOFF - 1) ---->|
               |    |
src            |    +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr)
               |
dst            +-- cp[IPOPT_OFF + 1]

PR:		kern/66386
Submitted by:	Andrei Iltchenko
MFC after:	3 weeks
2004-05-11 19:14:44 +00:00
Darren Reed
2f3f1e6773 Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier. 2004-05-02 15:10:17 +00:00
Darren Reed
ab884d993e Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside
all of the other macros that work ok mbuf's and tag's.
2004-05-02 06:36:30 +00:00
Luigi Rizzo
e6e51f0518 In an effort to simplify the routing code, try to deprecate rtalloc()
in favour of rtalloc_ign(), which is what would end up being called
anyways.

There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
2004-04-14 01:13:14 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Ruslan Ermilov
390cdc6a76 Fixed a bug in previous revision: compute the payload checksum before
we convert ip_len into a network byte order; in_delayed_cksum() still
expects it in host byte order.

The symtom was the ``in_cksum_skip: out of data by %d'' complaints
from the kernel.

To add to the previous commit log.  These fixes make tcpdump(1) happy
by not complaining about UDP/TCP checksum being bad for looped back
IP multicast when multicast router is deactivated.

Reported by:	Vsevolod Lobko
2004-04-07 10:01:39 +00:00
Ruslan Ermilov
26f16ebeb1 Untangle IP multicast routing interaction with delayed payload checksums.
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active.  This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive).  This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
2004-03-25 08:46:27 +00:00
Max Laier
4672d81921 Two minor follow-ups on the MT_TAG removal:
ifp is now passed explicitly to ether_demux; no need to look it up again.
Make mtag a global var in ip_input.

Noticed by:	rwatson
Approved by:	bms(mentor)
2004-03-02 14:37:23 +00:00
Max Laier
ac9d7e2618 Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
Max Laier
36e8826ffb Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO
70dbc6cbfc don't update outgoing ifp, if ipsec tunnel mode encapsulation
was not made.

Obtained from:	KAME
2004-02-16 17:05:06 +00:00
Max Laier
1094bdca51 This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson
2004-02-13 19:14:16 +00:00
Bruce M Simpson
1cfd4b5326 Initial import of RFC 2385 (TCP-MD5) digest support.
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.

For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.

Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.

There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.

Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.

This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.

Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.

Sponsored by:	sentex.net
2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO
f073c60f73 pass pcb rather than so. it is expected that per socket policy
works again.
2004-02-03 18:20:55 +00:00
Andre Oppermann
e0f630ea7a Do not set the ip_id to zero when DF is set on packet and
restore the general pre-randomid behaviour.

Setting the ip_id to zero causes several problems with
packet reassembly when a device along the path removes
the DF bit for some reason.

Other BSD and Linux have found and fixed the same issues.

PR:		kern/60889
Tested by:	Richard Wendland <richard@wendland.org.uk>
Approved by:	re (scottl)
2004-01-08 11:13:40 +00:00
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Andre Oppermann
26d02ca7ba Remove RTF_PRCLONING from routing table and adjust users of it
accordingly.  The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache.  The
network stack remains fully useable with this change.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 19:47:31 +00:00
Andre Oppermann
02c1c7070e Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by:	sam (mentor)
2003-11-14 21:48:57 +00:00
Andre Oppermann
9188b4a169 Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:

 o adds full direct process-to-completion IPv4 forwarding code
 o handles ip fragmentation incl. hw support (ip_flow did not)
 o sends icmp needfrag to source if DF is set (ip_flow did not)
 o supports ipfw and ipfilter (ip_flow did not)
 o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
 o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by:	sam (mentor)
2003-11-14 21:02:22 +00:00
Andre Oppermann
2683ceb661 Do not fragment a packet with hardware assistance if it has the DF
bit set.

Reviewed by:	sam (mentor)
2003-11-12 23:35:40 +00:00
Sam Leffler
8484384564 assert optional inpcb is passed in locked
Supported by:	FreeBSD Foundation
2003-11-08 23:03:29 +00:00
Hajimu UMEMOTO
0f9ade718d - cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all.  secpolicy no longer contain
  spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy.  assign ID field to
  all SPD entries.  make it possible for racoon to grab SPD entry on
  pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header.  a mode is always needed
  to compare them.
- fixed that the incorrect time was set to
  sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
  because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
  alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
  XXX in theory refcnt should do the right thing, however, we have
  "spdflush" which would touch all SPs.  another solution would be to
  de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion.  ipsec_*_policy ->
  ipsec_*_pcbpolicy.
- avoid variable name confusion.
  (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
  secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
  can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
  "src" of the spidx specifies ICMP type, and the port field in "dst"
  of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
  kernel forwards the packets.

Tested by:	nork
Obtained from:	KAME
2003-11-04 16:02:05 +00:00
Robert Watson
3de758d3e3 Note that when ip_output() is called from ip_forward(), it will already
have its options inserted, so the opt argument to ip_output()  must be
NULL.
2003-11-03 18:03:05 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do-no know about mutex's.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
Sam Leffler
134ea22494 o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by:	"Max Laier" <max@love2party.net>
Supported by:		FreeBSD Foundation
Obtained from:		NetBSD (bits of pfil.h and pfil.c)
2003-09-23 17:54:04 +00:00
Mike Silbersack
3390d47670 Implement MBUF_STRESS_TEST mark II.
Changes from the original implementation:

- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed.  Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.

- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments.  While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.

- Add two modes of random fragmentation.  Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
2003-09-01 05:55:37 +00:00
Bruce M Simpson
8afa230470 Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on
specific interfaces. This is required by aodvd, and may in future help us
in getting rid of the requirement for BPF from our import of isc-dhcp.

Suggested by:   fenestro
Obtained from:  BSD/OS
Reviewed by:    mini, sam
Approved by:    jake (mentor)
2003-08-20 14:46:40 +00:00
Jeffrey Hsu
1e78ac216e 1. Basic PIM kernel support
Disabled by default. To enable it, the new "options PIM" must be
added to the kernel configuration file (in addition to MROUTING):

options	MROUTING		# Multicast routing
options	PIM			# Protocol Independent Multicast

2. Add support for advanced multicast API setup/configuration and
extensibility.

3. Add support for kernel-level PIM Register encapsulation.
Disabled by default.  Can be enabled by the advanced multicast API.

4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".

Submitted by:	Pavlin Radoslavov <pavlin@icir.org>
2003-08-07 18:16:59 +00:00
Mike Silbersack
7dc7f0311e Minor fix to the MBUF_STRESS_TEST code so that it keeps
pkthdr.len consistant at all times.  (Some debugging
code I'm working on is tripped otherwise.)

MFC after:	3 days
2003-07-19 05:50:32 +00:00
Garrett Wollman
6e49b1fe55 Don't generate an ip_id for packets with the DF bit set; ip_id is
only meaningful for fragments.  Also don't bother to byte-swap the
ip_id when we do generate it; it is only used at the receiver as a
nonce.  I tried several different permutations of this code with no
measurable difference to each other or to the unmodified version, so
I've settled on the one for which gcc seems to generate the best code.
(If anyone cares to microoptimize this differently for an architecture
where it actually matters, feel free.)

Suggested by:	Steve Bellovin's paper in IMW'02
2003-05-31 17:55:21 +00:00
Matthew N. Dodd
4957466b8e IP_RECVTTL socket option.
Reviewed by:	Stuart Cheshire <cheshire@apple.com>
2003-04-29 21:36:18 +00:00
Mike Silbersack
53dcc544a8 Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended
to include more than just frag tests.
2003-04-12 06:11:46 +00:00
Dag-Erling Smørgrav
fe58453891 Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header.  Use it instead of hand-
rolled versions wherever applicable.

Submitted by:	Hiten Pandya <hiten@unixdaemons.com>
2003-04-08 14:25:47 +00:00
Dag-Erling Smørgrav
212059bd83 Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage. 2003-04-04 12:14:00 +00:00
Matthew N. Dodd
2c56e246fa Back out support for RFC3514.
RFC3514 poses an unacceptale risk to compliant systems.
2003-04-02 20:14:44 +00:00
Matthew N. Dodd
4f6425f7ae - Use the correct constant define.
- Add a missing break.
2003-04-02 18:02:58 +00:00
Matthew N. Dodd
8faf6df9b3 Sync constant define with NetBSD.
Requested by:	 Tom Spindler <dogcow@babymeat.com>
2003-04-02 10:28:47 +00:00
Matthew N. Dodd
09139a4537 Implement support for RFC 3514 (The Security Flag in the IPv4 Header).
(See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)

This fulfills the host requirements for userland support by
way of the setsockopt() IP_EVIL_INTENT message.

There are three sysctl tunables provided to govern system behavior.

	net.inet.ip.rfc3514:

		Enables support for rfc3514.  As this is an
		Informational RFC and support is not yet widespread
		this option is disabled by default.

	net.inet.ip.hear_no_evil

		 If set the host will discard all received evil packets.

	net.inet.ip.speak_no_evil

		If set the host will discard all transmitted evil packets.

The IP statistics counter 'ips_evil' (available via 'netstat') provides
information on the number of 'evil' packets recieved.

For reference, the '-E' option to 'ping' has been provided to demonstrate
and test the implementation.
2003-04-01 08:21:44 +00:00
Maxime Henrion
511e01e2d6 Try to make the MBUF_FRAG_TEST code work better.
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole
  chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's
  more clear.

This is still not quite right, because we're clearly abusing m_split() to
do something it was not designed for, but at least it works now.  We
should probably move this code into a m_fragment() function when it's
correct.
2003-03-25 23:49:14 +00:00
Mike Silbersack
9d9edc5693 Add the MBUF_FRAG_TEST option. When compiled in, this option
allows you to tell ip_output to fragment all outgoing packets
into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes.
This is an excellent way to test if network drivers can properly
handle long mbuf chains being passed to them.

net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation)
so that you can at least boot before your network driver dies. :)
2003-03-25 05:45:05 +00:00
Jonathan Lemon
8608c4c1f9 Remove unused variables in the IPSEC case.
Submitted by:  Lars Eggert <larse@ISI.EDU>
2003-02-20 18:22:21 +00:00
Jonathan Lemon
340c35de6a Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block.  Allow the socket and tcpcb structures to be freed
earlier than inpcb.  Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs
2003-02-19 22:32:43 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Sam Leffler
9359ad861e FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from
ipsec4_process_packet; they happen when a packet is dropped because
an SA acquire is initiated

Submitted by:	Doug Ambrisko <ambrisko@verniernetworks.com>
2003-01-30 05:45:45 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Luigi Rizzo
b375c9ec2c Back out the ip_fragment() code -- it is not urgent to have it in now,
I will put it back in in a better form after 5.0 is out.

Requested by: sam, rwatson, luigi (on second thought)
Approved by: re
2002-11-20 18:56:25 +00:00
Luigi Rizzo
3e372e140c Move the ip_fragment code from ip_output() to a separate function,
so that it can be reused elsewhere (there is a number of places
where it can be useful). This also trims some 200 lines from
the body of ip_output(), which helps readability a bit.

(This change was discussed a few weeks ago on the mailing lists,
Julian agreed, silence from others. It is not a functional change,
so i expect it to be ok to commit it now but i am happy to back it
out if there are objections).

While at it, fix some function headers and replace m_copy() with
m_copypacket() where applicable.

MFC after: 1 week
2002-11-17 16:30:44 +00:00
Luigi Rizzo
bbb4330b61 Massive cleanup of the ip_mroute code.
No functional changes, but:

  + the mrouting module now should behave the same as the compiled-in
    version (it did not before, some of the rsvp code was not loaded
    properly);
  + netinet/ip_mroute.c is now truly optional;
  + removed some redundant/unused code;
  + changed many instances of '0' to NULL and INADDR_ANY as appropriate;
  + removed several static variables to make the code more SMP-friendly;
  + fixed some minor bugs in the mrouting code (mostly, incorrect return
    values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c     all the above.
conf/files              make ip_mroute.c optional
net/route.c             fix mrt_ioctl hook
netinet/ip_input.c      fix ip_mforward hook, move rsvp_input() here
                        together with other rsvp code, and a couple
                        of indentation fixes.
netinet/ip_output.c     fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h        rsvp function hooks
netinet/raw_ip.c        hooks for mrouting and rsvp functions, plus
                        interface cleanup.
netinet/ip_mroute.h     remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week
2002-11-15 22:53:53 +00:00
Sam Leffler
ab94ca3cec correct fast ipsec logic: compare destination ip address against the
contents of the SA, not the SP

Submitted by:	"Doug Ambrisko" <ambrisko@verniernetworks.com>
2002-11-08 23:11:02 +00:00
Poul-Henning Kamp
53be11f680 Fix two instances of variant struct definitions in sys/netinet:
Remove the never completed _IP_VHL version, it has not caught on
anywhere and it would make us incompatible with other BSD netstacks
to retain this version.

Add a CTASSERT protecting sizeof(struct ip) == 20.

Don't let the size of struct ipq depend on the IPDIVERT option.

This is a functional no-op commit.

Approved by:	re
2002-10-20 22:52:07 +00:00
Sam Leffler
b9234fafa0 Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas.  Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by:	KAME, rwatson
Approved by:	silence
Supported by:	Vernier Networks
2002-10-16 02:25:05 +00:00
Sam Leffler
5d84645305 Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
  ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
  use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
  inpcb parameter to ip_output and ip6_output to allow the IPsec code to
  locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by:	julian, luigi (silent), -arch, -net, darren
Approved by:	julian, silence from everyone else
Obtained from:	openbsd (mostly)
MFC after:	1 month
2002-10-16 01:54:46 +00:00
Maxim Konovalov
cb7641e85b Slightly rearrange a code in rev. 1.164:
o Move len initialization closer to place of its first usage.
o Compare len with 0 to improve readability.
o Explicitly zero out phlen in ip_insertoptions() in failure case.

Suggested by:   jhb
Reviewed by:    jhb
MFC after:      2 weeks
2002-09-23 08:56:24 +00:00
Maxim Konovalov
e079ba8d93 In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns.

Reviewed by:	archie, silence on -net
MFC after:	5 days
2002-09-17 11:13:04 +00:00
Robert Watson
4ed84624a2 Introduce support for Mandatory Access Control and extensible
kernel access control.

When fragmenting an IP datagram, invoke an appropriate MAC entry
point so that MAC labels may be copied (...) to the individual
IP fragment mbufs by MAC policies.

When IP options are inserted into an IP datagram when leaving a
host, preserve the label if we need to reallocate the mbuf for
alignment or size reasons.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 17:21:01 +00:00
Luigi Rizzo
3956b02345 Avoid dereferencing a null pointer in ro_rt.
This was always broken in HEAD (the offending statement was introduced
in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev.
1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30.

So I am also restoring these two lines in RELENG_4 now.
We might need another few things from 1.99.2.30.
2002-07-12 22:08:47 +00:00
Maxime Henrion
7627c6cbcc Warning fixes for 64 bits platforms. With this last fix,
I can build a GENERIC sparc64 kernel with -Werror.

Reviewed by:	luigi
2002-06-27 11:02:06 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Luigi Rizzo
51aed12e52 fix bad indentation and whitespace resulting from cut&paste 2002-06-23 09:15:43 +00:00
Luigi Rizzo
2b25acc158 Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

        ip_divert_cookie        used by divert sockets
        ip_fw_fwd_addr          used for transparent ip redirection
        last_pkt                used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
 * there is at least one global variable left, sro_fwd, in ip_output().
   I am not sure if/how this can be removed.

 * I have deliberately avoided gratuitous style changes in this commit
   to avoid cluttering the diffs. Minor stule cleanup will likely be
   necessary

 * this commit only focused on the IP layer. I am sure there is a
   number of global variables used in the TCP and maybe UDP stack.

 * despite the number of files touched, there are absolutely no API's
   or data structures changed by this commit (except the interfaces of
   ip_fw_chk() and dummynet_io(), which are internal anyways), so
   an MFC is quite safe and unintrusive (and desirable, given the
   improved readability of the code).

MFC after: 10 days
2002-06-22 11:51:02 +00:00
Andrew R. Reiter
db40007d42 - Change the newly turned INVARIANTS #ifdef blocks (they were changed from
DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code
  readability.
2002-05-21 18:52:24 +00:00
Andrew R. Reiter
4cb674c960 - Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity
checks.
2002-05-20 22:05:13 +00:00
Luigi Rizzo
d60315bef5 Cleanup the interface to ip_fw_chk, two of the input arguments
were totally useless and have been removed.

ip_input.c, ip_output.c:
    Properly initialize the "ip" pointer in case the firewall does an
    m_pullup() on the packet.

    Remove some debugging code forgotten long ago.

ip_fw.[ch], bridge.c:
    Prepare the grounds for matching MAC header fields in bridged packets,
    so we can have 'etherfw' functionality without a lot of kernel and
    userland bloat.
2002-05-09 10:34:57 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Ruslan Ermilov
e3f406b3c1 Prevent icmp_reflect() from calling ip_output() with a NULL route
pointer which will then result in the allocated route's reference
count never being decremented.  Just flood ping the localhost and
watch refcnt of the 127.0.0.1 route with netstat(1).

Submitted by:	jayanth

Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed
ip_output() to be called with a NULL route pointer.  The previous
paragraph shows why this was a bad idea in the first place.

MFC after:	0 days
2002-03-22 16:45:54 +00:00
Alfred Perlstein
4d77a549fe Remove __P. 2002-03-19 21:25:46 +00:00
Mike Barcroft
fd8e4ebc8c o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
  source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
  Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
  POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
  and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
  complexities associated with having MD (asm and inline) versions, and
  having to prevent exposure of these functions in other headers that
  happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
  third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on:	alpha, i386
Reviewed by:	bde, jake, tmm
2002-02-18 20:35:27 +00:00
Ruslan Ermilov
51c8ec4a3d Moved the 127/8 check below so that IPF redirects have a chance of working.
MFC after:	1 day
2002-02-15 12:19:03 +00:00
Hajimu UMEMOTO
a4a6e77341 - Check the address family of the destination cached in a PCB.
- Clear the cached destination before getting another cached route.
  Otherwise, garbage in the padding space (which might be filled in if it was
  used for IPv4) could annoy rtalloc.

Obtained from:	KAME
2002-01-21 20:04:22 +00:00
Ruslan Ermilov
8c3f5566ae RFC1122 requires that addresses of the form { 127, <any> } MUST NOT
appear outside a host.

PR:		30792, 33996
Obtained from:	ip_input.c
MFC after:	1 week
2002-01-21 13:59:42 +00:00
Bill Fenner
92bdb2fa39 Pre-calculate the checksum for multicast packets sourced on a
multicast router.  This is overkill; it should be possible to
delay to hardware interfaces and only pre-calculate when forwarding
to a tunnel.
2002-01-05 18:23:53 +00:00
Julian Elischer
3efc30142c Fix ipfw fwd so that it acts as the docs say
when forwarding an incoming packet to another machine.

Obtained from:	Vicor Production tree
MFC after: 3 weeks
2001-12-28 21:21:57 +00:00
Yaroslav Tykhiy
3f9e31220b Don't try to free a NULL route when doing IPFIREWALL_FORWARD.
An old route will be NULL at that point if a packet were initially
routed to an interface (using the IP_ROUTETOIF flag.)

Submitted by:	Igor Timkin <ivt@gamma.ru>
2001-12-19 14:54:13 +00:00
Jonathan Lemon
aa1f5daa31 whitespace and style fixes recovered from -stable. 2001-12-14 19:34:11 +00:00
Ruslan Ermilov
04d59553b2 Allow for ip_output() to be called with a NULL route pointer.
This fixes a panic I introduced yesterday in ip_icmp.c,v 1.64.
2001-12-01 13:48:16 +00:00
Luigi Rizzo
7b109fa404 MFS: sync the ipfw/dummynet/bridge code with the one recently merged
into stable (mostly , but not only, formatting and comments changes).
2001-11-04 22:56:25 +00:00
Bill Paul
3528d68f71 Fix a (long standing?) bug in ip_output(): if ip_insertoptions() is
called and ip_output() encounters an error and bails (i.e. host
unreachable), we will leak an mbuf. This is because the code calls
m_freem(m0) after jumping to the bad: label at the end of the function,
when it should be calling m_freem(m). (m0 is the original mbuf list
_without_ the options mbuf prepended.)

Obtained from:	NetBSD
2001-10-30 18:15:48 +00:00
Jonathan Lemon
35609d458d When dropping a packet because there is no room in the queue (which itself
is somewhat bogus), update the statistics to indicate something was dropped.

PR: 13740
2001-10-30 14:58:27 +00:00
Paul Saab
db69a05dce Make it so dummynet and bridge can be loaded as modules.
Submitted by:	billf
2001-10-05 05:45:27 +00:00
Jonathan Lemon
ca925d9c17 Add a hash table that contains the list of internet addresses, and use
this in place of the in_ifaddr list when appropriate.  This improves
performance on hosts which have a large number of IP aliases.
2001-09-29 04:34:11 +00:00
Jonathan Lemon
9a10980e2a Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h>
Remove local definitions.
2001-09-29 03:23:44 +00:00
Luigi Rizzo
830cc17841 Two main changes here:
+ implement "limit" rules, which permit to limit the number of sessions
   between certain host pairs (according to masks). These are a special
   type of stateful rules, which might be of interest in some cases.
   See the ipfw manpage for details.

 + merge the list pointers and ipfw rule descriptors in the kernel, so
   the code is smaller, faster and more readable. This patch basically
   consists in replacing "foo->rule->bar" with "rule->bar" all over
   the place.
   I have been willing to do this for ages!

MFC after: 1 week
2001-09-27 23:44:27 +00:00
Brooks Davis
9494d5968f Make faith loadable, unloadable, and clonable. 2001-09-25 18:40:52 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Jonathan Lemon
f9132cebdc Wrap array accesses in macros, which also happen to be lvalues:
ifnet_addrs[i - 1]  -> ifaddr_byindex(i)
        ifindex2ifnet[i]    -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
2001-09-06 02:40:43 +00:00
Daniel C. Sobral
07203494d2 MFS: Avoid dropping fragments in the absence of an interface address.
Noticed by:	fenner
Submitted by:	iedowse
Not committed to current by:	iedowse ;-)
2001-08-03 17:36:06 +00:00
Ruslan Ermilov
38c1bc358b Avoid a NULL pointer derefence introduced in rev. 1.129.
Problem noticed by:	bde, gcc(1)
Panic caught by:	mjacob
Patch tested by:	mjacob
2001-07-23 16:50:01 +00:00
Ruslan Ermilov
f2c2962ee5 Backout non-functional changes from revision 1.128.
Not objected to by:	dcs
2001-07-19 07:10:30 +00:00
Daniel C. Sobral
3afefa3924 Skip the route checking in the case of multicast packets with known
interfaces.

Reviewed by:	people at that channel
Approved by:	silence on -net
2001-07-17 18:47:48 +00:00
Hajimu UMEMOTO
3384154590 Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different
    from RFC2407/IANA assignment because of binary compatibility
    issue.  It should be fixed under 5-CURRENT.
  - ip6po_m member of struct ip6_pktopts is no longer used.  But, it
    is still there because of binary compatibility issue.  It should
    be removed under 5-CURRENT.

Reviewed by:	itojun
Obtained from:	KAME
MFC after:	3 weeks
2001-06-11 12:39:29 +00:00
Kris Kennaway
64dddc1872 Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by:    -net
Obtained from:  OpenBSD
2001-06-01 10:02:28 +00:00
Ruslan Ermilov
206a3274ef RFC768 (UDP) requires that "if the computed checksum is zero, it
is transmitted as all ones".  This got broken after introduction
of delayed checksums as follows.  Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP.  (The discussion still
takes place on -net.)  Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression).  Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero).  It was fixed next day in revision 1.21,
by removing the INVERT expression.  The latter also broke the
0 -> 0xffff fixup for UDP checksums.

Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 0000

After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006:  udp 0 (ttl 64, id 1)
:                          4500 001c 0001 0000 4011 7cce 7f00 0001
:                          7f00 0001 80ed 80ee 0008 ffff
2001-03-13 17:07:06 +00:00
Ian Dowse
5d936aa181 In ip_output(), initialise `ia' in the case where the packet has
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.

Reported by:	"Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by:	luigi, joe
2001-03-11 17:50:19 +00:00
Jeroen Ruigrok van der Werven
05f15c3dc3 Remove conditionals for vax support.
People who care much about this are welcomed to try 2.11BSD. :)

Noticed by:	luigi
Reviewed by:	jesper
2001-02-26 20:05:32 +00:00
Poul-Henning Kamp
37d4006626 Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with:   sed(1)
Reviewed by:    md5(1)
2001-02-04 16:08:18 +00:00
Poul-Henning Kamp
fc2ffbe604 Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)
2001-02-04 13:13:25 +00:00
Luigi Rizzo
507b4b5432 MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately) 2001-02-02 00:18:00 +00:00
Luigi Rizzo
7a726a2dd1 Pass up errors returned by dummynet. The same should be done with
divert.
2001-01-25 02:06:38 +00:00