Commit Graph

2413 Commits

Author SHA1 Message Date
Andre Oppermann
fe53256dc2 Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.
2005-09-19 22:54:55 +00:00
Andre Oppermann
e6b9152d20 Use monotonic 'time_uptime' instead of 'time_second' as timebase
for timeouts.
2005-09-19 22:31:45 +00:00
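The two commits above switch expiry timestamps from wall-clock time to a monotonic timebase. Below is a minimal userland sketch of why that matters. It assumes POSIX `clock_gettime()` with `CLOCK_MONOTONIC` stands in for the kernel's `time_uptime`, and `CLOCK_REALTIME` for `time_second`. `struct cache_entry`, `entry_set_lifetime()` and `entry_expired()` are hypothetical names, not the kernel's route-metrics code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical cache entry with an rmx_expire-style absolute expiry. */
struct cache_entry {
    int64_t expire;   /* seconds on the monotonic clock; 0 = never expires */
};

/* Seconds since an arbitrary fixed point. Unaffected by settimeofday()
 * or NTP steps, which is the property time_uptime provides in-kernel. */
static int64_t uptime_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec;
}

static void entry_set_lifetime(struct cache_entry *e, int64_t now, int64_t ttl)
{
    e->expire = now + ttl;
}

static bool entry_expired(const struct cache_entry *e, int64_t now)
{
    return e->expire != 0 && e->expire <= now;
}
```

Because both the stored deadline and `now` come from the same monotonic clock, stepping the wall clock backwards cannot keep an entry alive indefinitely. Stepping it forward cannot expire every entry at once either.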
Robert Watson
b1c53bc9c0 Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach() and allows the
  protocol to perform tear-down on the interface early in
  if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
  this is not the place to detect interface removal!  This also
  removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
  IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after:	3 days
2005-09-18 17:36:28 +00:00
Andre Oppermann
db1240661f Do not ignore all other TCP options (e.g. timestamp, window scaling)
when responding to TCP SYN packets with TCP_MD5 enabled and set.

PR:		kern/82963
Submitted by:	<demizu at dd.iij4u.or.jp>
MFC after:	3 days
2005-09-14 15:06:22 +00:00
Bjoern A. Zeeb
75398603ad Fix a panic when the kernel is compiled without INET6 by rejecting
IPv6 opcodes, which are now behind #if(n)def INET6.

PR:		kern/85826
MFC after:	3 days
2005-09-14 07:53:54 +00:00
Andre Oppermann
ffabe3dce8 In tcp_ctlinput() do not swap ip->ip_len a second time. It
has been done in icmp_input() already.

This fixes the ICMP_UNREACH_NEEDFRAG case where no MTU was
proposed in the ICMP reply.

PR:		kern/81813
Submitted by:	Vitezslav Novy <vita at fio.cz>
MFC after:	3 days
2005-09-10 07:43:29 +00:00
Gleb Smirnoff
a20e25385c - Do not hold the route entry lock when calling arprequest(). One such
  call was introduced by me in 1.139; the other one was present before.
- Do all manipulations of rtentry and la before dropping the lock.
- Copy the interface address from the route into a local variable before
  dropping the lock, and supply this copy as the argument to arprequest().

LORs fixed:
		http://sources.zabbadoz.net/freebsd/lor/003.html
		http://sources.zabbadoz.net/freebsd/lor/037.html
		http://sources.zabbadoz.net/freebsd/lor/061.html
		http://sources.zabbadoz.net/freebsd/lor/062.html
		http://sources.zabbadoz.net/freebsd/lor/064.html
		http://sources.zabbadoz.net/freebsd/lor/068.html
		http://sources.zabbadoz.net/freebsd/lor/071.html
		http://sources.zabbadoz.net/freebsd/lor/074.html
		http://sources.zabbadoz.net/freebsd/lor/077.html
		http://sources.zabbadoz.net/freebsd/lor/093.html
		http://sources.zabbadoz.net/freebsd/lor/135.html
		http://sources.zabbadoz.net/freebsd/lor/140.html
		http://sources.zabbadoz.net/freebsd/lor/142.html
		http://sources.zabbadoz.net/freebsd/lor/145.html
		http://sources.zabbadoz.net/freebsd/lor/152.html
		http://sources.zabbadoz.net/freebsd/lor/158.html
2005-09-09 10:06:27 +00:00
Gleb Smirnoff
5d40d65b5a When a carp(4) interface is being destroyed while in promiscuous mode,
the interface is first detached from its parent and then bpfdetach() is
called. If the interface was the last carp(4) interface attached to the
parent, the mutex on the parent is destroyed, so when bpfdetach() calls
if_setflags() we panic on the destroyed mutex.

To prevent the above scenario, clear the pointer to the parent when we
detach ourselves from it.
2005-09-09 08:41:39 +00:00
Sam Leffler
245c31ccaf clear lock on error in O_LIMIT case of install_state
Submitted by:	Ted Unangst
MFC after:	3 days
2005-09-04 17:33:40 +00:00
Andre Oppermann
e0aec68255 Use the correct mbuf type for MGET().
2005-08-30 16:35:27 +00:00
Gleb Smirnoff
e3ea67a077 Add newline to debugging printf.
PR:		kern/85271
Submitted by:	Simon Morgan
2005-08-26 15:27:18 +00:00
Gleb Smirnoff
360856f60e - Refuse hashsize of 0, since it is invalid.
- Use defined constant instead of 512.
2005-08-25 13:57:00 +00:00
Gleb Smirnoff
510b360fc0 When we have a published ARP entry for some IP address, reply to
ARP requests only on the network to which this IP address belongs.

Before this change we replied on all interfaces. This could
lead to an IP address conflict with the host we are doing ARP proxy
for.

PR:		kern/75634
Reviewed by:	andre
2005-08-25 13:25:57 +00:00
Paul Saab
4d3b134633 Remove a KASSERT in the sack path that fails because of an interaction
between sack and a bug in the "bad retransmit recovery" logic. This is
a workaround; the underlying bug will be fixed later.

Submitted by:   Mohan Srinivasan, Noritoshi Demizu
2005-08-24 02:48:45 +00:00
Paul Saab
b24de0e665 Fix up the comment for MAX_SACK_BLKS.
Submitted by:	Noritoshi Demizu
2005-08-24 02:47:16 +00:00
Andre Oppermann
ef8fd90476 Remove unnecessary IPSEC includes.
MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-23 14:42:40 +00:00
Andre Oppermann
23655387e9 o Fix a logic error when not doing mbuf cluster allocation.
o Change an old panic() to a clean function exit.

MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-22 22:13:41 +00:00
Andre Oppermann
936cd18dad Add socket option IP_MINTTL.  It may be used to set the minimum
acceptable TTL a packet must have when received on a socket.  All packets
with a lower TTL are silently dropped.  Works on already connected/connecting
and listening sockets for RAW/UDP/TCP.

This option is only really useful when set to 255, preventing packets
from outside the directly connected networks from reaching local listeners
on sockets.

Allows userland implementation of 'The Generalized TTL Security Mechanism
(GTSM)' according to RFC3682.  An example of such use is the Cisco IOS
BGP implementation command "neighbor ttl-security".

MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-22 16:13:08 +00:00
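This is a sketch of the receive-side rule and of a GTSM-style listener setup. It assumes the platform's `<netinet/in.h>` exposes `IP_MINTTL`; the option is guarded with `#ifdef` because not every libc defines it. It also assumes a minimum of 0 means "unset". `minttl_accept()` and `gtsm_enable()` are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Per-socket check: drop any packet whose received TTL is below the
 * configured minimum; a minimum of 0 (assumed default) disables it. */
static bool minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
    return minttl == 0 || pkt_ttl >= minttl;
}

/* GTSM (RFC 3682): the peer sends with TTL 255 and every router on the
 * path decrements it, so requiring 255 on receipt admits only peers on
 * a directly connected network. */
static int gtsm_enable(int fd)
{
#ifdef IP_MINTTL
    int ttl = 255;

    return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &ttl, sizeof(ttl));
#else
    (void)fd;
    errno = ENOPROTOOPT;   /* option not available on this platform */
    return -1;
#endif
}
```

A BGP-style listener would call `gtsm_enable()` on its socket before `listen()`. Packets that crossed even one router then arrive with TTL 254 or lower and are dropped before reaching the application.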
Andre Oppermann
6b773dff30 Always quote the entire TCP header when responding and allocate an mbuf
cluster if needed.

Fixes the TCP issues raised in I-D draft-gont-icmp-payload-00.txt.

This aids in-the-wild debugging a lot and allows the receiver to do
more elaborate checks on the validity of the response.

MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-22 14:12:18 +00:00
Andre Oppermann
d56ea155bd Handle pure layer 2 broad- and multicasts properly and simplify related
checks.

PR:		kern/85052
Submitted by:	Dmitrij Tejblum <tejblum at yandex-team.ru>
MFC after:	3 days
2005-08-22 12:06:26 +00:00
Andre Oppermann
bb10780f9f Commit correct version of the change and note the name of the new
sysctl: net.inet.icmp.quotelen, which defaults to 8 bytes.

Pointy hat to:	andre
2005-08-21 15:18:00 +00:00
Andre Oppermann
e875dfb826 Add a sysctl to change the length of the quotation of the original
packet in an ICMP reply.  The minimum of 8 bytes is internally
enforced.  The maximum quotation is the remaining space in the
reply mbuf.

This option is added in response to the issues raised in I-D
draft-gont-icmp-payload-00.txt.

MFC after:	2 weeks
Sponsored by:	TCP/IP Optimization Fundraise 2005
2005-08-21 15:09:07 +00:00
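The clamping described above can be sketched as a pure function. It also folds in the rule from the TCP-quoting change that the whole TCP header is always quoted. This is a hypothetical helper with assumed parameter names, not the actual `icmp_error()` code. It also assumes the quotation is measured as the original IP header plus `quotelen` bytes of payload.

```c
#include <stddef.h>

#define ICMP_MIN_QUOTE 8   /* RFC 792: IP header plus 64 bits of data */

static size_t max_sz(size_t a, size_t b) { return a > b ? a : b; }
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * oiphlen:  IP header length of the offending packet
 * olen:     its total length
 * quotelen: the net.inet.icmp.quotelen setting
 * tcphlen:  its TCP header length (0 if not TCP)
 * space:    bytes left in the reply buffer for the quotation
 */
static size_t icmp_quote_len(size_t oiphlen, size_t olen, size_t quotelen,
    size_t tcphlen, size_t space)
{
    /* Enforce the 8-byte minimum regardless of the sysctl value. */
    size_t len = oiphlen + max_sz(quotelen, ICMP_MIN_QUOTE);

    len = max_sz(len, oiphlen + tcphlen);  /* whole TCP header if present */
    len = min_sz(len, olen);               /* never beyond the packet     */
    return min_sz(len, space);             /* never beyond the buffer     */
}
```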
Andre Oppermann
a0866c8d4e Add an option to have ICMP replies to non-local packets generated with
the IP address of the interface the packet came in through.  This is useful
for routers to show in traceroutes the actual path a packet has taken instead
of the possibly different return path.

The new sysctl is named net.inet.icmp.reply_from_interface and defaults
to off.

MFC after:	2 weeks
2005-08-21 12:29:39 +00:00
Gleb Smirnoff
1ae954096e In order to support CARP interfaces the kernel was taught to handle more
than one interface in one subnet. However, some userland apps rely on
the belief that this configuration is impossible.

Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch
is on, the kernel will refuse to add an additional interface to an
already connected subnet unless the interface is CARP. Default
value is off.

PR:			bin/82306
In collaboration with:	mlaier
2005-08-18 10:34:30 +00:00
Bjoern A. Zeeb
bd2e5495d1 Fix the build broken by rev. 1.108 when IPFIREWALL is compiled
into the kernel without INET6.

Spotted and tested by:	Michal Mertl <mime at traveller.cz>
2005-08-14 18:20:33 +00:00
Bjoern A. Zeeb
9066356ba1 * Add dynamic sysctl for net.inet6.ip6.fw.
* Correct handling of IPv6 Extension Headers.
* Add unreach6 code.
* Add logging for IPv6.

Submitted by:	sysctl handling derived from patch from ume needed for ip6fw
Obtained from:	is_icmp6_query and send_reject6 derived from similar
		functions of netinet6,ip6fw
Reviewed by:	ume, gnn; silence on ipfw@
Test setup provided by: CK Software GmbH
MFC after:	6 days
2005-08-13 11:02:34 +00:00
Craig Rodrigues
eee9fe3078 Add NATM_LOCK() and NATM_UNLOCK() in places where npcb_add() and
npcb_free() are called, in order to eliminate witness panics.
This was overlooked in the removal of Giant from ATM.

Reviewed by: rwatson
2005-08-12 02:38:20 +00:00
Gleb Smirnoff
1ed7bf1e3b o Fix a race between three threads: output path,
  incoming ARP packet and route request adding/removing
  ARP entries. The root of the problem is that
  struct llinfo_arp was accessed without any locks.
  To close the race we use the locking provided by
  the rtentry that references this llinfo_arp:
  - Make arplookup() return a locked rtentry.
  - In arpresolve() hold the lock provided by
    rt_check()/arplookup() until the end of function,
    covering all accesses to the rtentry itself and
    llinfo_arp it refers to.
  - In in_arpinput() do not drop lock provided by
    arplookup() during first part of the function.
  - Simplify logic in the first part of in_arpinput(),
    removing one level of indentation.
  - In the second part of in_arpinput() hold rtentry
    lock while copying address.

o Fix a condition where a route entry is destroyed while
  another thread is contending for its lock:
  - When storing a pointer to rtentry in llinfo_arp list,
    always add a reference to this rtentry, to prevent
    rtentry being destroyed via RTM_DELETE request.
  - Remove this reference when removing entry from
    llinfo_arp list.

o Further cleanup of arptimer():
  - Inline arptfree() into arptimer().
  - Use official queue(3) way to pass LIST.
  - Hold rtentry lock while reading its structure.
  - Do not check that sdl_family is AF_LINK, but
    assert this.

Reviewed by:	sam
Stress test:	http://www.holm.cc/stress/log/cons141.html
Stress test:	http://people.freebsd.org/~pho/stress/log/cons144.html
2005-08-11 08:25:48 +00:00
David E. O'Brien
c11ba30c9a Remove public declarations of variables that were forgotten when they were
made static.
2005-08-10 07:10:02 +00:00
David E. O'Brien
31793d594b Match IPv6 and use a static struct pr_usrreqs nousrreqs. 2005-08-10 06:41:04 +00:00
Robert Watson
a2dc1f5021 Add helper function ip_findmoptions(), which accepts an inpcb, and attempts
to atomically return either an existing set of IP multicast options for the
PCB, or a newly allocated set with default values.  The inpcb is returned
locked.  This function may sleep.

Call ip_findmoptions() to acquire a reference to a PCB's socket options, and
perform the update of the options while holding the PCB lock.  Release the
lock before returning.

Remove garbage collection of multicast options when values return to the
default, as this complicates locking substantially.  Most applications
allocate a socket either to be multicast, or not, and don't tend to keep
around sockets that have previously been used for multicast, then used for
unicast.

This closes a number of race conditions involving multiple threads or
processes modifying the IP multicast state of a socket simultaneously.

MFC after:	7 days
2005-08-09 17:19:21 +00:00
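The allocate-outside-the-lock pattern behind `ip_findmoptions()` can be sketched in userland with POSIX threads. Allocation may sleep, so the lock is dropped, the options are allocated, and the lock is retaken. Another thread may have installed options in the meantime, so the pointer is re-checked. `struct pcb`, `struct mopts` and `pcb_findmoptions()` are hypothetical stand-ins for the inpcb and its multicast options.

```c
#include <pthread.h>
#include <stdlib.h>

struct mopts {
    int ttl;    /* default multicast TTL */
    int loop;   /* default multicast loopback */
};

struct pcb {
    pthread_mutex_t lock;
    struct mopts *mopts;
};

/* Returns the PCB's options, allocating defaults if none exist.
 * On return the PCB lock is held, mirroring the contract in the log. */
static struct mopts *pcb_findmoptions(struct pcb *p)
{
    struct mopts *m;

    pthread_mutex_lock(&p->lock);
    if (p->mopts != NULL)
        return p->mopts;
    pthread_mutex_unlock(&p->lock);

    m = malloc(sizeof(*m));      /* may block: must not hold the lock */
    if (m == NULL)
        abort();                 /* sketch: treat like a sleeping alloc */
    m->ttl = 1;
    m->loop = 1;

    pthread_mutex_lock(&p->lock);
    if (p->mopts != NULL) {      /* lost the race: keep the winner's copy */
        free(m);
        return p->mopts;
    }
    p->mopts = m;
    return m;
}
```

The caller updates the options while still holding the lock and releases it before returning, as the setter described above does.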
Robert Watson
13f4c340ae Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
Gleb Smirnoff
9bd8ca3014 In preparation for fixing races in ARP (and probably in other
L2/L3 mappings) make rt_check() return a locked rtentry.
2005-08-09 08:39:56 +00:00
Robert Watson
dd5a318ba3 Introduce in_multi_mtx, which will protect IPv4-layer multicast address
lists, as well as accessor macros.  For now, this is a recursive mutex
due to code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.

For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.

Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().

Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.

Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.

Eliminate spl's associated with IPv4 multicast addresses and with portions
of IGMP that weren't previously expunged by IGMP locking.

Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		10 days
2005-08-03 19:29:47 +00:00
Robert Watson
bccb41014a Modify network protocol consumers of the ifnet multicast address lists
to lock if_addr_mtx.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		1 week
2005-08-02 23:51:22 +00:00
Hajimu UMEMOTO
4dad226e45 restore the line that was wrongly removed during the scope cleanup.
tcpdrop(8) should work for IPv6 again.
2005-08-01 12:08:49 +00:00
Bjoern A. Zeeb
9e669156d4 Add support for IPv6 over GRE [1]. PR kern/80340 includes the
FreeBSD specific ip_newid() changes NetBSD does not have.
Correct handling of non AF_INET packets passed to bpf [2].

PR:		kern/80340[1], NetBSD PRs 29150[1], 30844[2]
Obtained from:	NetBSD ip_gre.c rev. 1.34,1.35, if_gre.c rev. 1.56
Submitted by:	Gert Doering <gert at greenie.muc.de>[2]
MFC after:	4 days
2005-08-01 08:14:21 +00:00
Hajimu UMEMOTO
c85ed85b1c include scope6_var.h for in6_clearscope().
2005-07-26 00:19:58 +00:00
Hajimu UMEMOTO
29da8af658 include netinet6/scope6_var.h.
2005-07-25 12:36:43 +00:00
Hajimu UMEMOTO
a1f7e5f8ee scope cleanup.  With this change:
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- the scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application does:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
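The stricter boundary check for the `::1` example can be illustrated with the standard `IN6_IS_ADDR_LOOPBACK()` macro. The rule below is a simplified, hypothetical version that covers only the loopback case from the log. It ignores the zone IDs that the KAME code tracks for other scopes. `scope_ok()` and `scope_ok_str()` are illustrative names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Simplified rule: ::1 is only meaningful inside the node, so a packet
 * with a loopback source must not go to a non-loopback destination. */
static bool scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
    if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
        return false;
    return true;
}

/* Convenience wrapper taking textual addresses; false on parse error. */
static bool scope_ok_str(const char *src, const char *dst)
{
    struct in6_addr s, d;

    if (inet_pton(AF_INET6, src, &s) != 1 ||
        inet_pton(AF_INET6, dst, &d) != 1)
        return false;
    return scope_ok(&s, &d);
}
```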
Giorgos Keramidas
a09ad79379 Misc spelling and/or English fixes in comments.
Reviewed by:	glebius, andre
2005-07-23 00:59:13 +00:00
Hajimu UMEMOTO
6c4eaa873f move RFC3542 related definitions into ip6.h.
Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Reviewed by:	mlaier
Obtained from:	KAME
2005-07-20 10:30:52 +00:00
Hajimu UMEMOTO
77b6f9ed40 add missing RFC3542 definition.
Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-20 09:17:41 +00:00
Hajimu UMEMOTO
18b35df8fe update comments:
- RFC2292bis -> RFC3542
- typo fixes

Submitted by:	Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from:	KAME
2005-07-20 08:59:45 +00:00
Robert Watson
de35559f82 Remove no-op spl references in in_pcb.c, since in_pcb locking has been
basically complete for several years now.  Update one spl comment to
reference the locking strategy.

MFC after:	3 days
2005-07-19 12:24:27 +00:00
Robert Watson
f59a9ebf10 Remove no-op spl's and most comment references to spls, as TCP locking
is believed to be basically done (modulo any remaining bugs).

MFC after:	3 days
2005-07-19 12:21:26 +00:00
Robert Watson
b77634d046 Remove spl() calls from ip_slowtimo(), as IP fragment queue locking was
merged several years ago.

Submitted by:	gnn
MFC after:	1 day
2005-07-19 12:14:22 +00:00
Max Laier
6de8d9dc52 Export pfsyncstats via sysctl "net.inet.pfsync" in order to print them with
netstat (separate commit).

Requested by:	glebius
MFC after:	1 week
2005-07-14 22:22:51 +00:00
Robert Watson
3c308b091f Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which is
redundant with respect to existing mbuf copy label routines.  Expose
a new mac_copy_mbuf() routine at the top end of the Framework and
use that; use the existing mpo_copy_mbuf_label() routine on the
bottom end.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA, SPAWAR
Approved by:	re (scottl)
2005-07-05 23:39:51 +00:00
Paul Saab
d758711729 Fix for a bug in newreno partial ack handling where, if a large amount
of data is partially acked, snd_cwnd underflows, causing a burst.

Found, Submitted by:	Noritoshi Demizu
Approved by:		re
2005-07-05 19:23:02 +00:00
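This sketch shows how a partial-ACK congestion-window deflation can underflow, and one way to guard it. It follows the partial-ACK rule described in RFC 3782: deflate by the newly acknowledged data, then add back one segment. The parameter names mirror the TCP control block (`snd_cwnd`, `t_maxseg`), with `acked` standing for `th_ack - snd_una`. Both helpers are hypothetical, and the clamp-to-zero policy is an assumption rather than the committed code.

```c
#include <stdint.h>

/* snd_cwnd is unsigned: subtracting more than it holds wraps to a huge
 * value, which turns a large partial ACK into a burst. */
static uint32_t partial_ack_cwnd_buggy(uint32_t snd_cwnd, uint32_t acked,
    uint32_t maxseg)
{
    return snd_cwnd - acked + maxseg;
}

static uint32_t partial_ack_cwnd(uint32_t snd_cwnd, uint32_t acked,
    uint32_t maxseg)
{
    /* Deflate by the newly acknowledged data, saturating at zero ... */
    snd_cwnd = (snd_cwnd > acked) ? snd_cwnd - acked : 0;
    /* ... then add back one segment. */
    return snd_cwnd + maxseg;
}
```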
Max Laier
b4373150d9 Remove ambiguity from hlen. IPv4 is now indicated by is_ipv4 and we need a
proper hlen value for IPv6 to implement O_REJECT and O_LOG.

Reviewed by:	glebius, brooks, gnn
Approved by:	re (scottl)
2005-07-03 15:42:22 +00:00
Andrew Thompson
2fcb030ad5 Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment; these are compile-time
dependent on __NO_STRICT_ALIGNMENT, which is set for i386 and amd64 where
alignment isn't needed, so the cost is avoided.

 IP_HDR_ALIGNED_P()
 IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR:		ia64/81284
Obtained from:	NetBSD
Approved by:	re (dwhite), mlaier (mentor)
2005-07-02 23:13:31 +00:00
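This is a sketch of what such a compile-time-dependent alignment test can look like. The macro body, a 4-byte alignment check on the header pointer, is an assumption and is not copied from the tree. `align_hdr()` is a hypothetical helper; its scratch buffer stands in for the kernel's copy of the header into a fresh, aligned mbuf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __NO_STRICT_ALIGNMENT
/* i386/amd64: misaligned loads are legal, so skip the check entirely. */
#define IP_HDR_ALIGNED_P(ip)   1
#else
/* Strict-alignment platforms: the IPv4 header holds 32-bit fields. */
#define IP_HDR_ALIGNED_P(ip)   ((((uintptr_t)(ip)) & 3) == 0)
#endif

/* If the header is misaligned, copy it into an aligned scratch buffer
 * (which must hold at least len bytes) before anyone dereferences it. */
static const void *align_hdr(const void *hdr, size_t len, uint32_t *scratch)
{
    if (IP_HDR_ALIGNED_P(hdr))
        return hdr;
    memcpy(scratch, hdr, len);
    return scratch;
}
```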
Paul Saab
482ac96888 Fix for a bug in the change that defers sack option processing until
after PAWS checks. The symptom of this is an inconsistency in the cached
sack state, caused by the fact that the sack scoreboard was not being
updated for an ACK handled in the header prediction path.

Found by:	Andrey Chernov.
Submitted by:	Noritoshi Demizu, Raja Mukerji.
Approved by:	re
2005-07-01 22:54:18 +00:00
Paul Saab
69e0362019 Fix for a SACK crash caused by a bug in tcp_reass(). If the data
segment is a complete duplicate, tcp_reass() frees the mbuf (leaving th
pointing at freed memory) without clearing tlen.
This change works around that bug. A fix for the tcp_reass() bug
will appear later (that bug is benign for now, as neither th nor
tlen is referenced in tcp_input() after the call to tcp_reass()).

Found by:	Pawel Jakub Dawidek.
Submitted by:	Raja Mukerji, Noritoshi Demizu.
Approved by:	re
2005-07-01 22:52:46 +00:00
Gleb Smirnoff
a196a3c8aa When doing ARP load balancing, the source IP is taken in network byte order,
so the residue of the division is the same for all hosts on the net, and thus only
one VHID answers. Use the source IP in host byte order instead.

Reviewed by:	mlaier
Approved by:	re (scottl)
2005-07-01 08:22:13 +00:00
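Below is a small sketch of the selection step. It assumes the answering VHID is chosen as the source address modulo the number of virtual hosts; `select_vhid()` is a hypothetical name. On a little-endian host, the network-order value keeps the first octet in its lowest bits. So for a power-of-two count, every host in 192.168.0.0/16 gets the same residue. Converting with `ntohl()` first lets the last octet drive the choice.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Pick which virtual host answers; src_host must be in host byte order
 * and count must be non-zero. */
static unsigned select_vhid(uint32_t src_host, unsigned count)
{
    return (unsigned)(src_host % count);
}
```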
Simon L. B. Nielsen
0a389eab22 Fix ipfw packet matching errors with address tables.
The ipfw tables lookup code caches the result of the last query.  The
kernel may process multiple packets concurrently, performing several
concurrent table lookups.  Due to insufficient locking, a cached
result can become corrupted, which could cause some addresses to be
incorrectly matched against a lookup table.

Submitted by:	ru
Reviewed by:	csjp, mlaier
Security:	CAN-2005-2019
Security:	FreeBSD-SA-05:13.ipfw

Correct bzip2 permission race condition vulnerability.

Obtained from:	Steve Grubb via RedHat
Security:	CAN-2005-0953
Security:	FreeBSD-SA-05:14.bzip2
Approved by:	obrien

Correct TCP connection stall denial of service vulnerability.

A TCP packet with the SYN flag set is accepted for established
connections, allowing an attacker to overwrite certain TCP options.

Submitted by:	Noritoshi Demizu
Reviewed by:	andre, Mohan Srinivasan
Security:	CAN-2005-2068
Security:	FreeBSD-SA-05:15.tcp

Approved by:	re (security blanket), cperciva
2005-06-29 21:36:49 +00:00
Paul Saab
5a53ca1627 - Postpone SACK option processing until after PAWS checks. SACK option
processing is now done in the ACK processing case.
- Merge tcp_sack_option() and tcp_del_sackholes() into a new function
  called tcp_sack_doack().
- Test (SEG.ACK < SND.MAX) before processing the ACK.

Submitted by:	Noritoshi Demizu
Reviewed by:	Mohan Srinivasan, Raja Mukerji
Approved by:	re
2005-06-27 22:27:42 +00:00
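The `SEG.ACK < SND.MAX` test must be done in sequence-number space, which wraps at 2^32. This sketch uses the signed-difference comparison that BSD's `tcp_seq.h` uses for its `SEQ_LT()`-style macros. `ack_below_snd_max()` is a hypothetical helper, and the exact predicate placement in `tcp_input()` may differ.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* a < b modulo 2^32; valid while the two values are < 2^31 apart. */
#define SEQ_LT(a, b)   ((int32_t)((a) - (b)) < 0)

/* SEG.ACK < SND.MAX, evaluated correctly across sequence wraparound. */
static bool ack_below_snd_max(tcp_seq th_ack, tcp_seq snd_max)
{
    return SEQ_LT(th_ack, snd_max);
}
```

A plain `th_ack < snd_max` would give the wrong answer once the sequence space wraps. For example, 0xFFFFFFF0 precedes 0x10 in sequence space even though it is numerically larger.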
Poul-Henning Kamp
dca9c930da Libalias incorrectly applies proxy rules to the global divert
socket: it should only look for existing translation entries,
not create new ones (no matter how it got the idea).

Approved by:	re(scottl)
2005-06-27 22:21:42 +00:00
Gleb Smirnoff
59dde15e82 Disable checksum processing in LibAlias when it works as a
kernel module. LibAlias is not aware of checksum offloading,
so the caller should provide checksum calculation. (The only
current consumer is ng_nat(4).) When the TCP packet internals have
been changed and require checksum recalculation, a cookie
is set in the th_x2 field of the TCP packet to inform the caller that it
needs to recalculate the checksum. This ugly hack will be removed
when LibAlias is made more kernel friendly.

Incremental checksum updates are left as is, since they don't
conflict with offloading.

Approved by:	re (scottl)
2005-06-27 07:36:02 +00:00
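For reference, an incremental update in the style of RFC 1624 (`HC' = ~(~HC + ~m + m')`) adjusts an existing Internet checksum when one 16-bit word changes. It does not re-read the rest of the packet. This is a standalone sketch with hypothetical function names, unrelated to LibAlias's own helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum (RFC 1071) over 16-bit words. */
static uint16_t cksum(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum += w[i];
    return (uint16_t)~fold(sum);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), for word m changed to m_new. */
static uint16_t cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
    uint32_t sum = (uint16_t)~hc + (uint16_t)~m + (uint32_t)m_new;

    return (uint16_t)~fold(sum);
}
```

The update touches only the old and new values of the changed word. That is why it stays compatible with hardware checksum offloading, unlike a full recomputation in software.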
David Malone
01399f34a5 Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

        1) Consistently use type u_int32_t for the header of a
           DLT_NULL device - it continues to represent the address
           family as always.
        2) In the DLT_NULL case get bpf_movein to store the u_int32_t
           in a sockaddr rather than in the mbuf, to be consistent
           with all the DLT types.
        3) Consequently fix a bug in bpf_movein/bpfwrite which
           only permitted packets up to 4 bytes less than the MTU
           to be written.
        4) Fix all DLT_NULL devices to have the code required to
           allow writing to their bpf devices.
        5) Move the code to allow writing to if_lo from if_simloop
           to looutput, because it only applies to DLT_NULL devices
           but was being applied to other devices that use if_simloop
           possibly incorrectly.

PR:		82157
Submitted by:	Matthew Luckie <mjl@luckie.org.nz>
Approved by:	re (scottl)
2005-06-26 18:11:11 +00:00
Stephan Uphoff
68d376254c Fix a timer ticks wrap around bug for minmssoverload processing.
Approved by:	re (scottl,dwhite)
MFC after:	4 weeks
2005-06-25 22:24:45 +00:00
Warner Losh
d980b05275 Add back missing copyright and license statement. This is identical
to the statement in ip_mroute.h, as well as being the same as what
OpenBSD has done with this file.  It matches the copyright in NetBSD's
1.1 through 1.14 versions of the file as well, which they subsequently
added back.

It appears to have been lost in the 4.4-lite1 import for FreeBSD 2.0,
but where and why I've not investigated further.  OpenBSD had the same
problem.  NetBSD had a copyright notice until Multicast 3.5 was
integrated verbatim back in 1995.  This appears to be the version that
made it into 4.4-lite1.

Approved by: re (scottl)
MFC after: 3 days
2005-06-23 18:42:58 +00:00
Paul Saab
9004ded9df Fix for a bug in tcp_sack_option() causing crashes.
Submitted by:	Noritoshi Demizu, Mohan Srinivasan.
Approved by:	re (scottl blanket SACK)
2005-06-23 00:18:54 +00:00
Bjoern A. Zeeb
67df9f3896 Fix IP(v6) over IP tunneling, most likely broken by the ifnet changes.
Reviewed by:	gnn
Approved by:	re (dwhite), rwatson (mentor)
2005-06-20 08:39:30 +00:00
Gleb Smirnoff
72f2d6578c - Don't use a legacy function in a non-legacy one. This gives us the
possibility to compile libalias without legacy support.
- Use correct way to mark variable as unused.

Approved by:	re (dwhite)
2005-06-20 08:31:48 +00:00
Max Laier
e4c959952b In verify_rev_path6():
- do not use static memory as we are under a shared lock only
- properly rtfree routes allocated with rtalloc
- rename to verify_path6()
- implement the full functionality of the IPv4 version

Also make O_ANTISPOOF work with IPv6.

Reviewed by:	gnn
Approved by:	re (blanket)
2005-06-16 14:55:58 +00:00
Max Laier
ad7abe197d Fix indentation in INET6 section in preparation of more serious work.
Approved by:	re (blanket ip6fw removal)
2005-06-16 13:20:36 +00:00
Max Laier
cf21d53cbf When doing matching based on dst_ip/src_ip make sure we are really looking
at an IPv4 packet, as these variables are uninitialized if not.  This used to
allow arbitrary IPv6 packets depending on the value in the uninitialized
variables.

Some opcodes (most notably O_REJECT) do not support IPv6 at all right now.

Reviewed by:	brooks, glebius
Security:	IPFW might pass IPv6 packets depending on stack contents.
Approved by:	re (blanket)
2005-06-12 16:27:10 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Brian Feldman
b34d56f1ef Modify send_pkt() to return the generated packet and have the caller
do the subsequent ip_output() in IPFW.  In ipfw_tick(), the keep-alive
packets must be generated from the data that resides under the
stateful lock, but they must not be sent at that time, as this would
cause a lock order reversal with the normal ordering (interface's
lock, then locks belonging to the pfil hooks).

In practice, this caused deadlocks when using IPFW and if_bridge(4)
together to do stateful transparent filtering.

MFC after: 1 week
2005-06-10 12:28:17 +00:00
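The lock-order fix follows a common pattern: build packets while holding the state lock, queue them locally, and transmit only after the lock is released. Below is a userland sketch with POSIX threads. `struct dyn_state`, `build_keepalive()`, `tick()` and `count_send()` are hypothetical stand-ins for the dynamic-rule state, `send_pkt()`, `ipfw_tick()` and `ip_output()`.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_STATES 16

struct pkt { unsigned id; };

struct dyn_state {
    pthread_mutex_t lock;           /* stand-in for the stateful lock */
    unsigned ids[MAX_STATES];
    size_t   n;                     /* must not exceed MAX_STATES */
};

/* Build a keep-alive from data that is only valid under the lock. */
static struct pkt build_keepalive(unsigned id)
{
    struct pkt p = { id };
    return p;
}

/* Collect under the lock, send after dropping it, so the output path
 * (which may take interface and pfil locks) never nests inside it. */
static size_t tick(struct dyn_state *st, void (*send_fn)(const struct pkt *))
{
    struct pkt out[MAX_STATES];
    size_t n;

    pthread_mutex_lock(&st->lock);
    for (n = 0; n < st->n; n++)
        out[n] = build_keepalive(st->ids[n]);
    pthread_mutex_unlock(&st->lock);

    for (size_t i = 0; i < n; i++)
        send_fn(&out[i]);
    return n;
}

/* Example sender that only counts what it would transmit. */
static unsigned sent_count;
static void count_send(const struct pkt *p) { (void)p; sent_count++; }
```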
Andrew Thompson
c8b0129238 Add dummynet(4) support to if_bridge; this code is largely based on bridge.c.
This is the final piece to match bridge.c in functionality; we can now be a
drop-in replacement.

Approved by:	mlaier (mentor)
2005-06-10 01:25:22 +00:00
Paul Saab
e912f906d0 Fix a mis-merge. Remove a redundant call to tcp_sackhole_insert
Submitted by:	Mohan Srinivasan
2005-06-09 17:55:29 +00:00
Paul Saab
8b9bbaaa94 Fix for a crash in tcp_sack_option() caused by hitting the limit on
the number of sack holes.

Reported by:	Andrey Chernov
Submitted by:	Noritoshi Demizu
Reviewed by:	Raja Mukerji
2005-06-09 14:01:04 +00:00
Paul Saab
db4b83fe49 Fix for a bug in the change that walks the scoreboard backwards from
the tail (in tcp_sack_option()). The bug was caused by incorrect
accounting of the retransmitted bytes in the sackhint.

Reported by:    Kris Kennaway.
Submitted by:   Noritoshi Demizu.
2005-06-06 19:46:53 +00:00
Andrew Thompson
8f86751705 Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by:	mlaier (mentor)
Obtained from:	NetBSD
2005-06-05 03:13:13 +00:00
Brian Feldman
5278d40bcc Better explain, then actually implement the IPFW ALTQ-rule first-match
policy.  It may be used to provide more detailed classification of
traffic without actually having to decide its fate at the time of
classification.

MFC after:	1 week
2005-06-04 19:04:31 +00:00
Paul Saab
9d17a7a64a Changes to tcp_sack_option() that:
- walk the scoreboard backwards from the tail to reduce the number of
  comparisons for each sack option received.
- introduce functions to add/remove sack scoreboard elements, making
  the code more readable.

Submitted by:   Noritoshi Demizu
Reviewed by:    Raja Mukerji, Mohan Srinivasan
2005-06-04 08:03:28 +00:00
Max Laier
57cd6d263b Add support for IPv4 only rules to IPFW2 now that it supports IPv6 as well.
This is the last requirement before we can retire ip6fw.

Reviewed by:	dwhite, brooks(earlier version)
Submitted by:	dwhite (manpage)
Silence from:	-ipfw
2005-06-03 01:10:28 +00:00
Ian Dowse
ba5da2a06f Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.
2005-06-02 00:04:08 +00:00
Robert Watson
303939942c When aborting tcp_attach() due to a problem allocating or attaching the
tcpcb, lock the inpcb before calling in_pcbdetach() or in6_pcbdetach(),
as they expect the inpcb to be passed locked.

MFC after:	7 days
2005-06-01 12:14:56 +00:00
Robert Watson
e6e0b5ffd1 Assert tcbinfo lock, inpcb lock in tcp_disconnect().
Assert tcbinfo lock, inpcb lock in tcp_usrclosed().

MFC after:	7 days
2005-06-01 12:08:15 +00:00
Robert Watson
e3d5315d01 Assert tcbinfo lock in tcp_drop() due to its call of tcp_close()
Assert tcbinfo lock in tcp_close() due to its call to in{,6}_detach()
Assert tcbinfo lock in tcp_drop_syn_sent() due to its call to tcp_drop()

MFC after:	7 days
2005-06-01 12:06:07 +00:00
Robert Watson
1e2d989d0d Assert that tcbinfo is locked in tcp_input() before calling into
tcp_drop().

MFC after:	7 days
2005-06-01 12:03:18 +00:00
Robert Watson
416738a781 Assert the tcbinfo lock whenever tcp_close() is to be called by
tcp_input().

MFC after:	7 days
2005-06-01 11:49:14 +00:00
Robert Watson
7609aad7d9 Assert tcbinfo lock in tcp_attach(), as it is required; the caller
(tcp_usr_attach()) currently grabs it.

MFC after:	7 days
2005-06-01 11:44:43 +00:00
Robert Watson
fe6bfc3730 Commit correct version of previous commit (in_pcb.c:1.164). Use the
local variables as currently named.

MFC after:	7 days
2005-06-01 11:43:39 +00:00
Robert Watson
6b348152be Assert pcbinfo lock in in_pcbdisconnect() and in_pcbdetach(), as the
global pcb lists are modified.

MFC after:	7 days
2005-06-01 11:39:42 +00:00
Robert Watson
3ca1570c82 Slight white space tweak.
MFC after:	7 days
2005-06-01 11:38:35 +00:00
Robert Watson
277afaff66 De-spl UDP.
MFC after:	3 days
2005-06-01 11:24:00 +00:00
Seigo Tanimura
29ea671b36 Let OSPFv3 go through ipfw. Some more additional checks would be
desirable, though.
2005-05-28 07:46:44 +00:00
Paul Saab
808f11b768 This conforms with the terminology in
M.Mathis and J.Mahdavi,
  "Forward Acknowledgement: Refining TCP Congestion Control"
  SIGCOMM'96, August 1996.

Submitted by:   Noritoshi Demizu, Raja Mukerji
2005-05-25 17:55:27 +00:00
Paul Saab
64b5fbaa04 Rewrite of tcp_sack_option(). Kentaro Kurahone (NetBSD) pointed out
that if we sort the incoming SACK blocks, we can update the scoreboard
in one pass of the scoreboard. The added overhead of sorting up to 4
sack blocks is much lower than traversing (potentially) large
scoreboards multiple times. The code was updating the scoreboard with
multiple passes over it (once for each sack option). The rewrite fixes
that, reducing the complexity of the main loop from O(n^2) to O(n).

Submitted by:   Mohan Srinivasan, Noritoshi Demizu.
Reviewed by:    Raja Mukerji.
2005-05-23 19:22:48 +00:00
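Here is a simplified sketch of the one-pass idea: sort the (at most four) SACK blocks, then walk the sorted hole list once, trimming or splitting holes as SACKed ranges are found. Holes are kept in a plain array rather than the kernel's scoreboard list. Sequence wraparound is ignored, and SACK blocks are assumed not to overlap each other. `apply_sacks()` and `sort_blocks()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open sequence range [start, end). Wraparound is ignored here. */
struct blk { uint32_t start, end; };

/* Insertion sort by start: cheap for the handful of SACK blocks a
 * segment can carry. */
static void sort_blocks(struct blk *b, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct blk key = b[i];
        size_t j = i;

        while (j > 0 && b[j - 1].start > key.start) {
            b[j] = b[j - 1];
            j--;
        }
        b[j] = key;
    }
}

/*
 * Remove SACKed ranges from the hole list in one pass.  `holes` must be
 * sorted and non-overlapping; `sacks` is sorted here.  Remaining holes
 * are written to `out` (at most maxout); returns how many were written.
 */
static size_t apply_sacks(const struct blk *holes, size_t nh,
    struct blk *sacks, size_t ns, struct blk *out, size_t maxout)
{
    size_t j = 0, nout = 0;

    sort_blocks(sacks, ns);
    for (size_t i = 0; i < nh; i++) {
        uint32_t s = holes[i].start, e = holes[i].end;
        size_t k;

        while (j < ns && sacks[j].end <= s)   /* SACKs wholly before hole */
            j++;
        for (k = j; s < e; k++) {
            if (k >= ns || sacks[k].start >= e) {   /* rest is unSACKed */
                if (nout < maxout)
                    out[nout++] = (struct blk){ s, e };
                break;
            }
            if (sacks[k].start > s && nout < maxout)   /* gap before SACK */
                out[nout++] = (struct blk){ s, sacks[k].start };
            s = sacks[k].end;                 /* skip the SACKed part */
        }
    }
    return nout;
}
```

Because both lists are sorted, the SACK index `j` only moves forward. The work is proportional to the number of holes plus the number of SACK blocks rather than their product.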
Paul Saab
2cdbfa66ee Replace t_force with a t_flag (TF_FORCEDATA).
Submitted by:   Raja Mukerji.
Reviewed by:    Mohan, Silby, Andre Oppermann.
2005-05-21 00:38:29 +00:00
Paul Saab
4fc5324557 Introduce routines to alloc/free sack holes. This cleans up the code
considerably.

Submitted by:   Noritoshi Demizu.
Reviewed by:    Raja Mukerji, Mohan Srinivasan.
2005-05-16 19:26:46 +00:00
Gleb Smirnoff
32247f8629 - When a carp interface is destroyed and it affects the global preemption
suppression counter, decrease the latter. [1]
- Add sysctl to monitor preemption suppression.

PR:		kern/80972 [1]
Submitted by:	Frank Volf [1]
MFC after:	1 week
2005-05-15 01:44:26 +00:00
Paul Saab
fdace17f81 Fix for a bug where the "nexthole" sack hint is out of sync with the
real next hole to retransmit from the scoreboard, caused by a bug
which did not update the "nexthole" hint in one case in
tcp_sack_option().

Reported by:    Daniel Eriksson
Submitted by:   Mohan Srinivasan
2005-05-13 18:02:02 +00:00
Gleb Smirnoff
b3cf6808ce In div_output() explicitly set m->m_nextpkt to NULL. If the divert socket
is not in userland but is an ng_ksocket, m->m_nextpkt may be non-NULL. In
this case we would panic in sbappend.
2005-05-13 11:44:37 +00:00
Paul Saab
0077b0163f When looking for the next hole to retransmit from the scoreboard,
or to compute the total retransmitted bytes in this sack recovery
episode, the scoreboard is traversed. While in sack recovery, this
traversal occurs on every call to tcp_output(), every dupack and
every partial ack. The scoreboard could potentially get quite large,
making this traversal expensive.

This change optimizes this by storing hints (for the next hole to
retransmit and the total retransmitted bytes in this sack recovery
episode) reducing the complexity to find these values from O(n) to
constant time.

The debug code that sanity checks the hints against the computed
value will be removed eventually.

Submitted by:   Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
2005-05-11 21:37:42 +00:00
Colin Percival
fe2eee8231 Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by:	Uwe Doering
2005-05-07 00:41:36 +00:00
Gleb Smirnoff
cbfbc555e0 Add a workaround for 64-bit archs: store unsigned long return value in
temporary variable, check it and then cast to in_addr_t.
2005-05-06 13:01:31 +00:00
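Below is a sketch of the pattern, assuming the unsigned long comes from `strtoul()`; the log does not say which call it was. On LP64 platforms `unsigned long` is 64 bits, so casting straight to the 32-bit `in_addr_t` would silently truncate a value above 0xffffffff. Checking the wide value first avoids that. `parse_addr_u32()` is a hypothetical helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <netinet/in.h>

static bool parse_addr_u32(const char *str, in_addr_t *out)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(str, &end, 0);          /* keep the full-width result */
    if (errno != 0 || end == str || *end != '\0' || v > UINT32_MAX)
        return false;
    *out = (in_addr_t)v;                /* cast only after the range check */
    return true;
}
```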