2039 Commits

Author SHA1 Message Date
rwatson
87aa99bbbb White space cleanup for netinet before branch:
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by:	re (scottl)
Submitted by:	Xin LI <delphij@frontfree.net>
2004-08-16 18:32:07 +00:00
obrien
3558849b92 Put the 'antispoof' opcode in the proper place in the opcode list such
that it doesn't break the ipfw2 ABI.
2004-08-16 12:05:19 +00:00
dwmalone
5df13d37b2 Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

        1) introduce a ip_newid() static inline function that checks
        the sysctl and then decides if it should return a sequential
        or random IP ID.

        2) named the sysctl net.inet.ip.random_id

        3) IPv6 flow IDs and fragment IDs are now always random.
        Flow IDs and frag IDs are significantly less common in the
        IPv6 world (ie. rarely generated per-packet), so there should
        be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).

Reviewed by:	andre, silby, mlaier, ume
Based on:	NetBSD
MFC after:	2 months
2004-08-14 15:32:40 +00:00
phk
271672aa9c Fix outgoing ICMP on global instance. 2004-08-14 14:21:09 +00:00
csjp
6661aed38d Add the ability to associate ipfw rules with a specific prison ID.
Since the only thing truly unique about a prison is it's ID, I figured
this would be the most granular way of handling this.

This commit makes the following changes:

- Adds tokenizing and parsing for the ``jail'' command line option
  to the ipfw(8) userspace utility.
- Append the ipfw opcode list with O_JAIL.
- While Iam here, add a comment informing others that if they
  want to add additional opcodes, they should append them to the end
  of the list to avoid ABI breakage.
- Add ``fw_prid'' to the ipfw ucred cache structure.
- When initializing ucred cache, if the process is jailed,
  set fw_prid to the prison ID, otherwise set it to -1.
- Update man page to reflect these changes.

This change was a strong motivator behind the ucred caching
mechanism in ipfw.

A sample usage of this new functionality could be:

    ipfw add count ip from any to any jail 2

It should be noted that because ucred based constraints
are only implemented for TCP and UDP packets, the same
applies for jail associations.

Conceptual head nod by:	pjd
Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-12 22:06:55 +00:00
dwmalone
874318f896 In tcp6_ctlinput, lock tcbinfo around the call to syncache_unreach
so that the locks held are the same as the IPv4 case.

Reviewed by:	rwatson
2004-08-12 18:19:36 +00:00
andre
6822e5677f Fix two cases of incorrect IPQ_UNLOCK'ing in the merged ip_reass() function.
The first one was going to 'dropfrag', which unlocks the IPQ, before the lock
was aquired; The second one doing a unlock and then a 'goto dropfrag' which
led to a double-unlock.

Tripped over by:	des
2004-08-12 08:37:42 +00:00
rwatson
c1da641947 When udp_send() fails, make sure to free the control mbufs as well as
the data mbuf.  This was done in most error cases, but not the case
where the inpcb pointer is surprisingly NULL.
2004-08-12 01:34:27 +00:00
andre
d87fe3ee1e Backout removal of UMA_ZONE_NOFREE flag for all zones which are established
for structures with timers in them.  It might be that a timer might fire
even when the associated structure has already been free'd.  Having type-
stable storage in this case is beneficial for graceful failure handling and
debugging.

Discussed with:	bosko, tegge, rwatson
2004-08-11 20:30:08 +00:00
andre
a6a5e26503 Remove the UMA_ZONE_NOFREE flag to all uma_zcreate() calls in the IP and
TCP code.  This flag would have prevented giving back excessive free slabs
to the global pool after a transient peak usage.
2004-08-11 17:08:31 +00:00
andre
2dba36f65b Make use of in_localip() function and replace previous direct LIST_FOREACH
loops over INADDR_HASH.
2004-08-11 12:32:10 +00:00
andre
a93503bce5 Add the function in_localip() which returns 1 if an internet address is for
the local host and configured on one of its interfaces.
2004-08-11 11:49:48 +00:00
andre
957506e985 Only invoke verify_path() for verrevpath and versrcreach when we have an IP packet. 2004-08-11 11:41:11 +00:00
andre
3d16b5d93e Only check for local broadcast addresses if the mbuf is flagged with M_BCAST. 2004-08-11 10:49:56 +00:00
andre
47aa08bf94 Consistently use NULL for pointer comparisons. 2004-08-11 10:46:15 +00:00
andre
d03ce8b4a3 Make IP fastforwarding ALTQ-aware by adding the input traffic conditioner
check and disabling the early output interface queue length check.
2004-08-11 10:42:59 +00:00
andre
0ad10fdbdf Correct the displayed bandwidth calculation for a readout via sysctl. The
saved value does not have to be scaled with HZ; it is already in bytes per
second.  Only the multiply by eight remains to show bits per second (bps).
2004-08-11 10:12:16 +00:00
rwatson
4bd194b32a Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect()
and in_pcbconnect_setup(), since these functions frob the port and
address state of inpcbs.
2004-08-11 04:35:20 +00:00
andre
e7784143d4 Make a comment that IP source routing is not SMP and PREEMPTION safe. 2004-08-09 16:17:37 +00:00
andre
1b887d9d08 Make a comment that "ipfw forward" is not SMP and PREEMPTION safe. 2004-08-09 16:16:10 +00:00
andre
649b4336f4 New ipfw option "antispoof":
For incoming packets, the packet's source address is checked if it
 belongs to a directly connected network.  If the network is directly
 connected, then the interface the packet came on in is compared to
 the interface the network is connected to.  When incoming interface
 and directly connected interface are not the same, the packet does
 not match.

Usage example:

 ipfw add deny ip from any to any not antispoof in

Manpage education by:	ru
2004-08-09 16:12:10 +00:00
rwatson
11a2e8ce3c Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events.  This prevents inpcb
assertions from firing when IPv6 generates and delievers event
notifications for inpcbs.

Reported by:	kuriyama
Tested by:	kuriyama
2004-08-06 03:45:45 +00:00
rwatson
2ce46f099e When iterating the UDP inpcb list processing an inbound broadcast
or multicast packet, we don't need to acquire the inpcb mutex
unless we are actually using inpcb fields other than the bound port
and address.  Since we hold the pcbinfo lock already, these can't
change.  Defer acquiring the inpcb mutex until we have a high
chance of a match.  This avoids about 120 mutex operations per UDP
broadcast packet received on one of my work systems.

Reviewed by:	sam
2004-08-06 02:08:31 +00:00
rwatson
bc0c491205 Now that IPv6 performs basic in6pcb and inpcb locking, enable inpcb
lock assertions even if IPv6 is compiled into the kernel.  Previously,
inclusion of IPv6 and locking assertions would result in a rapid
assertion failure as IPv6 was not properly locking inpcbs.
2004-08-04 18:27:55 +00:00
marcus
c8262f39d1 Fix Skinny and PPTP NAT'ing after the introduction of the {ip,tcp,udp}_next
functions.  Basically, the ip_next() function was used to get the PPTP and
Skinny headers when tcp_next() should have been used instead.  Symptoms of
this included a segfault in natd when trying to process a PPTP or Skinny
packet.

Approved by:	des
2004-08-04 15:17:08 +00:00
andre
3a7a087025 o Delayed checksums are now calculated in divert_packet() for diverted packets
Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.
2004-08-03 14:13:36 +00:00
andre
f1492e3a5a o Move the inflight sysctls to their own sub-tree under net.inet.tcp to be
more consistent with the other sysctls around it.
2004-08-03 13:54:11 +00:00
andre
fd96163246 o Move all parts of the IP reassembly process into the function ip_reass() to
make it fully self-contained.
o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len
  including the IP header.
o Computation of the delayed checksum is moved into divert_packet().

Reviewed by:	silby
2004-08-03 12:31:38 +00:00
hsu
4b4df6655a Fix bug with tracking the previous element in a list.
Found by:	edrt@citiz.net
Submitted by:	pavlin@icir.org
2004-08-03 02:01:44 +00:00
yar
1d71ae12e0 Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent
	manner.  Therefore she knocks up the server using PF_INET6 only
	and allows the IPv6 socket to accept mapped IPv4 as well.  An evil
	hacker known on IRC as cheshire_cat has an account in the same
	system.  He starts a process listening on the same port as used
	by Alice's server, but in PF_INET.  As a consequence, cheshire_cat
	will distract all IPv4 traffic supposed to go to Alice's server.

Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections.  After this
change, the above scenario will be impossible.  In the same setting,
the user who attempts to start his server last will get EADDRINUSE.

Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.

This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code.  It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.

MFC after:	1 month
2004-07-28 13:03:07 +00:00
jayanth
b754d213ab Fix a bug in the sack code that was causing data to be retransmitted
with the FIN bit set for all segments, if a FIN has already been sent before.
The fix will allow the FIN bit to be set for only the last segment, in case
it has to be retransmitted.

Fix another bug that would have caused snd_nxt to be pulled by len if
there was an error from ip_output. snd_nxt should not be touched
during sack retransmissions.
2004-07-28 02:15:14 +00:00
jayanth
ef090090cb Fix for a SACK bug where the very last segment retransmitted
from the SACK scoreboard could result in the next (untransmitted)
segment to be skipped.
2004-07-26 23:41:12 +00:00
jmg
b138ddd4e5 compare pointer against NULL, not 0
when inpcb is NULL, this is no longer invalid since jlemon added the
tcp_twstart function... this prevents close "failing" w/ EINVAL when it
really was successful...

Reviewed by:	jeremy (NetBSD)
2004-07-26 21:29:56 +00:00
cperciva
d9fecc83c8 Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with:	rwatson, scottl
Requested by:	jhb
2004-07-26 07:24:04 +00:00
andre
695543e4da Extend versrcreach by checking against the rt_flags for RTF_REJECT and
RTF_BLACKHOLE as well.

To quote the submitter:

 The uRPF loose-check implementation by the industry vendors, at least on Cisco
 and possibly Juniper, will fail the check if the route of the source address
 is pointed to Null0 (on Juniper, discard or reject route). What this means is,
 even if uRPF Loose-check finds the route, if the route is pointed to blackhole,
 uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode
 as a pseudo-packet-firewall without using any manual filtering configuration --
 one can simply inject a IGP or BGP prefix with next-hop set to a static route
 that directs to null/discard facility. This results in uRPF Loose-check failing
 on all packets with source addresses that are within the range of the nullroute.

Submitted by:	James Jun <james@towardex.com>
2004-07-21 19:55:14 +00:00
rwatson
4557c47e2f M_PREPEND() the IP header on to the front of an outgoing raw IP packet
using M_DONTWAIT rather than M_WAITOK to avoid sleeping on memory
while holding a mutex.
2004-07-20 20:52:30 +00:00
jayanth
3781ade946 Let IN_FASTREOCOVERY macro decide if we are in recovery mode.
Nuke sackhole_limit for now. We need to add it back to limit the total
number of sack blocks in the system.
2004-07-19 22:37:33 +00:00
jayanth
48943ed977 Fix a potential panic in the SACK code that was causing
1) data to be sent to the right of snd_recover.
2) send more data then whats in the send buffer.

The fix is to postpone sack retransmit to a subsequent recovery episode
if the current retransmit pointer is beyond snd_recover.

Thanks to Mohan Srinivasan for helping fix the bug.

Submitted by:Daniel Lang
2004-07-19 22:06:01 +00:00
dwmalone
ccfd16b40a Fix the !INET6 build.
Reported by:	alc
2004-07-17 21:40:14 +00:00
dwmalone
71eccf2cf5 The tcp syncache code was leaving the IPv6 flowlabel uninitialised
for the SYN|ACK packet and then letting in6_pcbconnect set the
flowlabel later. Arange for the syncache/syncookie code to set and
recall the flow label so that the flowlabel used for the SYN|ACK
is consistent. This is done by using some of the cookie (when tcp
cookies are enabeled) and by stashing the flowlabel in syncache.

Tested and Discovered by:	Orla McGann <orly@cnri.dit.ie>
Approved by:			ume, silby
MFC after:			1 month
2004-07-17 19:44:13 +00:00
mlaier
512e25ff0c Define semantic of M_SKIP_FIREWALL more precisely, i.e. also pass associated
icmp_error() packets. While here retire PACKET_TAG_PF_GENERATED (which
served the same purpose) and use M_SKIP_FIREWALL in pf as well. This should
speed up things a bit as we get rid of the tag allocations.

Discussed with:	juli
2004-07-17 05:10:06 +00:00
jmallett
111d2dd115 Make M_SKIP_FIREWALL a global (and semantic) flag, preventing anything from
using M_PROTO6 and possibly shooting someone's foot, as well as allowing the
firewall to be used in multiple passes, or with a packet classifier frontend,
that may need to explicitly allow a certain packet.  Presently this is handled
in the ipfw_chk code as before, though I have run with it moved to upper
layers, and possibly it should apply to ipfilter and pf as well, though this
has not been investigated.

Discussed with:	luigi, rwatson
2004-07-17 02:40:13 +00:00
ume
6418d70e35 when IN6P_AUTOFLOWLABEL is set, the flowlabel is not set on
outgoing tcp connections.

Reported by:	Orla McGann <orly@cnri.dit.ie>
Reviewed by:	Orla McGann <orly@cnri.dit.ie>
Obtained from:	KAME
2004-07-16 18:08:13 +00:00
phk
5c95d686a1 Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
stefanf
355a8ec494 Remove erroneous semicolons. 2004-07-13 16:06:19 +00:00
rwatson
9d5e898163 After each label in tcp_input(), assert the inpcbinfo and inpcb lock
state that we expect.
2004-07-12 19:28:07 +00:00
brian
aae31dbf32 Change the following environment variables to kernel options:
bootp -> BOOTP
    bootp.nfsroot -> BOOTP_NFSROOT
    bootp.nfsv3 -> BOOTP_NFSV3
    bootp.compat -> BOOTP_COMPAT
    bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit.  It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone
2004-07-08 22:35:36 +00:00
brian
2821a50eaa Change the following kernel options to environment variables:
BOOTP -> bootp
    BOOTP_NFSROOT -> bootp.nfsroot
    BOOTP_NFSV3 -> bootp.nfsv3
    BOOTP_COMPAT -> bootp.compat
    BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

    bootp="YES"
    bootp.nfsroot="YES"
    bootp.nfsv3="YES"
    bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.
2004-07-08 13:40:33 +00:00
des
9d07523073 Push WARNS back up to 6, but define NO_WERROR; I want the warts out in the
open where people can see them and hopefully fix them.
2004-07-06 12:15:24 +00:00
des
93180ebf2d Introduce inline {ip,udp,tcp}_next() functions which take a pointer to an
{ip,udp,tcp} header and return a void * pointing to the payload (i.e. the
first byte past the end of the header and any required padding).  Use them
consistently throughout libalias to a) reduce code duplication, b) improve
code legibility, c) get rid of a bunch of alignment warnings.
2004-07-06 12:13:28 +00:00