problems reported recently (the rtentry pointer in the dummynet
queue was not initialized in all cases, resulting in spurious
rt_refcnt decreases in the lucky cases, and memory trashing in
other cases.
have all fields in network order, whereas ipfw expects some to be
in host order. This resulted in some incorrect matching, e.g. some
packets being identified as fragments, or bandwidth not being
correctly enforced.
NOTE: this only affects bridge+ipfw, normal ipfw usage was already
correct).
Reported-By: Dave Alden and others.
Add bounds checking to netbios NS packet resolving code. This should
prevent natd from crashing on badly formed netbios packets (as might be
heard when the machine is sitting on a cable modem or certain DSL
networks), and also closes potential security holes that might have
exploited the lack of bounds checking in the previous version of the
code.
If timer calculation results in degenerate value (0), force it to 1
to avoid divide-by-zero panic later on in calls to IGMP_RANDOM_DELAY().
I considered simply adding 1 to the timer calculation, but was unsure
if the calculation was part of the IGMP standard or not so did not want
to mess with it for all cases.
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>
option not defined the sysctl int value is set to -1 and read-only.
#ifdef KERNEL's added appropriately to wall off visibility of kernel
routines from user code.
Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option
is specified in kernel config, icmplim defaults to 100 pps. Setting it
to 0 will disable the feature. This feature limits ICMP error responses
for packets sent to bad tcp or udp ports, which does a lot to help the
machine handle network D.O.S. attacks.
The kernel will report packet rates that exceed the limit at a rate of
one kernel printf per second. There is one issue in regards to the
'tail end' of an attack... the kernel will not output the last report
until some unrelated and valid icmp error packet is return at some
point after the attack is over. This is a minor reporting issue only.
when a TCP "stealth" scan is directed at a *BSD box by ensuring the window
is 0 for all RST packets generated through tcp_respond()
Reviewed by: Don Lewis <Don.Lewis@tsc.tdk.com>
Obtained from: Bugtraq (from: Darren Reed <avalon@COOMBS.ANU.EDU.AU>)
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has SO_REUSEPORT
set.
PR: kern/7713
addresses by default.
Add a knob "icmp_bmcastecho" to "rc.network" to allow this
behaviour to be controlled from "rc.conf".
Document the controlling sysctl variable "net.inet.icmp.bmcastecho"
in sysctl(3).
Reviewed by: dg, jkh
Reminded on -hackers by: Steinar Haug <sthaug@nethelp.no>
4.1.4. Experimental Protocol
A system should not implement an experimental protocol unless it
is participating in the experiment and has coordinated its use of
the protocol with the developer of the protocol.
Pointed out by: Steinar Haug <sthaug@nethelp.no>
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
several new features are added:
- support vc/vp shaping
- support pvc shadow interface
code cleanup:
- remove WMAYBE related code. ENI WMAYBE DMA doen't work.
- remove updating if_lastchange for every packet.
- BPF related code is moved to midway.c as it should be.
(bpfwrite should work if atm_pseudohdr and LLC/SNAP are
prepended.)
- BPF link type is changed to DLT_ATM_RFC1483.
BPF now understands only LLC/SNAP!! (because bpf can't
handle variable link header length.)
It is recommended to use LLC/SNAP instead of NULL
encapsulation for various reasons. (BPF, IPv6,
interoperability, etc.)
the code has been used for months in ALTQ and KAME IPv6.
OKed by phk long time ago.
`len = min(so->so_snd.sb_cc, win) - off;'. min() has type u_int
and `off' has type int, so when min() is 0 and `off' is 1, the RHS
overflows to 0U - 1 = UINT_MAX. `len' has type long, so when
sizeof(long) == sizeof(int), the LHS normally overflows to to the
correct value of -1, but when sizeof(long) > sizeof(int), the LHS
is UINT_MAX.
Fixed some u_long's that should have been fixed-sized types.
mismatches exposed by this (the prototype for tcp_respond() didn't
match the function definition lexically, and still depends on a
gcc feature to match if ints have more than 32 bits).
Any packet that can be matched by a ipfw rule can be redirected
transparently to another port or machine. Redirection to another port
mostly makes sense with tcp, where a session can be set up
between a proxy and an unsuspecting client. Redirection to another machine
requires that the other machine also be expecting to receive the forwarded
packets, as their headers will not have been modified.
/sbin/ipfw must be recompiled!!!
Reviewed by: Peter Wemm <peter@freebsd.org>
Submitted by: Chrisy Luke <chrisy@flix.net>
Remove lots'o'hacks.
looutput is now static.
Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.
Consists largly of hack removal :-)
or unsigned int (this doesn't change the struct layout, size or
alignment in any of the files changed in this commit, at least for
gcc on i386's. Using bitfields of type u_char may affect size and
alignment but not packing)).
ifdef tangle. The previous commit to ip_fil.h didn't change the
one that actually applies to the current FreeBSD kernel, of course.
Fixed.
Fixed style bugs in previous commit to ip_fil.h.
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
so that the new behaviour is now default.
Solves the "infinite loop in diversion" problem when more than one diversion
is active.
Man page changes follow.
The new code is in -stable as the NON default option.
net.inet.ip.icmp.bmcastecho = 0 by removing the extra check for the
address being a multicast address. The test now relies on the link
layer flags that indicate it was received via multicast. The previous
logic was broken and replied to ICMP echo/timestamp broadcasts even
when the sysctl option disallowed them.
Reviewed by: wollman
Prior to this change, Accidental recursion protection was done by
the diverted daemon feeding back the divert port number it got
the packet on, as the port number on a sendto(). IPFW knew not to
redivert a packet to this port (again). Processing of the ruleset
started at the beginning again, skipping that divert port.
The new semantic (which is how we should have done it the first time)
is that the port number in the sendto() is the rule number AFTER which
processing should restart, and on a recvfrom(), the port number is the
rule number which caused the diversion. This is much more flexible,
and also more intuitive. If the user uses the same sockaddr received
when resending, processing resumes at the rule number following that
that caused the diversion. The user can however select to resume rule
processing at any rule. (0 is restart at the beginning)
To enable the new code use
option IPFW_DIVERT_RESTART
This should become the default as soon as people have looked at it a bit
passed to the user process for incoming packets. When the sockaddr_in
is passed back to the divert socket later, use thi sas the primary
interface lookup and only revert to the IP address when the name fails.
This solves a long standing bug with divert sockets:
When two interfaces had the same address (P2P for example) the interface
"assigned" to the reinjected packet was sometimes incorect.
Probably we should define a "sockaddr_div" to officially hold this
extended information in teh same manner as sockaddr_dl.
of the TCP payload. See RFC1122 section 4.2.2.6 . This allows
Path MTU discovery to be used along with IP options.
PR: problem discovered by Kevin Lahey <kml@nas.nasa.gov>
so it must be adjusted (minus 1) before using it to do the length check.
I'm not sure who to give the credit to, but the bug was reported by
Jennifer Dawn Myers <jdm@enteract.com>, who also supplied a patch. It
was also fixed in OpenBSD previously by andreas.gunnarsson@emw.ericsson.se,
and of course I did the homework to verify that the fix was correct per
the specification.
PR: 6738
NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and
minorly tweaked by me.
This is a standard part of FreeBSD, but must be enabled with:
"sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must
also be enabled. This should probably be modified to use the zone
allocator for speed and space efficiency. The current algorithm also
appears to lose if the number of active paths exceeds IPFLOW_MAX (256),
in which case it wastes lots of time trying to figure out which cache
entry to drop.
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use
it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time. This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
is believed to have been broken with the Brakmo/Peterson srtt
calculation changes. The result of this bug is that TCP connections
could time out extremely quickly (in 12 seconds).
Also backed out jdp's partial fix for this problem in rev 1.17 of
tcp_timer.c as it is obsoleted by this commit.
Bug was pointed out by Kevin Lehey <kml@roller.nas.nasa.gov>.
PR: 6068
ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers
port numbers are divided into three ranges:
0 - 1023 Well Known Ports
1024 - 49151 Registered Ports
49152 - 65535 Dynamic and/or Private Ports
This patch changes the "local port range" from 40000-44999
to the range shown above (plus fix the comment in in_pcb.c).
WARNING: This may have an impact on firewall configurations!
PR: 5402
Reviewed by: phk
Submitted by: Stephen J. Roznowski <sjr@home.net>
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.
Reviewed by: bde
all the LKM load/unload junk, and don't forget to register the SYSINIT
so that the cdevsw entry is attached.
BTW: I think the way it builds it's /dev nodes on the fly as an LKM with
vnode ops is kinda cute - I guess that'd be one way to solve the devfs
persistance problems.. :-) (ie: have the drivers make the nodes in /dev
on disk directly if they are missing, but leave them alone if present).
its own zone; this is used particularly by TCP which allocates both inpcb and
tcpcb in a single allocation. (Some hackery ensures that the tcpcb is
reasonably aligned.) Also keep track of the number of pcbs of each type
allocated, and keep a generation count (instance version number) for future
use.
connect. This check was added as part of the defense against the "land"
attack, to prevent attacks which guess the ISS from going into ESTABLISHED.
The "src == dst" check will still prevent the single-homed case of the
"land" attack, and guessing ISS's should be hard anyway.
Submitted by: David Borman <dab@bsdi.com>
since there might be permanent entries still left after
calls to DeleteLink (it will be nullified by DeleteLink
if all entries are deleted, won't it ?)
2) in PacketAliasSetAddress, set the aliasing address
even when PKT_ALIAS_RESET_ON_ADDR_CHANGE is in effect.
Just don't clean up links in this case.
Submitted by: Ari Suutari <ari@suutari.iki.fi>
via: Charles Mott <cmott@srv.net>
PR: 5041
It controls if the system is to accept source routed packets.
It used to be such that, no matter if the setting of net.inet.ip.sourceroute,
source routed packets destined at us would be accepted. Now it is
controllable with eth default set to NOT accept those.
offset is non-zero:
- Do not match fragmented packets if the rule specifies a port or
TCP flags
- Match fragmented packets if the rule does not specify a port and
TCP flags
Since ipfw cannot examine port numbers or TCP flags for such packets,
it is now illegal to specify the 'frag' option with either ports or
tcpflags. Both kernel and ipfw userland utility will reject rules
containing a combination of these options.
BEWARE: packets that were previously passed may now be rejected, and
vice versa.
Reviewed by: Archie Cobbs <archie@whistle.com>
This means that a FreeBSD will only forward source routed packets
when both net.inet.ip.forwarding and net.inet.ip.sourceroute are set
to 1.
You can hit me now ;-)
Submitted by: Thomas Ptacek
TCP and UDP port numbers in fragmented packets when IP offset != 0.
2.2.6 candidate.
Discovered by: Marc Slemko <marcs@znep.com>
Submitted by: Archie Cobbs <archie@whistle.com> w/fix from me
a hashed port list. In the new scheme, in_pcblookup() goes away and is
replaced by a new routine, in_pcblookup_local() for doing the local port
check. Note that this implementation is space inefficient in that the PCB
struct is now too large to fit into 128 bytes. I might deal with this in the
future by using the new zone allocator, but I wanted these changes to be
extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash()
to do the initialial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists() to remove the PCB from the various
hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should
be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in
the future, however.
These changes have been tested on wcarchive for more than a month. In tests
done here, connection establishment overhead is reduced by more than 50
times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a
large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be
recompiled; at the very least, this includes netstat(1).
rev 1.66. This fix contains both belt and suspenders.
Belt: ignore packets where src == dst and srcport == dstport in TCPS_LISTEN.
These packets can only legitimately occur when connecting a socket to itself,
which doesn't go through TCPS_LISTEN (it goes CLOSED->SYN_SENT->SYN_RCVD->
ESTABLISHED). This prevents the "standard" "land" attack, although doesn't
prevent the multi-homed variation.
Suspenders: send a RST in response to a SYN/ACK in SYN_RECEIVED state.
The only packets we should get in SYN_RECEIVED are
1. A retransmitted SYN, or
2. An ack of our SYN/ACK.
The "land" attack depends on us accepting our own SYN/ACK as an ACK;
in SYN_RECEIVED state; this should prevent all "land" attacks.
We also move up the sequence number check for the ACK in SYN_RECEIVED.
This neither helps nor hurts with respect to the "land" attack, but
puts more of the validation checking in one spot.
PR: kern/5103
This will not make any of object files that LINT create change; there
might be differences with INET disabled, but hardly anything compiled
before without INET anyway. Now the 'obvious' things will give a
proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The
only thing that _should_ work (but can't be made to compile reasonably
easily) is sppp :-(
This commit move struct arpcom from <netinet/if_ether.h> to
<net/if_arp.h>.
consequence, ipfw's list command now adjusts its output at runtime
based on the largest packet/byte counter values.
NOTE:
o The ipfw struct has changed requiring a recompile of both kernel
and userland ipfw utility.
o This probably should not be brought into 2.2.
PR: 3738
fix PR#3618 weren't sufficient since malloc() can block - allowing the
net interrupts in and leading to the same problem mentioned in the
PR (a panic). The order of operations has been changed so that this
is no longer a problem.
Needs to be brought into the 2.2.x branch.
PR: 3618
where if you are using the "reset tcp" firewall command,
the kernel would write ethernet headers onto random kernel stack locations.
Fought to the death by: terry, julian, archie.
fix valid for 2.2 series as well.
The #ifdef IPXIP in netipx/ipx_if.h is OK (used from ipx_usrreq.c and
ifconfig.c only).
I also fixed a typo IPXTUNNEL -> IPTUNNEL (and #ifdef'ed out the code
inside, as it never could have compiled - doh.)
close small security hole where an atacker could sendpackets with
IPDIVERT protocol, and select how it would be diverted thus bypassing
the ipfirewall. Discovered by inspection rather than attack.
(you'd have to know how the firewall was configured (EXACTLY) to
make use of this but..)
hope i've found out all files that actually depend on this dependancy.
IMHO, it's not very good practice to change the size of internal
structs depending on kernel options.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
represent in the TCP header. The old code did effectively:
win = min(win, MAX_ALLOWED);
win = max(win, what_i_think_i_advertised_last_time);
so if what_i_think_i_advertised_last_time is bigger than can be
represented in the header (e.g. large buffers and no window scaling)
then we stuff a too-big number into a short. This fix reverses the
order of the comparisons.
PR: kern/4712
RST's being ignored, keeping a connection around until it times out, and
thus has the opposite effect of what was intended (which is to make the
system more robust to DoS attacks).
you don't want this (and the documentation explains why), but if you
use ipfw as an as-needed casual filter as needed which normally runs as
'allow all' then having the kernel and /sbin/ipfw get out of sync is a
*MAJOR* pain in the behind.
PR: 4141
Submitted by: Heikki Suonsivu <hsu@mail.clinet.fi>
potential problems with other automatic-reply ICMPs, but some of them may
depend on broadcast/multicast to operate. (This code can simply be
moved to the `reflect' label to generalize it.)
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
accommodate the expanded name, the ICMP types bitmap has been
reduced from 256 bits to 32.
A recompile of kernel and user level ipfw is required.
To be merged into 2.2 after a brief period in -current.
PR: bin/4209
Reviewed by: Archie Cobbs <archie@whistle.com>
be dropped when it has an unusual traffic pattern. For full details
as well as a test case that demonstrates the failure, see the
referenced PR.
Under certain circumstances involving the persist state, it is
possible for the receive side's tp->rcv_nxt to advance beyond its
tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become
negative. However, in the code affected by this fix, that difference
was interpreted as an unsigned number by max(). Since it was
negative, it was taken as a huge unsigned number. The effect was
to cause the receiver to believe that its receive window had negative
size, thereby rejecting all received segments including ACKs. As
the test case shows, this led to fruitless retransmissions and
eventually to a dropped connection. Even connections using the
loopback interface could be dropped. The fix substitutes the signed
imax() for the unsigned max() function.
PR: closes kern/3998
Reviewed by: davidg, fenner, wollman
these are quite extensive additions to the ipfw code.
they include a change to the API because the old method was
broken, but the user view is kept the same.
The new code allows a particular match to skip forward to a particular
line number, so that blocks of rules can be
used without checking all the intervening rules.
There are also many more ways of rejecting
connections especially TCP related, and
many many more ...
see the man page for a complete description.
switch. I needed 'LINT' to compile for other reasons so I kinda got the
blood on my hands. Note: I don't know how to test this, I don't know if
it works correctly.
Don't search for interface addresses matching interface "NULL"
it's likely to cause a page fault..
this can be triggered by the ipfw code rejecting a locally generated
packet (e.g. you decide to make some network unreachable by local users)
ppp (or will be shortly). Natd can now be updated to use
this library rather than carrying its own version of the code.
Submitted by: Charles Mott <cmott@srv.net>
in_setsockaddr and in_setpeeraddr.
Handle the case where the socket was disconnected before the network
interrupts were disabled.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
Use the name argument almost the same in all LKM types. Maintain
the current behavior for the external (e.g., modstat) name for DEV,
EXEC, and MISC types being #name ## "_mod" and SYCALL and VFS only
#name. This is a candidate for change and I vote just the name without
the "_mod".
Change the DISPATCH macro to MOD_DISPATCH for consistency with the
other macros.
Add an LKM_ANON #define to eliminate the magic -1 and associated
signed/unsigned warnings.
Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.
Change source in tree to use the new interface.
Reviewed by: Bruce Evans