be dropped when it has an unusual traffic pattern. For full details
as well as a test case that demonstrates the failure, see the
referenced PR.
Under certain circumstances involving the persist state, it is
possible for the receive side's tp->rcv_nxt to advance beyond its
tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become
negative. However, in the code affected by this fix, that difference
was interpreted as an unsigned number by max(). Since it was
negative, it was taken as a huge unsigned number. The effect was
to cause the receiver to believe that its receive window had negative
size, thereby rejecting all received segments including ACKs. As
the test case shows, this led to fruitless retransmissions and
eventually to a dropped connection. Even connections using the
loopback interface could be dropped. The fix substitutes the signed
imax() for the unsigned max() function.
PR: closes kern/3998
Reviewed by: davidg, fenner, wollman
these are quite extensive additions to the ipfw code.
they include a change to the API because the old method was
broken, but the user view is kept the same.
The new code allows a particular match to skip forward to a particular
line number, so that blocks of rules can be
used without checking all the intervening rules.
There are also many more ways of rejecting
connections especially TCP related, and
many many more ...
see the man page for a complete description.
switch. I needed 'LINT' to compile for other reasons so I kinda got the
blood on my hands. Note: I don't know how to test this, I don't know if
it works correctly.
Don't search for interface addresses matching interface "NULL"
it's likely to cause a page fault..
this can be triggered by the ipfw code rejecting a locally generated
packet (e.g. you decide to make some network unreachable by local users)
ppp (or will be shortly). Natd can now be updated to use
this library rather than carrying its own version of the code.
Submitted by: Charles Mott <cmott@srv.net>
in_setsockaddr and in_setpeeraddr.
Handle the case where the socket was disconnected before the network
interrupts were disabled.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
Use the name argument almost the same in all LKM types. Maintain
the current behavior for the external (e.g., modstat) name for DEV,
EXEC, and MISC types being #name ## "_mod" and SYCALL and VFS only
#name. This is a candidate for change and I vote just the name without
the "_mod".
Change the DISPATCH macro to MOD_DISPATCH for consistency with the
other macros.
Add an LKM_ANON #define to eliminate the magic -1 and associated
signed/unsigned warnings.
Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.
Change source in tree to use the new interface.
Reviewed by: Bruce Evans
cache lines. Removed the struct ip proto since only a couple of chars
were actually being used in it. Changed the order of compares in the
PCB hash lookup to take advantage of partial cache line fills (on PPro).
Discussed-with: wollman
the quality of the hash distribution. This does not fix a problem dealing
with poor distribution when using lots of IP aliases and listening
on the same port on every one of them...some other day perhaps; fixing
that requires significant code changes.
The use of xor was inspired by David S. Miller <davem@jenolan.rutgers.edu>
connect in TCP while sending urgent data. It is not clear what
purpose is served by doing this, but there's no good reason why it
shouldn't work.
Submitted by: tjevans@raleigh.ibm.com via wpaul
pr_usrreqs. Collapse duplicates with udp_usrreq.c and
tcp_usrreq.c (calling the generic routines in uipc_socket2.c and
in_pcb.c). Calling sockaddr()_ or peeraddr() on a detached
socket now traps, rather than harmlessly returning an error; this
should never happen. Allow the raw IP buffer sizes to be
controlled via sysctl.
in the route. This allows us to remove the unconditional setting of the
pipesize in the route, which should mean that SO_SNDBUF and SO_RCVBUF
should actually work again. While we're at it:
- Convert udp_usrreq from `mondo switch statement from Hell' to new-style.
- Delete old TCP mondo switch statement from Hell, which had previously
been diked out.
is administratively downed, all routes to that interface (including the
interface route itself) which are not static will be deleted. When
it comes back up, and addresses remaining will have their interface routes
re-added. This solves the problem where, for example, an Ethernet interface
is downed by traffic continues to flow by way of ARP entries.
affect programs that sit on top of divert(4) sockets. The
multicast routing code already unconditionally zeros the sum
before recalculating.
Any code that unconditionaly sums a packet without first zeroing
the sum (assuming that it's already zero'd) will break. No such
code seems to exist.
set it in the first place, independent of whether sin->sin_port
is set.
The result is that diverted packets that are being forwarded
will be diverted once and only once on the way in (ip_input())
and again, once and only once on the way out (ip_output()) -
twice in total. ICMP packets that don't contain a port will
now also be diverted.
to -current.
Thanks goes to Ulrike Nitzsche <ulrike@ifw-dresden.de> for giving me
a chance to test this. Only the PCI driver is tested though.
One final patch will follow in a separate commit. This is so that
everything up to here can be dragged into 2.2, if we decide so.
Reviewed by: joerg
Submitted by: Matt Thomas <matt@3am-software.com>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).
The rest of the code was treating it as a header mbuf, but it was
allocated as a normal mbuf.
This fixes the panic: ip_output no HDR when you have a multicast
tunnel configured.
duplicate ip address 204.162.228.7! sent from ethernet address: 08:00:20:09:7b:1d
changed to
arp: 08:00:20:09:7b:1d is using my IP address 204.162.228.7!
and
arp info overwritten for 204.162.228.2 by 08:00:20:09:7b:1d
changed to
arp: 204.162.228.2 moved from 08:00:20:07:b6:a0 to 08:00:20:09:7b:1d
I think the new wordings are more clear and could save some support
questions.
using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).
"high" and "secure"), we can't use a single variable to track the most
recently used port in all three ranges.. :-] This caused the next
transient port to be allocated from the start of the range more often than
it should.
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.
(yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously
did (tail drop's the syns) pending discussion with fenner about some socket
state flags that I don't fully understand.
Submitted by: fenner
callers of it to take advantage of this. This reduces new connection
request overhead in the face of a large number of PCBs in the system.
Thanks to David Filo <filo@yahoo.com> for suggesting this and providing
a sample implementation (which wasn't used, but showed that it could be
done).
Reviewed by: wollman
drop the oldest entry in the queue.
There was a fair bit of discussion as to whether or not the
proper action is to drop a random entry in the queue. It's
my conclusion that a random drop is better than a head drop,
however profiling this section of code (done by John Capo)
shows that a head-drop results in a significant performance
increase.
There are scenarios where a random drop is more appropriate.
If I find one in reality, I'll add the random drop code under
a conditional.
Obtained from: discussions and code done by Vernon Schryver (vjs@sgi.com).
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifyable variable.
[part 1 of two commits, I just realized I can't play with the indices as
I was typing this commit message.]
to "keepidle". this should not occur unless the connection has
been established via the 3-way handshake which requires an ACK
Submitted by: jmb
Obtained from: problem discussed in Stevens vol. 3
IPPORT_RESERVED that is used for selection when bind() is told to allocate
a reserved port.
Also, implement simple sanity checking for all the addresses set, to make
it a little harder for a user/sysadmin to shoot themselves in the feet.
pulled up already. This bug can cause the first packet from a source
to a group to be corrupted when it is delivered to a process listening
on the mrouter.
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
This stuff should not be too destructive if the IPDIVERT is not compiled in..
be aware that this changes the size of the ip_fw struct
so ipfw needs to be recompiled to use it.. more changes coming to clean this up.
- State when we've reached the limit on a particular rule in the kernel logfile
- State when a rule or all rules have been zero'd.
This gives a log of all actions that occur w/regard to the firewall
occurances, and can explain why a particular break-in attempt might not
get logged due to the limit being reached.
Reviewed by: alex
Reviewed by: phk
Reject the addition of rules that will never match (for example,
1.2.3.4:255.255.255.0). User level utilities specify the policy by either
masking the IP address for the user (as ipfw(8) does) or rejecting the
entry with an error. In either case, the kernel should not modify chain
entries to make them work.
LKM'ness. ACTUALLY_LKM_NOT_KERNEL is supposed to be so ugly that it
only gets used until <machine/conf.h> goes away. bsd.kmod.mk should
define a better-named general macro for this. Some places use
PSEUDO_LKM. This is another bad name.
Makefile:
Added IPFIREWALL_VERBOSE_LIMIT option (commented out).
broadcast and multicast routes, otherwise they will be expired by
arptimeout after a few minutes, reverting to " (incomplete)". This makes
the work done by rev 1.27 stay around until the route itself is deleted.
This is mainly cosmetic for 'arp' and 'netstat -r'.
for ARP requests.
The NetBSD version of this patch (see NetBSD PR kern/2381) has this change
already. This should close our PR kern/1140 .
Although it's not quite what he submitted, I got the idea from him so
Submitted by: Jin Guojun <jin@george.lbl.gov>
/kernel: in_rtqtimo: adjusted rtq_reallyold to 1066
/kernel: in_rtqtimo: adjusted rtq_reallyold to 710
inside of #ifdef DIAGNOSTIC to avoid the support questions from folks
asking what this means.
- Log ICMP type during verbose output.
- Added IPFIREWALL_VERBOSE_LIMIT option to prevent denial of service
attacks via syslog flooding.
- Filter based on ICMP type.
- Timestamp chain entries when they are matched.
- Interfaces can now be matched with a wildcard specification (i.e.
will match any interface unit for a given name).
- Prevent the firewall chain from being manipulated when securelevel
is greater than 2.
- Fixed bug that allowed the default policy to be deleted.
- Ability to zero individual accounting entries.
- Remove definitions of old_chk_ptr and old_ctl_ptr when compiling
ipfw as a lkm.
- Remove some redundant code shared between ip_fw_init and ipfw_load.
Closes PRs: 1192, 1219, and 1267.
gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.
circumstances, caused perfectly good connections to be dropped. This
happened for connections over a LAN, where the retransmit timer
calculation TCP_REXMTVAL(tp) returned 0. If sending was blocked by flow
control for long enough, the old code dropped the connection, even
though timely replies were being received for all window probes.
Reviewed by: W. Richard Stevens <rstevens@noao.edu>
unconventionally:
If COMPAT_IPFW is not defined, or if it is defined to 1, enable;
otherwise, disable.
This means that these changes actually have no effect on anyone at the
moment. (It just makes it easier for me to keep my code in sync.)
In the future, the `not defined' part of the hack should be eliminated,
but doing this now would require everyone to change their config files.
The same conditionals need to be made in ip_input.c as well for this to
ave any useful effect, but I'm not ready to do that right now.
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).
Submitted by: Louis Mamakos <loiue@TransSys.com>
the destination represents. For IP:
- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local
(belongs to this host) and broadcast addresses, respectively.
- For all routes, RTF_MULTICAST is set if the destination is multicast.
The RTF_BROADCAST flag is used by ip_output() to eliminate a call to
in_broadcast() in a common case; this gives about 1% in our packet-generation
experiments. All three flags might be used (although they aren't now)
to determine whether a packet can be forwarded; a given host route can
represent a forwardable address if:
(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST))
== RTF_HOST
Obviously, one still has to do all the work if a host route is not present,
but this code allows one to cache the results of such a lookup if rtalloc1()
is called without masking RTF_PRCLONING.
1) Require all callers to pass a valid route pointer to ip_output()
so that we don't have to check and allocate one off the stack
as was done before. This eliminates one test and some stack
bloat from the common (UDP and TCP) case.
2) Perform the IP header checksum in-line if it's of the usual length.
This results in about a 5% speed-up in my packet-generation test.
3) Use ip_vhl field rather than ip_v and ip_hl bitfields.
1) Set the persist timer to help time-out connections in the CLOSING state.
2) Honor the keep-alive timer in the CLOSING state.
This fixes problems with connections getting "stuck" due to incompletion
of the final connection shutdown which can be a BIG problem on busy WWW
servers.