This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).
The rest of the code was treating it as a header mbuf, but it was
allocated as a normal mbuf.
This fixes the panic: ip_output no HDR when you have a multicast
tunnel configured.
duplicate ip address 204.162.228.7! sent from ethernet address: 08:00:20:09:7b:1d
changed to
arp: 08:00:20:09:7b:1d is using my IP address 204.162.228.7!
and
arp info overwritten for 204.162.228.2 by 08:00:20:09:7b:1d
changed to
arp: 204.162.228.2 moved from 08:00:20:07:b6:a0 to 08:00:20:09:7b:1d
I think the new wordings are more clear and could save some support
questions.
using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).
"high" and "secure"), we can't use a single variable to track the most
recently used port in all three ranges.. :-] This caused the next
transient port to be allocated from the start of the range more often than
it should.
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.
(yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously
did (tail drop's the syns) pending discussion with fenner about some socket
state flags that I don't fully understand.
Submitted by: fenner
callers of it to take advantage of this. This reduces new connection
request overhead in the face of a large number of PCBs in the system.
Thanks to David Filo <filo@yahoo.com> for suggesting this and providing
a sample implementation (which wasn't used, but showed that it could be
done).
Reviewed by: wollman
drop the oldest entry in the queue.
There was a fair bit of discussion as to whether or not the
proper action is to drop a random entry in the queue. It's
my conclusion that a random drop is better than a head drop,
however profiling this section of code (done by John Capo)
shows that a head-drop results in a significant performance
increase.
There are scenarios where a random drop is more appropriate.
If I find one in reality, I'll add the random drop code under
a conditional.
Obtained from: discussions and code done by Vernon Schryver (vjs@sgi.com).
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifyable variable.
[part 1 of two commits, I just realized I can't play with the indices as
I was typing this commit message.]
to "keepidle". this should not occur unless the connection has
been established via the 3-way handshake which requires an ACK
Submitted by: jmb
Obtained from: problem discussed in Stevens vol. 3
IPPORT_RESERVED that is used for selection when bind() is told to allocate
a reserved port.
Also, implement simple sanity checking for all the addresses set, to make
it a little harder for a user/sysadmin to shoot themselves in the feet.
pulled up already. This bug can cause the first packet from a source
to a group to be corrupted when it is delivered to a process listening
on the mrouter.
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.
This stuff should not be too destructive if the IPDIVERT is not compiled in..
be aware that this changes the size of the ip_fw struct
so ipfw needs to be recompiled to use it.. more changes coming to clean this up.
- State when we've reached the limit on a particular rule in the kernel logfile
- State when a rule or all rules have been zero'd.
This gives a log of all actions that occur w/regard to the firewall
occurances, and can explain why a particular break-in attempt might not
get logged due to the limit being reached.
Reviewed by: alex
Reviewed by: phk
Reject the addition of rules that will never match (for example,
1.2.3.4:255.255.255.0). User level utilities specify the policy by either
masking the IP address for the user (as ipfw(8) does) or rejecting the
entry with an error. In either case, the kernel should not modify chain
entries to make them work.
LKM'ness. ACTUALLY_LKM_NOT_KERNEL is supposed to be so ugly that it
only gets used until <machine/conf.h> goes away. bsd.kmod.mk should
define a better-named general macro for this. Some places use
PSEUDO_LKM. This is another bad name.
Makefile:
Added IPFIREWALL_VERBOSE_LIMIT option (commented out).
broadcast and multicast routes, otherwise they will be expired by
arptimeout after a few minutes, reverting to " (incomplete)". This makes
the work done by rev 1.27 stay around until the route itself is deleted.
This is mainly cosmetic for 'arp' and 'netstat -r'.
for ARP requests.
The NetBSD version of this patch (see NetBSD PR kern/2381) has this change
already. This should close our PR kern/1140 .
Although it's not quite what he submitted, I got the idea from him so
Submitted by: Jin Guojun <jin@george.lbl.gov>
/kernel: in_rtqtimo: adjusted rtq_reallyold to 1066
/kernel: in_rtqtimo: adjusted rtq_reallyold to 710
inside of #ifdef DIAGNOSTIC to avoid the support questions from folks
asking what this means.
- Log ICMP type during verbose output.
- Added IPFIREWALL_VERBOSE_LIMIT option to prevent denial of service
attacks via syslog flooding.
- Filter based on ICMP type.
- Timestamp chain entries when they are matched.
- Interfaces can now be matched with a wildcard specification (i.e.
will match any interface unit for a given name).
- Prevent the firewall chain from being manipulated when securelevel
is greater than 2.
- Fixed bug that allowed the default policy to be deleted.
- Ability to zero individual accounting entries.
- Remove definitions of old_chk_ptr and old_ctl_ptr when compiling
ipfw as a lkm.
- Remove some redundant code shared between ip_fw_init and ipfw_load.
Closes PRs: 1192, 1219, and 1267.
gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.
circumstances, caused perfectly good connections to be dropped. This
happened for connections over a LAN, where the retransmit timer
calculation TCP_REXMTVAL(tp) returned 0. If sending was blocked by flow
control for long enough, the old code dropped the connection, even
though timely replies were being received for all window probes.
Reviewed by: W. Richard Stevens <rstevens@noao.edu>
unconventionally:
If COMPAT_IPFW is not defined, or if it is defined to 1, enable;
otherwise, disable.
This means that these changes actually have no effect on anyone at the
moment. (It just makes it easier for me to keep my code in sync.)
In the future, the `not defined' part of the hack should be eliminated,
but doing this now would require everyone to change their config files.
The same conditionals need to be made in ip_input.c as well for this to
ave any useful effect, but I'm not ready to do that right now.
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).
Submitted by: Louis Mamakos <loiue@TransSys.com>
the destination represents. For IP:
- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local
(belongs to this host) and broadcast addresses, respectively.
- For all routes, RTF_MULTICAST is set if the destination is multicast.
The RTF_BROADCAST flag is used by ip_output() to eliminate a call to
in_broadcast() in a common case; this gives about 1% in our packet-generation
experiments. All three flags might be used (although they aren't now)
to determine whether a packet can be forwarded; a given host route can
represent a forwardable address if:
(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST))
== RTF_HOST
Obviously, one still has to do all the work if a host route is not present,
but this code allows one to cache the results of such a lookup if rtalloc1()
is called without masking RTF_PRCLONING.
1) Require all callers to pass a valid route pointer to ip_output()
so that we don't have to check and allocate one off the stack
as was done before. This eliminates one test and some stack
bloat from the common (UDP and TCP) case.
2) Perform the IP header checksum in-line if it's of the usual length.
This results in about a 5% speed-up in my packet-generation test.
3) Use ip_vhl field rather than ip_v and ip_hl bitfields.
1) Set the persist timer to help time-out connections in the CLOSING state.
2) Honor the keep-alive timer in the CLOSING state.
This fixes problems with connections getting "stuck" due to incompletion
of the final connection shutdown which can be a BIG problem on busy WWW
servers.
common labels for LINT. There are still some common declarations for the
!KERNEL case in tcp_debug.h and spx_debug.h. trpt depends on the ones in
tcp_debug.h.
This fixes a panic that occurs when ifconfig ioctl(s) were interrupted
by IP traffic at the wrong time - resulting in a NULL pointer dereference.
This was originally noticed on a FreeBSD 1.0 system, but the problem still
exists in current sources.
keepalive on all tcp sessions. Setsockopt(2) cannot override this setting.
Maybe another one is needed that just changes the default for SO_KEEPALIVE ?
Requested by: Joe Greco <jgreco@brasil.moneng.mei.com>
Move ipip_input() and rsvp_input() prototypes to ip_var.h
Remove unused prototype for rip_ip_input() from ip_var.h
Remove unused variable *opts from rip_output()
Get rid of ac->ac_ipaddr and arpwhohas() since they assume that
an interface has only one address.
Obtained from: BSD/OS 2.1, via Rich Stevens <rstevens@noao.edu>
from Larry Peterson &co. at Arizona:
- Header prediction for ACKs did not exclude Fast Retransmit/Recovery.
- srtt calculation tended to get ``stuck'' and could never decrease
when below 8. It still can't, but the scaling factors are adjusted
so that this artifact does not cause as bad an effect on the RTO
value as it used to.
The paper also points out the incr/8 error that has been long since fixed,
and the problems with ACKing frequency resulting from the use of options
which I suspect to be fixed already as well (as part of the T/TCP work).
Obtained from: Brakmo & Peterson, ``Performance Problems in BSD4.4 TCP''
between ignoring options specified in the setsockopt call if IP_HDRINCL is set
(the UCB choice when VJ's code was brought in) vs allowing them (what everyone
else did, and what is assumed by programs everywhere...sigh).
Also perform some checking of the passed down packet to avoid running off
the end of a mbuf chain.
Reviewed by: fenner
Make a copy of the header of a packet that gets queued due to
lack of forwarding cache entry, so that nobody else can step
on it. Thanks to Mike Karels <karels@bsdi.com> for pointing
this one out.
Close the ip-fragment hole.
Waste less memory.
Rewrite to contemporary more readable style.
Kill separate IPACCT facility, use "accept" rules in IPFIREWALL.
Filter incoming >and< outgoing packets.
Replace "policy" by sticky "deny all" rule.
Rules have numbers used for ordering and deletion.
Remove "rerorder" code entirely.
Count packet & bytecount matches for rules.
Code in -current & -stable is now the same.
systems (my last change did not mix well with some firewall
configurations). As much as I dislike firewalls, this is one thing I
I was not prepared to break by default.. :-)
Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).
The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.*
This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.
The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it.
Partly suggested by: pst
Reviewed by: wollman
when a connection enters the ESTBLS state using T/TCP, then window
scaling wasn't properly handled. The fix is twofold.
1) When the 3WHS completes, make sure that we update our window
scaling state variables.
2) When setting the `virtual advertized window', then make sure
that we do not try to offer a window that is larger than the maximum
window without scaling (TCP_MAXWIN).
Reviewed by: davidg
Reported by: Jerry Chen <chen@Ipsilon.COM>
to 20000 through 30000. These numbers are used for local IP port numbers
when an explicit address is not specified.
The values are sysctl modifiable under: net.inet.ip.port_{first|last}_auto
These numbers do not overlap with any known server addresses, without going
above 32768 which are "negative" on some other implementations.
20000 through 30000 is 2.5 times larger than the old range, but some have
suggested even that may not be enough... (gasp!) Setting a low address
of 10000 should be plenty.. :-)
local address, that was assigned with ifconfig alias and netmask
0xffffffff, would receive duplictae udp packets.
This behaviour can easily be seen by having named run, and using the alias
address as the name server.
This solution is not the pretiest one, but after talk with Garreth, it
is seen as the most easy one.
to enable IP forwarding, use sysctl(8). Also did the same for IPX,
which involved inventing a completely new MIB from whole cloth (which
I may not quite have correct); be aware of this if you use IPX forwarding.
(The two should never have been controlled by the same option anyway.)
than separate ip_v and ip_hl members. Should have no effect on current code,
but I'd eventually like to get rid of those obnoxious bitfields completely.
others: start to populate the link-layer branch of the net mib, by
moving ARP to its proper place. (ARP is not a protocol family, it's an
interface layer between a medium-access layer and a protocol family.)
sysctl(8) needs to be taught about the structure of this branch, unless
Poul-Henning implements dynamic MIB exploration soon.
*' instead of caddr_t and it isn't optional (it never was). Most of the
netipx (and netns) pr_ctlinput functions abuse the second arg instead of
using the third arg but fixing this is beyond the scope of this round
of changes.
Add five sysctl variables that you should probably never tweak.
net.arp.t_prune: 300
net.arp.t_keep: 1200
net.arp.t_down: 20
net.arp.maxtries: 5
net.arp.useloopback: 1
net.arp.proxyall: 0
(It's net.arp because arp isn't limited to inet, though our present
implementation surely is).
Removed ifnet.if_init and ifnet.if_reset as they are generally unused.
Change the parameter passed to if_watchdog to be a ifnet * rather than
a unit number. All of this is an attempt to move toward not needing an
array of softc pointers (which is usually static in size) to point to
the driver softc.
if_ed.c:
Changed some of the argument passing to some functions to make a little
more sense.
if_ep.c, if_vx.c:
Killed completely bogus use of if_timer. It was being set in such a way
that the interface was being reset once per second (blech!).
- remove a redundant condition;
- complete all validity checks on segment before calling
soisconnected(so).
Reviewed by: Richard Stevens, davidg, wollman
have to decide whether to send a CC or CCnew option in our SYN segment
depending on the contents of our TAO cache. This decision has to be
made once when the connection starts. The earlier code delayed this
decision until the segment was assembled in tcp_output() and
retransmitted SYN segments could have different CC options.
Reviewed by: Richard Stevens, davidg, wollman
net.inet.ip.intr-queue-maxlen (=== ipintrq.ifq_maxlen)
and net.inet.ip.intr-queue-drops (=== ipintrq.ifq_drops)
There should probably be a standard way of getting the same information
going the other way.
in the FIN_WAIT_2 state in order to prevent the conn. hanging there
forever.
Reviewed by: davidg, olah
Submitted by: Arne Henrik Juul <arnej@imf.unit.no>
Obtained from: bugs@netbsd.org
Submitted by: Mike Mitchell, supervisor@alb.asctmd.com
This is a bulk mport of Mike's IPX/SPX protocol stacks and all the
related gunf that goes with it..
it is not guaranteed to work 100% correctly at this time
but as we had several people trying to work on it
I figured it would be better to get it checked in so
they could all get teh same thing to work on..
Mikes been using it for a year or so
but on 2.0
more changes and stuff will be merged in from other developers now that this is in.
Mike Mitchell, Network Engineer
AMTECH Systems Corporation, Technology and Manufacturing
8600 Jefferson Street, Albuquerque, New Mexico 87113 (505) 856-8000
supervisor@alb.asctmd.com
a few new wrinkles for MTU discovery which tcp_output() had better
be prepared to handle. ip_output() is also modified to do something
helpful in this case, since it has already calculated the information
we need.
capacity of the link, even if the route's MTU indicates that we cannot
send that much in their direction. (This might actually make it possible
to test Path MTU discovery in a useful variety of cases.)
turned out not to be necessary; simply watching for MTU decreases (which
we already did) automagically eliminates all the cases we were trying to
protect against.
middle of a fully-open window. Also, keep track of how many retransmits
we do as a result of MTU discovery. This may actually do more work than
necessary, but it's an unusual condition...
Suggested by: Janey Hoe <janey@lcs.mit.edu>
we're at it, eliminate obsolete exposure of `struct llinfo_arp' to
the world. (This dates back to when ARP entries were not stored in
the routing table, and there was no other way for the `arp' program
to read the whole table than to grovel around in /dev/kmem.)
to be no ill effects, and so far as Iknow none of the variables in
question depend on 16-bit wraparound behavior. (The sizes are in
many cases relics from when a PCB had to fit inside a 128-byte mbuf. PCBs
are no longer stored in that way, and the old structure would not have
fit, either.)
matching IP options..Check and test this - i made only a couple
of rough tests and this could be buggy.. Ipaccounting can't use
IP Options (and i don't see any need to cound packets with specific
options either..)
More to come...
time ago. I left in Garrett's one, because his was in the 4.4-Lite-2
location, making any diffs just that little bit smaller.
I presume this choice means that netstat needs to be recompiled before
"netstat -s" will give a meaningful answer on tcp stats.
and gated on `options MTUDISC' in the source. It is also practically
untested becausse (sniff!) I don't have easy access to a network with
an MTU of less than an Ethernet. If you have a small MTU network,
please try it and tell me if it works!
to be sent, just clean up and return ENOBUFS rather than silently
proceeding without sending any of the data. This makes it consistent
with the `#ifdef notyet' case immediately above.
Reviewed by: Andras Olah <olah@freebsd.org>
Obtained from: Lite-2
Garrett,
Here are some patches for the rate limiting code. It should be faster,
and in particular it doesn't leak malloc'd memory any more when rate_limit'ing
a phyint.
It now uses an mbuf chain at each vif, instead of the static queue array.
This means that the MAXQSIZE is now variable per vif (although there is no
interface to change it other than a debugger); this is an area for more
experimentation.
Bill
Submitted by: Bill Fenner <fenner@parc.xerox.com>
case, multicast options are not passed to ip_mforward().) The previous
version had a wrong test, thus causing RSVP mrouters to forward RSVP messages
in violation of the spec.
or ssthresh that we were able to use
tcp_var.h - declare tcpstat entries for above; declare tcp_{send,recv}space
in_rmx.c - fill in the MTU and pipe sizes with the defaults TCP would have
used anyway in the absence of values here
incorrect indents, a variety of poor coding practices such as comparing
pointers to constants ('0'), poor code structuring, etc, etc. This brings
the code up to the minimum standards for inclusion in FreeBSD.
2) Rewrote "bad_packet" code to be less buggy and more readable.
3) Removed a pile of goto's; the code is now somewhat less reminiscent
of a certain Italian pasta.
4) Changed all boolean returns of "0" and "1" to FALSE/TRUE.
know better when to cache values in the route, rather than relying on a
heuristic involving sequence numbers that broke when tcp_sendspace
was increased to 16k.
forwarding between networks that aren't directly connected) not to work
by intercepting the wrong protocol number. This should fix a bug reported
previously by someone I don't remember.
its connection parameters, we want to keep statistics on how often this
actually happens to see whether there is any work that needs to be done in
TCP itself.
Suggested by: John Wroclawski <jtw@lcs.mit.edu>
IGMPv2 spec. This fixes the following bugs:
o ntohs() on a char provides silly results
o timer needs to be scaled to units of PR_FASTHZ; this was being done
inconsistenly so now it gets done when it is initialized.
Reviewed by: Garrett Wollman
Submitted by: Bill Fenner <fenner@parc.xerox.com>
currently considering reducing the TCP fasttimo to 100ms to help improve
things, but this would be done as a seperate step at some point in the
future.
This was done because it was causing some sometimes serious performance
problems with T/TCP.
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.
On Tue, 09 May 1995 04:35:27 PDT, Richard Stevens wrote:
> In tcp_dooptions() under the case TCPOPT_CC there is an assignment
>
> to->to_flag |= TCPOPT_CC;
>
> that should be
>
> to->to_flag |= TOF_CC;
>
> I haven't thought through the ramifications of what's been happening ...
>
> Rich Stevens
Submitted by: rstevens@noao.edu (Richard Stevens)
Change IPTOS_PREC_ROUTINE to 0 (was conflict with IPTOS_LOWDELAY) according
to RFC 791 (unchanged since it) and BSDI 2.0 style
Submitted by: Igor Sviridov <siac@ua.net>
the lookup fails. Updated callers to deal with this. Call in_pcblookuphash
instead of in_pcblookup() in in_pcbconnect; this improves performance of
UDP output by about 17% in the standard case.