+ implement "limit" rules, which permit to limit the number of sessions
between certain host pairs (according to masks). These are a special
type of stateful rules, which might be of interest in some cases.
See the ipfw manpage for details.
+ merge the list pointers and ipfw rule descriptors in the kernel, so
the code is smaller, faster and more readable. This patch basically
consists in replacing "foo->rule->bar" with "rule->bar" all over
the place.
I have been willing to do this for ages!
MFC after: 1 week
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.
TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.
Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.
Reviewed by: -net
Obtained from: OpenBSD
is transmitted as all ones". This got broken after introduction
of delayed checksums as follows. Some guys (including Jonathan)
think that it is allowed to transmit all ones in place of a zero
checksum for TCP the same way as for UDP. (The discussion still
takes place on -net.) Thus, the 0 -> 0xffff checksum fixup was
first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65)
to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18,
INVERT expression). Besides that I disagree that it is valid for
TCP, there was no real problem until in_cksum.c,v 1.20, where the
in_cksum() was made just a special version of in_cksum_skip().
The side effect was that now every incoming IP datagram failed to
pass the checksum test (in_cksum() returned 0xffff when it should
actually return zero). It was fixed next day in revision 1.21,
by removing the INVERT expression. The latter also broke the
0 -> 0xffff fixup for UDP checksums.
Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 0000
After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 ffff
come from a dummynet pipe. Without this, the code which increments
the per-ifaddr stats can dereference an uninitialised pointer. This
should make dummynet usable again.
Reported by: "Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua>
Reviewed by: luigi, joe
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
<sys/proc.h> to <sys/systm.h>.
Correctly document the #includes needed in the manpage.
Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.
statistics on a per network address basis.
Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.
Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.
fields between host and network byte order. The details:
o icmp_error() now does not add IP header length. This fixes the problem
when icmp_error() is called from ip_forward(). In this case the ip_len
of the original IP datagram returned with ICMP error was wrong.
o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
byte order, so DTRT and convert these fields back to network byte order
before sending a message. This fixes the problem described in PR 16240
and PR 20877 (ip_id field was returned in host byte order).
o ip_ttl decrement operation in ip_forward() was moved down to make sure
that it does not corrupt the copy of original IP datagram passed later
to icmp_error().
o A copy of original IP datagram in ip_forward() was made a read-write,
independent copy. This fixes the problem I first reported to Garrett
Wollman and Bill Fenner and later put in audit trail of PR 16240:
ip_output() (not always) converts fields of original datagram to network
byte order, but because copy (mcopy) and its original (m) most likely
share the same mbuf cluster, ip_output()'s manipulations on original
also corrupted the copy.
o ip_output() now expects all three fields, ip_len, ip_off and (what is
significant) ip_id in host byte order. It was a headache for years that
ip_id was handled differently. The only compatibility issue here is the
raw IP socket interface with IP_HDRINCL socket option set and a non-zero
ip_id field, but ip.4 manual page was unclear on whether in this case
ip_id field should be in host or network byte order.
mbuf is marked for delayed checksums, then additionally mark the
packet as having it's checksums computed. This allows us to bypass
computing/checking the checksum entirely, which isn't really needeed
as the packet has never hit the wire.
Reviewed by: green
Without this, kernel will panic at getsockopt() of IPSEC_POLICY.
Also make compilable libipsec/test-policy.c which tries getsockopt() of
IPSEC_POLICY.
Approved by: jkh
Submitted by: sakane@kame.net
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead
of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter
to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter
to ip_reass()
- style(9) fixes
Reviewed by: julian, green
routines. The descriptor contains parameters which could be used
within those routines (eg. ip_output() ).
On passing, add IPPROTO_PGM entry to netinet/in.h
- unifdef -DCOMPAT_IPFW (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on
compiled-in hooks in ip_init(). This means they initialize the same
way both in-kernel and as kld modules. (IPFW initializes now :-)
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
Any packet that can be matched by a ipfw rule can be redirected
transparently to another port or machine. Redirection to another port
mostly makes sense with tcp, where a session can be set up
between a proxy and an unsuspecting client. Redirection to another machine
requires that the other machine also be expecting to receive the forwarded
packets, as their headers will not have been modified.
/sbin/ipfw must be recompiled!!!
Reviewed by: Peter Wemm <peter@freebsd.org>
Submitted by: Chrisy Luke <chrisy@flix.net>
Remove lots'o'hacks.
looutput is now static.
Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.
Consists largly of hack removal :-)
so that the new behaviour is now default.
Solves the "infinite loop in diversion" problem when more than one diversion
is active.
Man page changes follow.
The new code is in -stable as the NON default option.
Prior to this change, Accidental recursion protection was done by
the diverted daemon feeding back the divert port number it got
the packet on, as the port number on a sendto(). IPFW knew not to
redivert a packet to this port (again). Processing of the ruleset
started at the beginning again, skipping that divert port.
The new semantic (which is how we should have done it the first time)
is that the port number in the sendto() is the rule number AFTER which
processing should restart, and on a recvfrom(), the port number is the
rule number which caused the diversion. This is much more flexible,
and also more intuitive. If the user uses the same sockaddr received
when resending, processing resumes at the rule number following that
that caused the diversion. The user can however select to resume rule
processing at any rule. (0 is restart at the beginning)
To enable the new code use
option IPFW_DIVERT_RESTART
This should become the default as soon as people have looked at it a bit
hope i've found out all files that actually depend on this dependancy.
IMHO, it's not very good practice to change the size of internal
structs depending on kernel options.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
these are quite extensive additions to the ipfw code.
they include a change to the API because the old method was
broken, but the user view is kept the same.
The new code allows a particular match to skip forward to a particular
line number, so that blocks of rules can be
used without checking all the intervening rules.
There are also many more ways of rejecting
connections especially TCP related, and
many many more ...
see the man page for a complete description.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
cache lines. Removed the struct ip proto since only a couple of chars
were actually being used in it. Changed the order of compares in the
PCB hash lookup to take advantage of partial cache line fills (on PPro).
Discussed-with: wollman
set it in the first place, independent of whether sin->sin_port
is set.
The result is that diverted packets that are being forwarded
will be diverted once and only once on the way in (ip_input())
and again, once and only once on the way out (ip_output()) -
twice in total. ICMP packets that don't contain a port will
now also be diverted.
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).
using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR)
to work for multicast UDP and raw sockets as well. (They previously only
worked for unicast UDP).
This stuff should not be too destructive if the IPDIVERT is not compiled in..
be aware that this changes the size of the ip_fw struct
so ipfw needs to be recompiled to use it.. more changes coming to clean this up.
gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.
unconventionally:
If COMPAT_IPFW is not defined, or if it is defined to 1, enable;
otherwise, disable.
This means that these changes actually have no effect on anyone at the
moment. (It just makes it easier for me to keep my code in sync.)
In the future, the `not defined' part of the hack should be eliminated,
but doing this now would require everyone to change their config files.
The same conditionals need to be made in ip_input.c as well for this to
ave any useful effect, but I'm not ready to do that right now.
the destination represents. For IP:
- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local
(belongs to this host) and broadcast addresses, respectively.
- For all routes, RTF_MULTICAST is set if the destination is multicast.
The RTF_BROADCAST flag is used by ip_output() to eliminate a call to
in_broadcast() in a common case; this gives about 1% in our packet-generation
experiments. All three flags might be used (although they aren't now)
to determine whether a packet can be forwarded; a given host route can
represent a forwardable address if:
(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST))
== RTF_HOST
Obviously, one still has to do all the work if a host route is not present,
but this code allows one to cache the results of such a lookup if rtalloc1()
is called without masking RTF_PRCLONING.
1) Require all callers to pass a valid route pointer to ip_output()
so that we don't have to check and allocate one off the stack
as was done before. This eliminates one test and some stack
bloat from the common (UDP and TCP) case.
2) Perform the IP header checksum in-line if it's of the usual length.
This results in about a 5% speed-up in my packet-generation test.
3) Use ip_vhl field rather than ip_v and ip_hl bitfields.
between ignoring options specified in the setsockopt call if IP_HDRINCL is set
(the UCB choice when VJ's code was brought in) vs allowing them (what everyone
else did, and what is assumed by programs everywhere...sigh).
Also perform some checking of the passed down packet to avoid running off
the end of a mbuf chain.
Reviewed by: fenner
Close the ip-fragment hole.
Waste less memory.
Rewrite to contemporary more readable style.
Kill separate IPACCT facility, use "accept" rules in IPFIREWALL.
Filter incoming >and< outgoing packets.
Replace "policy" by sticky "deny all" rule.
Rules have numbers used for ordering and deletion.
Remove "rerorder" code entirely.
Count packet & bytecount matches for rules.
Code in -current & -stable is now the same.
systems (my last change did not mix well with some firewall
configurations). As much as I dislike firewalls, this is one thing I
I was not prepared to break by default.. :-)
Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).
The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.*
This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.
The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it.
Partly suggested by: pst
Reviewed by: wollman
a few new wrinkles for MTU discovery which tcp_output() had better
be prepared to handle. ip_output() is also modified to do something
helpful in this case, since it has already calculated the information
we need.
case, multicast options are not passed to ip_mforward().) The previous
version had a wrong test, thus causing RSVP mrouters to forward RSVP messages
in violation of the spec.
fix Dennis Fortin's problem for good, if I've got it figured out right.
(The problem was that a `struct ifaddr' could get deleted out from under
the current requester, thus leaving him with an invalid interface pointer
and causing even more bogus accesses.)
1) Firewall is not subdivided on forwarding / blocking chains
anymore.Actually only one chain left-it was the blocking one.
2) LKM support.ip_fwdef.c is function pointers definition and
goes into kernel along with all INET stuff.
to something more recent than the ancient 1.2 release contained in
4.4. This code has the following advantages as compared to
previous versions (culled from the README file for the SunOS release):
- True multicast delivery
- Configurable rate-limiting of forwarded multicast traffic on each
physical interface or tunnel, using a token-bucket limiter.
- Simplistic classification of packets for prioritized dropping.
- Administrative scoping of multicast address ranges.
- Faster detection of hosts leaving groups.
- Support for multicast traceroute (code not yet available).
- Support for RSVP, the Resource Reservation Protocol.
What still needs to be done:
- The multicast forwarder needs testing.
- The multicast routing daemon needs to be ported.
- Network interface drivers need to have the `#ifdef MULTICAST' goop ripped
out of them.
- The IGMP code should probably be bogon-tested.
Some notes about the porting process:
In some cases, the Berkeley people decided to incorporate functionality from
later releases of the multicast code, but then had to do things differently.
As a result, if you look at Deering's patches, and then look at
our code, it is not always obvious whether the patch even applies. Let
the reader beware.
I ran ip_mroute.c through several passes of `unifdef' to get rid of
useless grot, and to permanently enable the RSVP support, which we will
include as standard.
Ported by: Garrett Wollman
Submitted by: Steve Deering and Ajit Thyagarajan (among others)
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.
the interface output queue and further udp packets would be fragmented
and only partially sent - keeping the output queue full and jamming the
network, but not actually getting any real work done (because you can't
send just 'part' of a udp packet - if you fragment it, you must send
the whole thing). The fix involves adding a check to make sure that the
output queue has sufficient space for all of the fragments.