Commit Graph

84 Commits

Author SHA1 Message Date
Andre Oppermann
c94c54e4df Remove RFC1644 T/TCP support from the TCP side of the network stack.
A complete rationale and discussion is given in this message
and the resulting discussion:

 http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706

Note that this commit removes only the functional part of T/TCP
from the tcp_* related functions in the kernel.  Other features
introduced with RFC1644 are left intact (socket layer changes,
sendmsg(2) on connection oriented protocols)  and are meant to
be reused by a simpler and less intrusive reimplemention of the
previous T/TCP functionality.

Discussed on:	-arch
2004-11-02 22:22:22 +00:00
Andre Oppermann
21dcc96f4a When printing the initialization string and IPDIVERT is not compiled into the
kernel refer to it as "loadable" instead of "disabled".
2004-10-22 19:18:06 +00:00
Andre Oppermann
72584fd2c0 Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability
of protocols.  The call to divert_packet() is done through a function pointer.  All
semantics of IPDIVERT remain intact.  If IPDIVERT is not loaded ipfw will refuse to
install divert rules and  natd will complain about 'protocol not supported'.  Once
it is loaded both will work and accept rules and open the divert socket.  The module
can only be unloaded if no divert sockets are open.  It does not close any divert
sockets when an unload is requested but will return EBUSY instead.
2004-10-19 21:14:57 +00:00
Brian Feldman
c99ee9e042 Add support to IPFW for matching by TCP data length. 2004-10-03 00:47:15 +00:00
Brian Feldman
6daf7ebd28 Add support to IPFW for classification based on "diverted" status
(that is, input via a divert socket).
2004-10-03 00:26:35 +00:00
Brian Feldman
974dfe3084 Add to IPFW the ability to do ALTQ classification/tagging. 2004-10-03 00:17:46 +00:00
Brian Feldman
88ef2880c1 Validate the action pointer to be within the rule size, so that trying to
add corrupt ipfw rules would not potentially panic the system or worse.
2004-09-30 17:42:00 +00:00
Max Laier
d6a8d58875 Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a seperate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
forseeable future.

Suggested by:		rwatson
A lot of work by:	csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by:		rwatson, csjp
Tested by:		-pf, -ipfw, LINT, csjp and myself
MFC after:		3 days

LOR IDs:		14 - 17 (not fixed yet)
2004-09-29 04:54:33 +00:00
Andre Oppermann
bda337d05e Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled into
the kernel.  Return EINVAL instead.
2004-09-13 19:27:23 +00:00
Gleb Smirnoff
f46a6aac29 Recover normal behavior: return EINVAL to attempt to add a divert rule
when module is built without IPDIVERT.

Silence from:	andre
Approved by:	julian (mentor)
2004-09-05 20:06:50 +00:00
Ruslan Ermilov
9bfe6d472a Revert the last change to sys/modules/ipfw/Makefile and fix a
standalone module build in a better way.

Silence from:	andre
MFC after:	3 days
2004-08-26 14:18:30 +00:00
Andre Oppermann
70222723f3 When unloading ipfw module use callout_drain() to make absolutely sure that
all callouts are stopped and finished.  Move it before IPFW_LOCK() to avoid
deadlocking when draining callouts.
2004-08-19 23:31:40 +00:00
Andre Oppermann
9108601915 Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building
the ipfw KLD.

 For IPFIREWALL_FORWARD this does not have any side effects.  If the module
 has it but not the kernel it just doesn't do anything.

 For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT
 compiled in too.  However this is the least disturbing behaviour.  The user
 can just recompile either module or the kernel to match the other one.  The
 access to the machine is not denied if ipfw refuses to load.
2004-08-19 17:59:26 +00:00
Andre Oppermann
e4c97eff8e Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts
and to be able to disable ipfw if it was compiled directly into the kernel.
2004-08-19 17:38:47 +00:00
Andre Oppermann
9b932e9e04 Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland
and preserves the ipfw ABI.  The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes how ipfw is and its add-on's are handled:

 In general ipfw is now called through the PFIL_HOOKS and most associated
 magic, that was in ip_input() or ip_output() previously, is now done in
 ipfw_check_[in|out]() in the ipfw PFIL handler.

 IPDIVERT is entirely handled within the ipfw PFIL handlers.  A packet to
 be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
 reassembly.  If not, or all fragments arrived and the packet is complete,
 divert_packet is called directly.  For 'tee' no reassembly attempt is made
 and a copy of the packet is sent to the divert socket unmodified.  The
 original packet continues its way through ip_input/output().

 ipfw 'forward' is done via m_tag's.  The ipfw PFIL handlers tag the packet
 with the new destination sockaddr_in.  A check if the new destination is a
 local IP address is made and the m_flags are set appropriately.  ip_input()
 and ip_output() have some more work to do here.  For ip_input() the m_flags
 are checked and a packet for us is directly sent to the 'ours' section for
 further processing.  Destination changes on the input path are only tagged
 and the 'srcrt' flag to ip_forward() is set to disable destination checks
 and ICMP replies at this stage.  The tag is going to be handled on output.
 ip_output() again checks for m_flags and the 'ours' tag.  If found, the
 packet will be dropped back to the IP netisr where it is going to be picked
 up by ip_input() again and the directly sent to the 'ours' section.  When
 only the destination changes, the route's 'dst' is overwritten with the
 new destination from the forward m_tag.  Then it jumps back at the route
 lookup again and skips the firewall check because it has been marked with
 M_SKIP_FIREWALL.  ipfw 'forward' has to be compiled into the kernel with
 'option IPFIREWALL_FORWARD' to enable it.

 DUMMYNET is entirely handled within the ipfw PFIL handlers.  A packet for
 a dummynet pipe or queue is directly sent to dummynet_io().  Dummynet will
 then inject it back into ip_input/ip_output() after it has served its time.
 Dummynet packets are tagged and will continue from the next rule when they
 hit the ipfw PFIL handlers again after re-injection.

 BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
 they did before.  Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

 conf/files
	Add netinet/ip_fw_pfil.c.

 conf/options
	Add IPFIREWALL_FORWARD option.

 modules/ipfw/Makefile
	Add ip_fw_pfil.c.

 net/bridge.c
	Disable PFIL_HOOKS if ipfw for bridging is active.  Bridging ipfw
	is still directly invoked to handle layer2 headers and packets would
	get a double ipfw when run through PFIL_HOOKS as well.

 netinet/ip_divert.c
	Removed divert_clone() function.  It is no longer used.

 netinet/ip_dummynet.[ch]
	Neither the route 'ro' nor the destination 'dst' need to be stored
	while in dummynet transit.  Structure members and associated macros
	are removed.

 netinet/ip_fastfwd.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_fw.h
	Removed 'ro' and 'dst' from struct ip_fw_args.

 netinet/ip_fw2.c
	(Re)moved some global variables and the module handling.

 netinet/ip_fw_pfil.c
	New file containing the ipfw PFIL handlers and module initialization.

 netinet/ip_input.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.  ip_forward() does not longer require
	the 'next_hop' struct sockaddr_in argument.  Disable early checks
	if 'srcrt' is set.

 netinet/ip_output.c
	Removed all direct ipfw handling code and replace it with the new
	'ipfw forward' handling code.

 netinet/ip_var.h
	Add ip_reass() as general function.  (Used from ipfw PFIL handlers
	for IPDIVERT.)

 netinet/raw_ip.c
	Directly check if ipfw and dummynet control pointers are active.

 netinet/tcp_input.c
	Rework the 'ipfw forward' to local code to work with the new way of
	forward tags.

 netinet/tcp_sack.c
	Remove include 'opt_ipfw.h' which is not needed here.

 sys/mbuf.h
	Remove m_claim_next() macro which was exclusively for ipfw 'forward'
	and is no longer needed.

Approved by:	re (scottl)
2004-08-17 22:05:54 +00:00
Christian S.J. Peron
31c88a3043 Add the ability to associate ipfw rules with a specific prison ID.
Since the only thing truly unique about a prison is it's ID, I figured
this would be the most granular way of handling this.

This commit makes the following changes:

- Adds tokenizing and parsing for the ``jail'' command line option
  to the ipfw(8) userspace utility.
- Append the ipfw opcode list with O_JAIL.
- While Iam here, add a comment informing others that if they
  want to add additional opcodes, they should append them to the end
  of the list to avoid ABI breakage.
- Add ``fw_prid'' to the ipfw ucred cache structure.
- When initializing ucred cache, if the process is jailed,
  set fw_prid to the prison ID, otherwise set it to -1.
- Update man page to reflect these changes.

This change was a strong motivator behind the ucred caching
mechanism in ipfw.

A sample usage of this new functionality could be:

    ipfw add count ip from any to any jail 2

It should be noted that because ucred based constraints
are only implemented for TCP and UDP packets, the same
applies for jail associations.

Conceptual head nod by:	pjd
Reviewed by:	rwatson
Approved by:	bmilekic (mentor)
2004-08-12 22:06:55 +00:00
Andre Oppermann
6e234ede37 Only invoke verify_path() for verrevpath and versrcreach when we have an IP packet. 2004-08-11 11:41:11 +00:00
Andre Oppermann
5f9541ecbd New ipfw option "antispoof":
For incoming packets, the packet's source address is checked if it
 belongs to a directly connected network.  If the network is directly
 connected, then the interface the packet came on in is compared to
 the interface the network is connected to.  When incoming interface
 and directly connected interface are not the same, the packet does
 not match.

Usage example:

 ipfw add deny ip from any to any not antispoof in

Manpage education by:	ru
2004-08-09 16:12:10 +00:00
Andre Oppermann
55db762b76 Extend versrcreach by checking against the rt_flags for RTF_REJECT and
RTF_BLACKHOLE as well.

To quote the submitter:

 The uRPF loose-check implementation by the industry vendors, at least on Cisco
 and possibly Juniper, will fail the check if the route of the source address
 is pointed to Null0 (on Juniper, discard or reject route). What this means is,
 even if uRPF Loose-check finds the route, if the route is pointed to blackhole,
 uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode
 as a pseudo-packet-firewall without using any manual filtering configuration --
 one can simply inject a IGP or BGP prefix with next-hop set to a static route
 that directs to null/discard facility. This results in uRPF Loose-check failing
 on all packets with source addresses that are within the range of the nullroute.

Submitted by:	James Jun <james@towardex.com>
2004-07-21 19:55:14 +00:00
Juli Mallett
765d141c78 Make M_SKIP_FIREWALL a global (and semantic) flag, preventing anything from
using M_PROTO6 and possibly shooting someone's foot, as well as allowing the
firewall to be used in multiple passes, or with a packet classifier frontend,
that may need to explicitly allow a certain packet.  Presently this is handled
in the ipfw_chk code as before, though I have run with it moved to upper
layers, and possibly it should apply to ipfilter and pf as well, though this
has not been investigated.

Discussed with:	luigi, rwatson
2004-07-17 02:40:13 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Robert Watson
d67ec3dd48 When asserting non-Giant locks in the network stack, also assert
Giant if debug.mpsafenet=0, as any points that require synchronization
in the SMPng world also required it in the Giant-world:

- inpcb locks (including IPv6)
- inpcbinfo locks (including IPv6)
- dummynet subsystem lock
- ipfw2 subsystem lock
2004-06-24 02:01:48 +00:00
Christian S.J. Peron
d316f2cf4f Modify ip fw so that whenever UID or GID constraints exist in a
ruleset, the pcb is looked up once per ipfw_chk() activation.

This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).

Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based contraints before and after this patch.

The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png

Reviewed by:	andre, luigi, rwatson
Approved by:	bmilekic (mentor)
2004-06-11 22:17:14 +00:00
Ruslan Ermilov
dd4d62c7d8 init_tables() must be run after sys/net/route.c:route_init(). 2004-06-10 20:20:37 +00:00
Ruslan Ermilov
cd8b5ae0ae Introduce a new feature to IPFW2: lookup tables. These are useful
for handling large sparse address sets.  Initial implementation by
Vsevolod Lobko <seva@ip.net.ua>, refined by me.

MFC after:	1 week
2004-06-09 20:10:38 +00:00
Poul-Henning Kamp
41ee9f1c69 Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>
2004-05-30 17:57:46 +00:00
Christian S.J. Peron
b5ef991561 Add a super-user check to ipfw_ctl() to make sure that the calling
process is a non-prison root. The security.jail.allow_raw_sockets
sysctl variable is disabled by default, however if the user enables
raw sockets in prisons, prison-root should not be able to interact
with firewall rule sets.

Approved by:	rwatson, bmilekic (mentor)
2004-05-25 15:02:12 +00:00
Andre Oppermann
22b5770b99 Add the option versrcreach to verify that a valid route to the
source address of a packet exists in the routing table.  The
default route is ignored because it would match everything and
render the check pointless.

This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.

Example:

 ipfw add 1000 deny ip from any to any not versrcreach

also known in Cisco-speak as:

  ip verify unicast source reachable-via any

Reviewed by:	luigi
2004-04-23 14:28:38 +00:00
Max Laier
ac9d7e2618 Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam
2004-02-25 19:55:29 +00:00
Max Laier
36e8826ffb Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)
2004-02-18 00:04:52 +00:00
Max Laier
1094bdca51 This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson
2004-02-13 19:14:16 +00:00
Hajimu UMEMOTO
8b8a0cef40 NULL is not 0.
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
2003-12-24 18:22:04 +00:00
Maxim Konovalov
1c86761b2a o IN_MULTICAST wants an address in host byte order.
PR:		kern/60304
Submitted by:	demon
MFC after:	1 week
2003-12-16 18:21:47 +00:00
Sam Leffler
d559f5c3d8 Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate
code is compiled in to support the O_IPSEC operator.  Previously no
support was included and ipsec rules were always matching.  Note that
we do not return an error when an ipsec rule is added and the kernel
does not have IPsec support compiled in; this is done intentionally
but we may want to revisit this (document this in the man page).

PR:		58899
Submitted by:	Bjoern A. Zeeb
Approved by:	re (rwatson)
2003-12-02 00:23:45 +00:00
Andre Oppermann
623f556031 Fix verify_rev_path() function. The author of this function tried to
cut corners which completely broke down when the routing table locking
was introduced.

Reviewed by:	sam (mentor)
Approved by:	re (rwatson)
2003-11-27 09:40:13 +00:00
Sam Leffler
6714d7c751 Correct a problem where ipfw-generated packets were being returned
for ipfw processing w/o an indication the packets were generated
by ipfw--and so should not be processed (this manifested itself
as a LOR.)  The flag bit in the mbuf that was used to mark the
packets was not listed in M_COPYFLAGS so if a packet had a header
prepended (as done by IPsec) the flag was lost.  Correct this by
defining a new M_PROTO6 flag and use it to mark packets that need
this processing.

Reviewed by:	bms
Approved by:	re (rwatson)
MFC after:	2 weeks
2003-11-24 03:57:03 +00:00
Sam Leffler
6a3ca7514d Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines
potentially transmit packets that may enter KAME IPsec w/o Giant if the
callouts are marked MPSAFE.

Reviewed by:	ume
Approved by:	re (rwatson)
2003-11-23 18:13:41 +00:00
Andre Oppermann
97d8d152c2 Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table.  Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination.  Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 20:07:39 +00:00
Andre Oppermann
26d02ca7ba Remove RTF_PRCLONING from routing table and adjust users of it
accordingly.  The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache.  The
network stack remains fully useable with this change.

Reviewed by:	sam (mentor), bms
Reviewed by:	-net, -current, core@kame.net (IPv6 parts)
Approved by:	re (scottl)
2003-11-20 19:47:31 +00:00
Maxim Konovalov
dbf7b38125 Fix an arguments order in check_uidgid() call.
PR:		kern/59314
Submitted by:	Andrey V. Shytov
Approved by:	re (rwatson, jhb)
2003-11-20 10:28:33 +00:00
Andre Oppermann
02c1c7070e Remove the global one-level rtcache variable and associated
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.

Reviewed by:	sam (mentor)
2003-11-14 21:48:57 +00:00
Sam Leffler
8f1ee3683d Move uid/gid checking logic out of line and lock inpcb usage. This
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).

Supported by:	FreeBSD Foundation
2003-11-07 23:26:57 +00:00
Hajimu UMEMOTO
aef3a65eb7 use ipsec_getnhist() instead of obsoleted ipsec_gethist().
Submitted by:	"Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Reviewed by:	Ari Suutari <ari@suutari.iki.fi> (ipfw@)
2003-11-07 20:25:47 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Kirk McKusick
b03587f06a Malloc buckets of size 128 have been having their 64-byte offset
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing.  In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children.  When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.

Submitted by:	Jim Kuhn <jkuhn@sandvine.com>
Reviewed by:	"Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by:	Ben Pfountz <netprince@vt.edu>
MFC after:	1 week
2003-10-16 02:00:12 +00:00
Sam Leffler
598345da4b Bandaid locking change: mark static rule mutex recursive so re-entry when
sending an ICMP packet doesn't cause a panic.  A better solution is needed;
possibly defering the transmit to a dedicated thread.

Observed by:	"Aaron Wohl" <freebsd@soith.com>
2003-09-17 22:06:47 +00:00
Sam Leffler
293941a556 Add locking.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage

Sponsored by:	FreeBSD Foundation
2003-09-17 00:56:50 +00:00
Luigi Rizzo
4805529cf8 Allow set 31 to be used for rules other than 65535.
Set 31 is still special because rules belonging to it are not deleted
by the "ipfw flush" command, but must be deleted explicitly with
"ipfw delete set 31" or by individual rule numbers.

This implement a flexible form of "persistent rules" which you might
want to have available even after an "ipfw flush".
Note that this change does not violate POLA, because you could not
use set 31 in a ruleset before this change.

sbin/ipfw changes to allow manipulation of set 31 will follow shortly.

Suggested by: Paul Richards
2003-07-15 23:07:34 +00:00
Luigi Rizzo
72e02d4dac Implement comments embedded into ipfw2 instructions.
Since we already had 'O_NOP' instructions which always match, all
I needed to do is allow the NOP command to have arbitrary length
(i.e. move its label in a different part of the switch() which
validates instructions).

The kernel must know nothing about comments, everything else is
done in userland (which will be described in the upcoming ipfw2.c
commit).
2003-07-12 05:54:17 +00:00
Luigi Rizzo
7a1dfbc0d3 Merge the handlers of O_IP_SRC_MASK and O_IP_DST_MASK opcodes, and
support matching a list of addr/mask pairs so one can write
more efficient rulesets which were not possible before e.g.

    add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16

The change is fully backward compatible.
ipfw2 and manpage commit to follow.

MFC after: 3 days
2003-07-08 07:44:42 +00:00