freebsd-skq

Author	SHA1	Message	Date
Mike Silbersack	4b421e2daa	Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)	2007-10-07 20:44:24 +00:00
George V. Neville-Neil	b2630c2934	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
George V. Neville-Neil	2cb64cb272	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00
Bruce M Simpson	71498f308b	Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work. This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project. The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING. This work was financially supported by another FreeBSD committer. Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)	2007-06-12 16:24:56 +00:00
Robert Watson	f2565d68a4	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.	2007-05-10 15:58:48 +00:00
Bruce M Simpson	73ec8173eb	Purge two redundant case labels.	2007-03-23 09:43:36 +00:00
Bruce M Simpson	a3fd02d88b	Fix undirected broadcast sends for the case where SO_DONTROUTE has also been set at the socket layer, in our somewhat convoluted IPv4 source selection logic in ip_output(). IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255 must always be delivered on a local link with a TTL of 1. If IP_ONESBCAST has been set at the socket layer, also perform destination interface lookup for point-to-point interfaces based on the destination address of the link; previously it was not possible to use the option with such interfaces; also, the destination/broadcast address fields map to the same field within struct ifnet, which doesn't help matters. One more valid fix going forward for these issues is to treat 255.255.255.255 as a destination in its own right in the forwarding trie. Other implementations do this. It fits with the use of multiple paths, though it then becomes necessary to specify interface preference. This hack will eventually go away when that comes to pass. Reviewed by: andre MFC after: 1 week	2007-03-01 13:29:30 +00:00
Bruce M Simpson	3dbee59bd4	Back out revision 1.264. Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic. Reported by: Mark Tinguely	2006-12-10 13:44:00 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Andre Oppermann	6a7c943c59	Remove stone-aged and irrelevant "#ifndef notdef".	2006-09-29 16:44:45 +00:00
Bruce M Simpson	07ea6709ea	Account for output IP datagrams on the ifaddr where they originated from, not the first ifaddr on the ifp. This is similar to what NetBSD does. PR: kern/72936 Submitted by: alfred Reviewed by: andre	2006-09-25 10:11:16 +00:00
Andre Oppermann	384a05bfd0	Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet. Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)	2006-09-11 19:56:10 +00:00
John-Mark Gurney	3ae2ad088e	make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.	2006-09-10 17:49:09 +00:00
Andre Oppermann	233dcce118	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
Andre Oppermann	773725a255	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 17:12:10 +00:00
Julian Elischer	b7522c27d2	Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was in older versions of FreeBSD. This option is pointless as it is needed in just about every interesting usage of forward that I have ever seen. It doesn't make the system any safer and just wastes huge amounts of develper time when the system doesn't behave as expected when code is moved from 4.x to 6.x It doesn't make the system any safer and just wastes huge amounts of develper time when the system doesn't behave as expected when code is moved from 4.x to 6.x or 7.x Reviewed by: glebius MFC after: 1 week	2006-08-17 00:37:03 +00:00
Gleb Smirnoff	4d09f5a030	Fix URL to Bellovin's paper. Submitted by: Anton Yuzhaninov <citrin rambler-co.ru>	2006-06-29 13:38:36 +00:00
Maxim Konovalov	635354c446	o Add missed error check: in ip_ctloutput() sooptcopyin() returns a result but we never examine it. Reviewed by: rwatson MFC after: 2 weeks	2006-05-21 17:52:08 +00:00
Bruce M Simpson	3548bfc964	Fix a long-standing limitation in IPv4 multicast group membership. By making the imo_membership array a dynamically allocated vector, this minimizes disruption to existing IPv4 multicast code. This change breaks the ABI for the kernel module ip_mroute.ko, and may cause a small amount of churn for folks working on the IGMPv3 merge. Previously, sockets were subject to a compile-time limitation on the number of IPv4 group memberships, which was hard-coded to 20. The imo_membership relationship, however, is 1:1 with regards to a tuple of multicast group address and interface address. Users who ran routing protocols such as OSPF ran into this limitation on machines with a large system interface tree.	2006-05-14 14:22:49 +00:00
Christian S.J. Peron	604afec496	Somewhat re-factor the read/write locking mechanism associated with the packet filtering mechanisms to use the new rwlock(9) locking API: - Drop the variables stored in the phil_head structure which were specific to conditions and the home rolled read/write locking mechanism. - Drop some includes which were used for condition variables - Drop the inline functions, and convert them to macros. Also, move these macros into pfil.h - Move pfil list locking macros intp phil.h as well - Rename ph_busy_count to ph_nhooks. This variable will represent the number of IN/OUT hooks registered with the pfil head structure - Define PFIL_HOOKED macro which evaluates to true if there are any hooks to be ran by pfil_run_hooks - In the IP/IP6 stacks, change the ph_busy_count comparison to use the new PFIL_HOOKED macro. - Drop optimization in pfil_run_hooks which checks to see if there are any hooks to be ran, and returns if not. This check is already performed by the IP stacks when they call: if (!PFIL_HOOKED(ph)) goto skip_hooks; - Drop in assertion which makes sure that the number of hooks never drops below 0 for good measure. This in theory should never happen, and if it does than there are problems somewhere - Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep - Drop variables which support home rolled read/write locking mechanism from the IPFW firewall chain structure. - Swap out the read/write firewall chain lock internal to use the rwlock(9) API instead of our home rolled version - Convert the inlined functions to macros Reviewed by: mlaier, andre, glebius Thanks to: jhb for the new locking API	2006-02-02 03:13:16 +00:00
Andre Oppermann	1dfcf0d2a3	Move the IPSEC related code blocks to their own file to unclutter and signifincantly improve the readability of ip_input() and ip_output() again. The resulting IPSEC hooks in ip_input() and ip_output() may be used later on for making IPSEC loadable. This move is mostly mechanical and should preserve current IPSEC behaviour as-is. Nothing shall prevent improvements in the way IPSEC interacts with the IPv4 stack. Discussed with: bz, gnn, rwatson; (earlier version)	2006-02-01 13:55:03 +00:00
Andre Oppermann	8f8d29f686	In in_delayed_cksum() we can't perform a m_pullup() as it may change the mbuf pointer and we don't have any way of passing it back to the callers. Instead just fail silently without updating the checksum but leaving the mbuf+chain intact. A search in our GNATS database did not turn up any match for the existing warning message when this case is encountered. Found by: Coverity Prevent(tm) Coverity ID: CID779 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 18:49:16 +00:00
Andre Oppermann	39550088cf	Prevent dereferencing a NULL route pointer when trying to update the route MTU. This bug is very difficult to reach and not remotely exploitable. Found by: Coverity Prevent(tm) Coverity ID: CID162 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 15:05:05 +00:00
Gleb Smirnoff	bbce982bd5	When we drop packet due to no space in output interface output queue, also increase the ifp->if_snd.ifq_drops. PR: 72440 Submitted by: ikob	2005-12-06 11:16:11 +00:00
Andre Oppermann	ef39adf007	Consolidate all IP Options handling functions into ip_options.[ch] and include ip_options.h into all files making use of IP Options functions. From ip_input.c rev 1.306: ip_dooptions(struct mbuf m, int pass) save_rte(m, option, dst) ip_srcroute(m0) ip_stripoptions(m, mopt) From ip_output.c rev 1.249: ip_insertoptions(m, opt, phlen) ip_optcopy(ip, jp) ip_pcbopts(struct inpcb inp, int optname, struct mbuf *m) No functional changes in this commit. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 20:12:40 +00:00
Andre Oppermann	147f74d176	Purge layer specific mbuf flags on layer crossings to avoid confusing upper or lower layers. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 16:23:26 +00:00
Andre Oppermann	34333b16cd	Retire MT_HEADER mbuf type and change its users to use MT_DATA. Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag. Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-02 13:46:32 +00:00
Andre Oppermann	b2828ad291	Implement IP_DONTFRAG IP socket option enabling the Don't Fragment flag on IP packets. Currently this option is only repected on udp and raw ip sockets. On tcp sockets the DF flag is controlled by the path MTU discovery option. Sending a packet larger than the MTU size of the egress interface returns an EMSGSIZE error. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005	2005-09-26 20:25:16 +00:00
Andre Oppermann	e0aec68255	Use the correct mbuf type for MGET().	2005-08-30 16:35:27 +00:00
Andre Oppermann	936cd18dad	Add socketoption IP_MINTTL. May be used to set the minimum acceptable TTL a packet must have when received on a socket. All packets with a lower TTL are silently dropped. Works on already connected/connecting and listening sockets for RAW/UDP/TCP. This option is only really useful when set to 255 preventing packets from outside the directly connected networks reaching local listeners on sockets. Allows userland implementation of 'The Generalized TTL Security Mechanism (GTSM)' according to RFC3682. Examples of such use include the Cisco IOS BGP implementation command "neighbor ttl-security". MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005	2005-08-22 16:13:08 +00:00
Robert Watson	a2dc1f5021	Add helper function ip_findmoptions(), which accepts an inpcb, and attempts to atomically return either an existing set of IP multicast options for the PCB, or a newlly allocated set with default values. The inpcb is returned locked. This function may sleep. Call ip_moptions() to acquire a reference to a PCB's socket options, and perform the update of the options while holding the PCB lock. Release the lock before returning. Remove garbage collection of multicast options when values return to the default, as this complicates locking substantially. Most applications allocate a socket either to be multicast, or not, and don't tend to keep around sockets that have previously been used for multicast, then used for unicast. This closes a number of race conditions involving multiple threads or processes modifying the IP multicast state of a socket simultaenously. MFC after: 7 days	2005-08-09 17:19:21 +00:00
Robert Watson	dd5a318ba3	Introduce in_multi_mtx, which will protect IPv4-layer multicast address lists, as well as accessor macros. For now, this is a recursive mutex due code sequences where IPv4 multicast calls into IGMP calls into ip_output(), which then tests for a multicast forwarding case. For support macros in in_var.h to check multicast address lists, assert that in_multi_mtx is held. Acquire in_multi_mtx around iteration over the IPv4 multicast address lists, such as in ip_input() and ip_output(). Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses, as well as over the manipulation of ifnet multicast address lists in order to keep the two layers in sync. Lock down accesses to IPv4 multicast addresses in IGMP, or assert the lock when performing IGMP join/leave events. Eliminate spl's associated with IPv4 multicast addresses, portions of IGMP that weren't previously expunged by IGMP locking. Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded lock order in WITNESS, in that order. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 10 days	2005-08-03 19:29:47 +00:00
Robert Watson	3c308b091f	Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which is redundant with respect to existing mbuf copy label routines. Expose a new mac_copy_mbuf() routine at the top end of the Framework and use that; use the existing mpo_copy_mbuf_label() routine on the bottom end. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl)	2005-07-05 23:39:51 +00:00
Brooks Davis	fc74a9f93a	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam	2005-06-10 16:49:24 +00:00
Andre Oppermann	099dd0430b	Bring back the full packet destination manipulation for 'ipfw fwd' with the kernel compile time option: options IPFIREWALL_FORWARD_EXTENDED This option has to be specified in addition to IPFIRWALL_FORWARD. With this option even packets targeted for an IP address local to the host can be redirected. All restrictions to ensure proper behaviour for locally generated packets are turned off. Firewall rules have to be carefully crafted to make sure that things like PMTU discovery do not break. Document the two kernel options. PR: kern/71910 PR: kern/73129 MFC after: 1 week	2005-02-22 17:40:40 +00:00
Alan Cox	7258e9687b	Correctly move the packet header in ip_insertoptions(). Reported by: Anupam Chanda Reviewed by: sam@ MFC after: 2 weeks	2005-01-23 19:43:46 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Robert Watson	74d4630b71	Remove an errant blank line apparently introduced in ip_output.c:1.194.	2004-12-25 22:59:42 +00:00
Robert Watson	89924e5865	Pass the inpcb reference into ip_getmoptions() rather than just the inp->inp_moptions pointer, so that ip_getmoptions() can perform necessary locking when doing non-atomic reads. Lock the inpcb by default to copy any data to local variables, then unlock before performing sooptcopyout(). MFC after: 2 weeks	2004-12-05 22:08:37 +00:00
Robert Watson	5c918b56d8	Push the inpcb argument into ip_setmoptions() when setting IP multicast socket options, so that it is available for locking.	2004-12-05 21:38:33 +00:00
Robert Watson	993d9505d4	Start working through inpcb locking for ip_ctloutput() by cleaning up modifications to the inpcb IP options mbuf: - Lock the inpcb before passing it into ip_pcbopts() in order to prevent simulatenous reads and read-modify-writes that could result in races. - Pass the inpcb reference into ip_pcbopts() instead of the option chain pointer in the inpcb. - Assert the inpcb lock in ip_pcbots. - Convert one or two uses of a pointer as a boolean or an integer comparison to a comparison with NULL for readability.	2004-12-05 19:11:09 +00:00
Max Laier	d6a8d58875	Add an additional struct inpcb * argument to pfil(9) in order to enable passing along socket information. This is required to work around a LOR with the socket code which results in an easy reproducible hard lockup with debug.mpsafenet=1. This commit does not fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a seperate (later) commit. This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in forseeable future. Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days LOR IDs: 14 - 17 (not fixed yet)	2004-09-29 04:54:33 +00:00
Andre Oppermann	f4fca2d8d3	Make comments more clear for the packet changed cases after pfil hooks.	2004-09-13 17:09:06 +00:00
John-Mark Gurney	cb459254a2	revert comment from rev1.158 now that rev1.225 backed it out.. MFC after: 3 days	2004-09-06 15:48:38 +00:00
Andre Oppermann	a9c92b54a9	In the case the destination of a packet was changed by the packet filter to point to a local IP address; and the packet was sourced from this host we fill in the m_pkthdr.rcvif with a pointer to the loopback interface. Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface. Submitted by: brooks	2004-08-27 15:39:34 +00:00
Andre Oppermann	c21fd23260	Always compile PFIL_HOOKS into the kernel and remove the associated kernel compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack. If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active. Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.	2004-08-27 15:16:24 +00:00
Max Laier	ca7a789aa6	Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel. Previously the early drop was disabled unconditionally for ALTQ-enabled kernels. This should give some benefit for the normal gateway + LAN-server case with a busy LAN leg and an ALTQ managed uplink. Reviewed and style help from: cperciva, pjd	2004-08-22 16:42:28 +00:00
Peter Wemm	1e5cc10dc2	Make the kernel compile again if you are not using PFIL_HOOKS	2004-08-18 00:37:46 +00:00
Andre Oppermann	9b932e9e04	Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different. However there are many changes how ipfw is and its add-on's are handled: In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in\|out]() in the ipfw PFIL handler. IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output(). ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and the directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it. DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection. BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS. More detailed changes to the code: conf/files Add netinet/ip_fw_pfil.c. conf/options Add IPFIREWALL_FORWARD option. modules/ipfw/Makefile Add ip_fw_pfil.c. net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well. netinet/ip_divert.c Removed divert_clone() function. It is no longer used. netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed. netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args. netinet/ip_fw2.c (Re)moved some global variables and the module handling. netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization. netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() does not longer require the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set. netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.) netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active. netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags. netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here. sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed. Approved by: re (scottl)	2004-08-17 22:05:54 +00:00
David Malone	1f44b0a1b5	Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD have already done this, so I have styled the patch on their work: 1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID. 2) named the sysctl net.inet.ip.random_id 3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns. The sysctl defaults to 0 (sequential IP IDs). Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months	2004-08-14 15:32:40 +00:00
Andre Oppermann	0b17fba7bc	Consistently use NULL for pointer comparisons.	2004-08-11 10:46:15 +00:00
Andre Oppermann	a5053398d4	Make a comment that "ipfw forward" is not SMP and PREEMPTION safe.	2004-08-09 16:16:10 +00:00
Andre Oppermann	81007fd4eb	o Delayed checksums are now calculated in divert_packet() for diverted packets Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.	2004-08-03 14:13:36 +00:00
Robert Watson	a138d21769	In ip_ctloutput(), acquire the inpcb lock around some of the basic inpcb flag and status updates.	2004-06-24 02:05:47 +00:00
Max Laier	02b199f158	Link ALTQ to the build and break with ABI for struct ifnet. Please recompile your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a seperate commit. Same with the driver changes, which need case by case evaluation. __FreeBSD_version bump will follow. Tested-by: (i386)LINT	2004-06-13 17:29:10 +00:00
Maxim Konovalov	a49b21371a	o Calculate a number of bytes to copy (cnt) correctly: +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ \| \| \|C\| \| \| \| \| \| \| \| IP \|N\|O\|L\|P\| \| IP \| \| IP \| \| #1 \|O\|D\|E\|T\| \| #2 \| \| #n \| \| \|P\|E\|N\|R\| \| \| \| \| +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ ^ ^<---- cnt - (IPOPT_MINOFF - 1) ---->\| \| \| src \| +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr) \| dst +-- cp[IPOPT_OFF + 1] PR: kern/66386 Submitted by: Andrei Iltchenko MFC after: 3 weeks	2004-05-11 19:14:44 +00:00
Darren Reed	2f3f1e6773	Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.	2004-05-02 15:10:17 +00:00
Darren Reed	ab884d993e	Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work ok mbuf's and tag's.	2004-05-02 06:36:30 +00:00
Luigi Rizzo	e6e51f0518	In an effort to simplify the routing code, try to deprecate rtalloc() in favour of rtalloc_ign(), which is what would end up being called anyways. There are 25 more instances of rtalloc() in net*/ and about 10 instances of rtalloc_ign()	2004-04-14 01:13:14 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Ruslan Ermilov	390cdc6a76	Fixed a bug in previous revision: compute the payload checksum before we convert ip_len into a network byte order; in_delayed_cksum() still expects it in host byte order. The symtom was the ``in_cksum_skip: out of data by %d'' complaints from the kernel. To add to the previous commit log. These fixes make tcpdump(1) happy by not complaining about UDP/TCP checksum being bad for looped back IP multicast when multicast router is deactivated. Reported by: Vsevolod Lobko	2004-04-07 10:01:39 +00:00
Ruslan Ermilov	26f16ebeb1	Untangle IP multicast routing interaction with delayed payload checksums. Compute the payload checksum for a locally originated IP multicast where God intended, in ip_mloopback(), rather than doing it in ip_output() and only when multicast router is active. This is more correct as we do not fool ip_input() that the packet has the correct payload checksum when in fact it does not (when multicast router is inactive). This is also more efficient if we don't join the multicast group we send to, thus allowing the hardware to checksum the payload.	2004-03-25 08:46:27 +00:00
Max Laier	4672d81921	Two minor follow-ups on the MT_TAG removal: ifp is now passed explicitly to ether_demux; no need to look it up again. Make mtag a global var in ip_input. Noticed by: rwatson Approved by: bms(mentor)	2004-03-02 14:37:23 +00:00
Max Laier	ac9d7e2618	Re-remove MT_TAGs. The problems with dummynet have been fixed now. Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam	2004-02-25 19:55:29 +00:00
Max Laier	36e8826ffb	Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place. Approved by: bms(mentor)	2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO	70dbc6cbfc	don't update outgoing ifp, if ipsec tunnel mode encapsulation was not made. Obtained from: KAME	2004-02-16 17:05:06 +00:00
Max Laier	1094bdca51	This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson	2004-02-13 19:14:16 +00:00
Bruce M Simpson	1cfd4b5326	Initial import of RFC 2385 (TCP-MD5) digest support. This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net	2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO	f073c60f73	pass pcb rather than so. it is expected that per socket policy works again.	2004-02-03 18:20:55 +00:00
Andre Oppermann	e0f630ea7a	Do not set the ip_id to zero when DF is set on packet and restore the general pre-randomid behaviour. Setting the ip_id to zero causes several problems with packet reassembly when a device along the path removes the DF bit for some reason. Other BSD and Linux have found and fixed the same issues. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)	2004-01-08 11:13:40 +00:00
Andre Oppermann	97d8d152c2	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 20:07:39 +00:00
Andre Oppermann	26d02ca7ba	Remove RTF_PRCLONING from routing table and adjust users of it accordingly. The define is left intact for ABI compatibility with userland. This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 19:47:31 +00:00
Andre Oppermann	02c1c7070e	Remove the global one-level rtcache variable and associated complex locking and rework ip_rtaddr() to do its own rtlookup. Adopt all its callers to this and make ip_output() callable with NULL rt pointer. Reviewed by: sam (mentor)	2003-11-14 21:48:57 +00:00
Andre Oppermann	9188b4a169	Introduce ip_fastforward and remove ip_flow. Short description of ip_fastforward: o adds full direct process-to-completion IPv4 forwarding code o handles ip fragmentation incl. hw support (ip_flow did not) o sends icmp needfrag to source if DF is set (ip_flow did not) o supports ipfw and ipfilter (ip_flow did not) o supports divert, ipfw fwd and ipfilter nat (ip_flow did not) o returns anything it can't handle back to normal ip_input Enable with sysctl -w net.inet.ip.fastforwarding=1 Reviewed by: sam (mentor)	2003-11-14 21:02:22 +00:00
Andre Oppermann	2683ceb661	Do not fragment a packet with hardware assistance if it has the DF bit set. Reviewed by: sam (mentor)	2003-11-12 23:35:40 +00:00
Sam Leffler	8484384564	assert optional inpcb is passed in locked Supported by: FreeBSD Foundation	2003-11-08 23:03:29 +00:00
Hajimu UMEMOTO	0f9ade718d	- cleanup SP refcnt issue. - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard\|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented imcopletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec__policy -> ipsec__pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy )pcb_sp, spp (struct secpolicy ), sp (struct secpolicy ) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME	2003-11-04 16:02:05 +00:00
Robert Watson	3de758d3e3	Note that when ip_output() is called from ip_forward(), it will already have its options inserted, so the opt argument to ip_output() must be NULL.	2003-11-03 18:03:05 +00:00
Sam Leffler	d1dd20be6e	Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)	2003-10-04 03:44:50 +00:00
Sam Leffler	134ea22494	o update PFIL_HOOKS support to current API used by netbsd o revamp IPv4+IPv6+bridge usage to match API changes o remove pfil_head instances from protosw entries (no longer used) o add locking o bump FreeBSD version for 3rd party modules Heavy lifting by: "Max Laier" <max@love2party.net> Supported by: FreeBSD Foundation Obtained from: NetBSD (bits of pfil.h and pfil.c)	2003-09-23 17:54:04 +00:00
Mike Silbersack	3390d47670	Implement MBUF_STRESS_TEST mark II. Changes from the original implementation: - Fragmentation is handled by the function m_fragment, which can be called from whereever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use. - m_fragment works slightly differently from the old fragmentation code in that it allocates a seperate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test. - Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.	2003-09-01 05:55:37 +00:00
Bruce M Simpson	8afa230470	Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp. Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)	2003-08-20 14:46:40 +00:00
Jeffrey Hsu	1e78ac216e	1. Basic PIM kernel support Disabled by default. To enable it, the new "options PIM" must be added to the kernel configuration file (in addition to MROUTING): options MROUTING # Multicast routing options PIM # Protocol Independent Multicast 2. Add support for advanced multicast API setup/configuration and extensibility. 3. Add support for kernel-level PIM Register encapsulation. Disabled by default. Can be enabled by the advanced multicast API. 4. Implement a mechanism for "multicast bandwidth monitoring and upcalls". Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-07 18:16:59 +00:00
Mike Silbersack	7dc7f0311e	Minor fix to the MBUF_STRESS_TEST code so that it keeps pkthdr.len consistant at all times. (Some debugging code I'm working on is tripped otherwise.) MFC after: 3 days	2003-07-19 05:50:32 +00:00
Garrett Wollman	6e49b1fe55	Don't generate an ip_id for packets with the DF bit set; ip_id is only meaningful for fragments. Also don't bother to byte-swap the ip_id when we do generate it; it is only used at the receiver as a nonce. I tried several different permutations of this code with no measurable difference to each other or to the unmodified version, so I've settled on the one for which gcc seems to generate the best code. (If anyone cares to microoptimize this differently for an architecture where it actually matters, feel free.) Suggested by: Steve Bellovin's paper in IMW'02	2003-05-31 17:55:21 +00:00
Matthew N. Dodd	4957466b8e	IP_RECVTTL socket option. Reviewed by: Stuart Cheshire <cheshire@apple.com>	2003-04-29 21:36:18 +00:00
Mike Silbersack	53dcc544a8	Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended to include more than just frag tests.	2003-04-12 06:11:46 +00:00
Dag-Erling Smørgrav	fe58453891	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand- rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-04-08 14:25:47 +00:00
Dag-Erling Smørgrav	212059bd83	Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage.	2003-04-04 12:14:00 +00:00
Matthew N. Dodd	2c56e246fa	Back out support for RFC3514. RFC3514 poses an unacceptale risk to compliant systems.	2003-04-02 20:14:44 +00:00
Matthew N. Dodd	4f6425f7ae	- Use the correct constant define. - Add a missing break.	2003-04-02 18:02:58 +00:00
Matthew N. Dodd	8faf6df9b3	Sync constant define with NetBSD. Requested by: Tom Spindler <dogcow@babymeat.com>	2003-04-02 10:28:47 +00:00
Matthew N. Dodd	09139a4537	Implement support for RFC 3514 (The Security Flag in the IPv4 Header). (See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt) This fulfills the host requirements for userland support by way of the setsockopt() IP_EVIL_INTENT message. There are three sysctl tunables provided to govern system behavior. net.inet.ip.rfc3514: Enables support for rfc3514. As this is an Informational RFC and support is not yet widespread this option is disabled by default. net.inet.ip.hear_no_evil If set the host will discard all received evil packets. net.inet.ip.speak_no_evil If set the host will discard all transmitted evil packets. The IP statistics counter 'ips_evil' (available via 'netstat') provides information on the number of 'evil' packets recieved. For reference, the '-E' option to 'ping' has been provided to demonstrate and test the implementation.	2003-04-01 08:21:44 +00:00
Maxime Henrion	511e01e2d6	Try to make the MBUF_FRAG_TEST code work better. - Don't try to fragment the packet if it's smaller than mbuf_frag_size. - Preserve the size of the mbuf chain which is modified by m_split(). - Check that m_split() didn't return NULL. - Make it so we don't end up with two M_PKTHDR mbuf in the chain. - Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole chain and not just the first mbuf. - Fix a nearby style bug and rework the logic of the loops so that it's more clear. This is still not quite right, because we're clearly abusing m_split() to do something it was not designed for, but at least it works now. We should probably move this code into a m_fragment() function when it's correct.	2003-03-25 23:49:14 +00:00
Mike Silbersack	9d9edc5693	Add the MBUF_FRAG_TEST option. When compiled in, this option allows you to tell ip_output to fragment all outgoing packets into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes. This is an excellent way to test if network drivers can properly handle long mbuf chains being passed to them. net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation) so that you can at least boot before your network driver dies. :)	2003-03-25 05:45:05 +00:00
Jonathan Lemon	8608c4c1f9	Remove unused variables in the IPSEC case. Submitted by: Lars Eggert <larse@ISI.EDU>	2003-02-20 18:22:21 +00:00
Jonathan Lemon	340c35de6a	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs	2003-02-19 22:32:43 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Sam Leffler	9359ad861e	FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from ipsec4_process_packet; they happen when a packet is dropped because an SA acquire is initiated Submitted by: Doug Ambrisko <ambrisko@verniernetworks.com>	2003-01-30 05:45:45 +00:00

1 2 3 4 5 ...

322 Commits