freebsd-skq

Author	SHA1	Message	Date
Robert Watson	5e7ce4785f	Modify the mac_init_ipq() MAC Framework entry point to accept an additional flags argument to indicate blocking disposition, and pass in M_NOWAIT from the IP reassembly code to indicate that blocking is not OK when labeling a new IP fragment reassembly queue. This should eliminate some of the WITNESS warnings that have started popping up since fine-grained IP stack locking started going in; if memory allocation fails, the creation of the fragment queue will be aborted. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-26 15:12:03 +00:00
Maxime Henrion	511e01e2d6	Try to make the MBUF_FRAG_TEST code work better. - Don't try to fragment the packet if it's smaller than mbuf_frag_size. - Preserve the size of the mbuf chain which is modified by m_split(). - Check that m_split() didn't return NULL. - Make it so we don't end up with two M_PKTHDR mbuf in the chain. - Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole chain and not just the first mbuf. - Fix a nearby style bug and rework the logic of the loops so that it's more clear. This is still not quite right, because we're clearly abusing m_split() to do something it was not designed for, but at least it works now. We should probably move this code into a m_fragment() function when it's correct.	2003-03-25 23:49:14 +00:00
Mike Silbersack	9d9edc5693	Add the MBUF_FRAG_TEST option. When compiled in, this option allows you to tell ip_output to fragment all outgoing packets into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes. This is an excellent way to test if network drivers can properly handle long mbuf chains being passed to them. net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation) so that you can at least boot before your network driver dies. :)	2003-03-25 05:45:05 +00:00
Maxime Henrion	aecfcdb824	Use __packed instead of __attribute__((__packed__)).	2003-03-22 00:25:14 +00:00
Matthew N. Dodd	57842a38fd	Add a sysctl node allowing the specification of an address mask to use when replying to ICMP Address Mask Request packets.	2003-03-21 15:43:06 +00:00
Matthew N. Dodd	21150298bb	Add comments regarding the ICMP timestamp fields.	2003-03-21 15:28:10 +00:00
Crist J. Clark	010dabb047	Add a 'verrevpath' option that verifies the interface that a packet comes in on is the same interface that we would route out of to get to the packet's source address. Essentially automates an anti-spoofing check using the information in the routing table. Experimental. The usage and rule format for the feature may still be subject to change.	2003-03-15 01:13:00 +00:00
Jeffrey Hsu	7792ea2700	Greatly simplify the unlocking logic by holding the TCP protocol lock until after FIN_WAIT_2 processing. Helped with debugging: Doug Barton	2003-03-13 11:46:57 +00:00
Jeffrey Hsu	da3a8a1a4f	Add support for RFC 3390, which allows for a variable-sized initial congestion window.	2003-03-13 01:43:45 +00:00
Jeffrey Hsu	582a954b00	Implement the Limited Transmit algorithm (RFC 3042).	2003-03-12 20:27:28 +00:00
Sam Leffler	4a692a1fc2	correct two more flag misuses; m_tag* use malloc flags	2003-03-12 14:45:22 +00:00
Jonathan Lemon	a3b6edc353	Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure. Sponsored by: DARPA, NAI Labs	2003-03-08 22:07:52 +00:00
Jonathan Lemon	607b0b0cc9	Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs	2003-03-08 22:06:20 +00:00
Peter Wemm	3c6b084e96	Finish driving a stake through the heart of netns and the associated ifdefs scattered around the place - it's dead, Jim! The SMB stuff had stolen AF_NS, make it official.	2003-03-05 19:24:24 +00:00
Jonathan Lemon	1cafed3941	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs	2003-03-04 23:19:55 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Jonathan Lemon	272c5dfe93	In timewait state, if the incoming segment is a pure in-sequence ack that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states. Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG> Sponsored by: DARPA, NAI Labs	2003-02-26 18:20:41 +00:00
Jonathan Lemon	ef6b48deb9	The TCP protocol lock may still be held if the reassembly queue dropped FIN. Detect this case and drop the lock accordingly. Sponsored by: DARPA, NAI Labs	2003-02-26 13:55:13 +00:00
Mike Silbersack	a75a485d62	Fix a condition so that ip reassembly queues are emptied immediately when maxfragpackets is dropped to 0. Noticed by: bmah	2003-02-26 07:28:35 +00:00
Robert Watson	9327ee33bf	When generating a TCP response to a connection, not only test if the tcpcb is NULL, but also its connected inpcb, since we now allow elements of a TCP connection to hang around after other state, such as the socket, has been recycled. Tested by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-02-25 14:08:41 +00:00
Maxim Konovalov	b36f5b3735	style(9): join lines.	2003-02-25 11:53:11 +00:00
Maxim Konovalov	99e8617d24	The IP reassembly queue structure has ipq_nfrags now. Count the number of dropped IP fragments precisely. Reviewed by: silby	2003-02-25 11:49:01 +00:00
Jeffrey Hsu	edf02ff15d	Hold the TCP protocol lock while modifying the connection hash table.	2003-02-25 01:32:03 +00:00
Mike Silbersack	af9c7d06d5	Fix a comment which didn't match the new cookie behavior. Submitted by: Scott Renfro <scott@renfro.org> MFC after: 1 day	2003-02-24 03:15:48 +00:00
Jeffrey Hsu	11a20fb8b6	tcp_twstart() needs to be called with the TCP protocol lock held to avoid a race condition with the TCP timer routines.	2003-02-24 00:52:03 +00:00
Jeffrey Hsu	2fbef91887	Pass the right function to callout_reset() for a compressed TIME-WAIT control block.	2003-02-24 00:48:12 +00:00
Mike Silbersack	a432399c56	Improve the security and performance of syncookies: Security improvements: - Increase the size of each syncookie secret from 32 to 128 bits in order to make brute force attacks on the secrets much more difficult. - Always return the lowest order dword from the MD5 hash; this allows us to expose 2 more bits of the cookie and makes ACK floods which seek to guess the cookie value more difficult. Performance improvements: - Increase the lifetime of each syncookie from 4 seconds to 16 seconds. This increases the usefulness of syncookies during an attack. - From Yahoo!: Reduce the number of calls to MD5Update; this results in a ~17% increase in cookie generation time here. Reviewed by: hsu, jayanth, jlemon, nectar MFC After: 15 seconds	2003-02-23 19:04:23 +00:00
Jonathan Lemon	f243998be5	Yesterday just wasn't my day. Remove testing delta that crept into the diff. Pointy hat provided by: sam	2003-02-23 15:40:36 +00:00
Sam Leffler	14dd6717f8	Add a new config option IPSEC_FILTERGIF to control whether or not packets coming out of a GIF tunnel are re-processed by ipfw, et al. By default they are not reprocessed. With the option they are. This reverts 1.214. Prior to that change packets were not re-processed. After they were which caused problems because packets do not have distinguishing characteristics (like a special network if) that allows them to be filtered specially. This is really a stopgap measure designed for immediate MFC so that 4.8 has consistent handling to what was in 4.7. PR: 48159 Reviewed by: Guido van Rooij <guido@gvr.org> MFC after: 1 day	2003-02-23 00:47:06 +00:00
Jonathan Lemon	a14c749f04	Check to see if the TF_DELACK flag is set before returning from tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior Tested by: maxim Sponsored by: DARPA, NAI Labs	2003-02-22 21:54:57 +00:00
Mike Silbersack	375386e284	Add the ability to limit the number of IP fragments allowed per packet, and enable it by default, with a limit of 16. At the same time, tweak maxfragpackets downward so that in the worst possible case, IP reassembly can use only 1/2 of all mbuf clusters. MFC after: 3 days Reviewed by: hsu Liked by: bmah	2003-02-22 06:41:47 +00:00
Poul-Henning Kamp	d25ecb917b	- m = m_gethdr(M_NOWAIT, MT_HEADER); + m = m_gethdr(M_DONTWAIT, MT_HEADER); 'nuff said.	2003-02-21 23:17:12 +00:00
Crist J. Clark	b0d226932e	The ancient and outdated concept of "privileged ports" in UNIX-type OSes has probably caused more problems than it ever solved. Allow the user to retire the old behavior by specifying their own privileged range with, net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1 net.inet.ip.portrange.reservedlo default = 0 Now you can run that webserver without ever needing root at all. Or just imagine, an ftpd that can really drop privileges, rather than just set the euid, and still do PORT data transfers from 20/tcp. Two edge cases to note, # sysctl net.inet.ip.portrange.reservedhigh=0 Opens all ports to everyone, and, # sysctl net.inet.ip.portrange.reservedhigh=65535 Locks all network activity to root only (which could actually have been achieved before with ipfw(8), but is somewhat more complicated). For those who stick to the old religion that 0-1023 belong to root and root alone, don't touch the knobs (or even lock them by raising securelevel(8)), and nothing changes.	2003-02-21 05:28:27 +00:00
Jonathan Lemon	8608c4c1f9	Remove unused variables in the IPSEC case. Submitted by: Lars Eggert <larse@ISI.EDU>	2003-02-20 18:22:21 +00:00
Jonathan Lemon	ffae8c5a7e	Unbreak non-IPV6 compilation. Caught by: phk Sponsored by: DARPA, NAI Labs	2003-02-19 23:43:04 +00:00
Jonathan Lemon	340c35de6a	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs	2003-02-19 22:32:43 +00:00
Jonathan Lemon	7990938421	Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs	2003-02-19 22:18:06 +00:00
Jonathan Lemon	414462252a	Correct comments.	2003-02-19 21:33:46 +00:00
Jonathan Lemon	3bfd6421c2	Clean up delayed acks and T/TCP interactions: - delay acks for T/TCP regardless of delack setting - fix bug where a single pass through tcp_input might not delay acks - use callout_active() instead of callout_pending() Sponsored by: DARPA, NAI Labs	2003-02-19 21:18:23 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Maxim Konovalov	b52d5ea3d2	o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket cr_uid. Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its own macro for a similar purpose that is why ipfw2 in RELENG_4 processes uid rules correctly. I will MFC the diff for code consistency. Reported by: Oleg Baranov <ol@csa.ru> Reviewed by: luigi MFC after: 1 month	2003-02-17 13:39:57 +00:00
Jeffrey Hsu	4b40c56c28	Take advantage of pre-existing lock-free synchronization and type stable memory to avoid acquiring SMP locks during expensive copyout process.	2003-02-15 02:37:57 +00:00
Jeffrey Hsu	85e8b24343	The protocol lock is always held in the dropafterack case, so we don't need to check for it at runtime.	2003-02-13 22:14:22 +00:00
Jeffrey Hsu	3dc7ebf9ff	in_pcbnotifyall() requires an exclusive protocol lock for notify functions which modify the connection list, namely, tcp_notify().	2003-02-12 23:55:07 +00:00
Jeffrey Hsu	6d45d64a8f	Properly document that syncache timer processing requires an exclusive TCP protocol lock.	2003-02-12 00:42:12 +00:00
Seigo Tanimura	cd6c2a8874	s/IPSSEC/IPSEC/	2003-02-11 10:51:56 +00:00
Jeffrey Hsu	24652ff6e1	Get cosmetic changes out of the way before I add routing table SMP locks.	2003-02-10 22:01:34 +00:00
Orion Hodson	022695f82a	Avoid multiply for preemptive arp calculation since it hits every ethernet packet sent. Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>	2003-02-08 15:05:15 +00:00
Orion Hodson	73224fb019	MFS 1.64.2.22: Re-enable non pre-emptive ARP requests. Submitted by: "Diomidis Spinellis" <dds@aueb.gr> PR: kern/46116	2003-02-04 05:28:08 +00:00
Crist J. Clark	39eb27a4a9	Add the TCP flags to the log message whenever log_in_vain is 1, not just when set to 2. PR: kern/43348 MFC after: 5 days	2003-02-02 22:06:56 +00:00
Mike Silbersack	ecf44c01f4	Move a comment and optimize the frag timeout code a slight bit. Submitted by: maxim MFC with: The previous two revisions	2003-02-01 05:59:51 +00:00
Sam Leffler	9359ad861e	FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from ipsec4_process_packet; they happen when a packet is dropped because an SA acquire is initiated Submitted by: Doug Ambrisko <ambrisko@verniernetworks.com>	2003-01-30 05:45:45 +00:00
Sam Leffler	28a34902c4	remove the restriction on building a kernel with FAST_IPSEC and INET6; you still don't want to use the two together, but it's ok to have them in the same kernel (the problem that initiated this bandaid has long since been fixed)	2003-01-30 05:43:08 +00:00
Mike Silbersack	d4d5315c23	Fix a bug with syncookies; previously, the syncache's MSS size was not initialized until after a syncookie was generated. As a result, all connections resulting from a returned cookie would end up using a MSS of ~512 bytes. Now larger packets will be used where possible. MFC after: 5 days	2003-01-29 03:49:49 +00:00
Poul-Henning Kamp	4ee6e70ef3	Check bounds for index before dereferencing memory past end of array. Found by: FlexeLint	2003-01-28 22:44:12 +00:00
Jeffrey Hsu	93f798891a	Avoid lock order reversal by expanding the scope of the AF_INET radix tree lock to cover the ARP data structures.	2003-01-28 20:22:19 +00:00
Mike Silbersack	ac64c8668b	A few fixes to rev 1.221 - Honor the previous behavior of maxfragpackets = 0 or -1 - Take a better stab at fragment statistics - Move / correct a comment Suggested by: maxim@ MFC after: 7 days	2003-01-28 03:39:39 +00:00
Mike Silbersack	402062e80c	Merge the best parts of maxfragpackets and maxnipq together. (Both functions implemented approximately the same limits on fragment memory usage, but in different fashions.) End user visible changes: - Fragment reassembly queues are freed in a FIFO manner when maxfragpackets has been reached, rather than all reassembly stopping. MFC after: 5 days	2003-01-26 01:44:05 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Maxim Konovalov	2adf7582da	De-anonymize a couple of messages I missed in a previous sweep. Move one of them under DEB macro. Noticed by: Wiktor Niesiobedzki <w@evip.pl>	2003-01-20 13:03:34 +00:00
Maxim Konovalov	8ec22a9363	If the first action is O_LOG adjust a pointer to the real one, unbreaks skipto + log rules. Reported by: Wiktor Niesiobedzki <w@evip.pl> MFC after: 1 week	2003-01-20 11:58:34 +00:00
Jeffrey Hsu	314e5a3daf	Optimize away call to bzero() in the common case by directly checking if a connection has any cached TAO information.	2003-01-18 19:03:26 +00:00
Jeffrey Hsu	f5c5746047	Fix long-standing bug predating FreeBSD where calling connect() twice on a raw ip socket will crash the system with a null-dereference.	2003-01-18 01:10:55 +00:00
Jeffrey Hsu	c996428c32	SMP locking for ARP.	2003-01-17 07:59:35 +00:00
Matthew Dillon	fe41ca530c	Introduce the ability to flag a sysctl for operation at secure level 2 or 3 in addition to secure level 1. The mask supports up to a secure level of 8 but only add defines through CTLFLAG_SECURE3 for now. As per the missif in the log entry for 1.11 of ip_fw2.c which added the secure flag to the IPFW sysctl's in the first place, change the secure level requirement from 1 to 3 now that we have support for it. Reviewed by: imp With Design Suggestions by: imp	2003-01-14 19:35:33 +00:00
Jeffrey Hsu	cb942153c8	Fix NewReno. Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>	2003-01-13 11:01:20 +00:00
Thomas Moestl	a9a7a91220	Clear the target hardware address field when generating an ARP request. Reviewed by: nectar MFC after: 1 week	2003-01-10 00:04:53 +00:00
Jeffrey Hsu	b21bf9a59b	Validate inp before de-referencing it. Submitted by: pb	2003-01-05 07:56:24 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Sam Leffler	9967cafc49	Correct mbuf packet header propagation. Previously, packet headers were sometimes propagated using M_COPY_PKTHDR which actually did something between a "move" and a "copy" operation. This is replaced by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it from the source mbuf) and m_dup_pkthdr which copies the packet header contents including any m_tag chain. This corrects numerous problems whereby mbuf tags could be lost during packet manipulations. These changes also introduce arguments to m_tag_copy and m_tag_copy_chain to specify if the tag copy work should potentially block. This introduces an incompatibility with openbsd which we may want to revisit. Note that move/dup of packet headers does not handle target mbufs that have a cluster bound to them. We may want to support this; for now we watch for it with an assert. Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG. Supported by: Vernier Networks Reviewed by: Robert Watson <rwatson@FreeBSD.org>	2002-12-30 20:22:40 +00:00
Matthew Dillon	07fd333df3	Remove the PAWS ack-on-ack debugging printf(). Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of order / reverse-time-indexed packet should be acknowledged as specified in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994) simply acknowledged the segment unconditionally, which is incorrect, and was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN in addition to (tlen != 0), which may or may not be correct, but the worst that ought to happen should be a retry by the sender.	2002-12-30 19:31:04 +00:00
Sam Leffler	069f35d328	correct style bogons	2002-12-30 18:45:31 +00:00
Ian Dowse	ed1a13b18f	Bridged packets are supplied to the firewall with their IP header in network byte order, but icmp_error() expects the IP header to be in host order and the code here did not perform the necessary swapping for the bridged case. This bug causes an "icmp_error: bad length" panic when certain length IP packets (e.g. ip_len == 0x100) are rejected by the firewall with an ICMP response. MFC after: 3 days	2002-12-27 17:43:25 +00:00
Jeffrey Hsu	abe239cfe2	Validate inp to prevent a use after free.	2002-12-24 21:00:31 +00:00
Maxim Konovalov	f4ef616f98	o De-anonymize dummynet(4) and ipfw(4) messages: prefix them with 'dummynet: ' and 'ipfw: '. PR: kern/41609	2002-12-24 13:45:24 +00:00
Jeffrey Hsu	956b0b653c	SMP locking for radix nodes.	2002-12-24 03:03:39 +00:00
Pierre Beyssac	1ba7727b9e	Remove forgotten INP_UNLOCK(inp) in my previous commit. Reported by: hsu	2002-12-22 13:04:08 +00:00
Pierre Beyssac	87cd4001b5	In syncache_timer(), don't attempt to lock the inpcb structure associated with the syncache entry: in case tcp_close() has been called on the corresponding listening socket, the lock has been destroyed as a side effect of in_pcbdetach(), causing a panic when we attempt to lock on it. Reviewed by: hsu	2002-12-21 19:59:47 +00:00
Sam Leffler	00f21882a0	replace the special-purpose rate-limiting code with the general facility just added; this tries to maintain the same behaviour vis a vis printing the rate-limiting messages but needs tweaking	2002-12-21 00:08:20 +00:00
Jeffrey Hsu	9a39fc9d73	Eliminate a goto. Fix some line breaks.	2002-12-20 11:24:02 +00:00
Jeffrey Hsu	540e8b7e31	Unravel a nested conditional. Remove an unneeded local variable.	2002-12-20 11:16:52 +00:00
Jeffrey Hsu	f320a1bfd2	Expand scope of TCP protocol lock to cover syncache data structures.	2002-12-20 00:24:19 +00:00
Bosko Milekic	86fea6be59	o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}. o Fix a bpf_compat issue where malloc() was defined to just call bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT flag (and only one of those two). Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)	2002-12-19 22:58:27 +00:00
Jeffrey Hsu	19fc74fb60	Lock up ifaddr reference counts.	2002-12-18 11:46:59 +00:00
Poul-Henning Kamp	11aee0b4b0	Remove unused and incorrectly maintained variable "in_interfaces"	2002-12-17 19:30:04 +00:00
Matthew Dillon	967adce8df	Fix syntax in last commit.	2002-12-17 00:24:48 +00:00
Maxim Konovalov	616fa7460c	o Trim EOL whitespaces. MFC after: 1 week	2002-12-15 10:24:36 +00:00
Maxim Konovalov	21ef23ab3f	o s/if_name[16]/if_name[IFNAMSIZ]/ Reviewed by: luigi MFC after: 1 week	2002-12-15 10:23:02 +00:00
Maxim Konovalov	2713a5bebb	o M_DONTWAIT is mbuf(9) flag: malloc(M_DONTWAIT) -> malloc(M_NOWAIT). The bug does not affect anything because M_NOWAIT == M_DONTWAIT. Reviewed by: luigi MFC after: 1 week	2002-12-15 10:21:30 +00:00
Maxim Konovalov	83b75b7621	o Fix byte order logging issue: sa.sin_port is already in host byte order. PR: kern/45964 Submitted by: Sascha Blank <sblank@tiscali.de> Reviewed by: luigi MFC after: 1 week	2002-12-15 09:44:02 +00:00
Matthew Dillon	d7ff8ef62a	Change tcp.inflight_min from 1024 to a production default of 6144. Create a sysctl for the stabilization value for the bandwidth delay product (inflight) algorithm and document it. MFC after: 3 days	2002-12-14 21:00:17 +00:00
Matthew Dillon	1ab4789dc2	Bruce forwarded this tidbit from an analysis Van Jacobson did on an apparent ack-on-ack problem with FreeBSD. Prof. Jacobson noticed a case in our TCP stack which would acknowledge a received ack-only packet, which is not legal in TCP. Submitted by: Van Jacobson <van@packetdesign.com>, bmah@packetdesign.com (Bruce A. Mah) MFC after: 7 days	2002-12-14 07:31:51 +00:00
Maxim Sobolev	16199bf2d3	MFS: recognize gre packets used in the WCCP protocol. Approved by: re	2002-12-07 14:22:05 +00:00
Luigi Rizzo	97850a5dd9	Move fw_one_pass from ip_fw2.c to ip_input.c so that neither bridge.c nor if_ethersubr.c depend on IPFIREWALL. Restore the use of fw_one_pass in if_ethersubr.c ipfw.8 will be updated with a separate commit. Approved by: re	2002-11-20 19:07:27 +00:00
Luigi Rizzo	032dcc7680	Back out some style changes. They are not urgent, I will put them back in after 5.0 is out. Requested by: sam Approved by: re	2002-11-20 19:00:54 +00:00
Luigi Rizzo	b375c9ec2c	Back out the ip_fragment() code -- it is not urgent to have it in now, I will put it back in in a better form after 5.0 is out. Requested by: sam, rwatson, luigi (on second thought) Approved by: re	2002-11-20 18:56:25 +00:00
Mike Silbersack	df285b3d1d	Add a sysctl to control the generation of source quench packets, and set it to 0 by default. Partially obtained from: NetBSD Suggested by: David Gilbert MFC after: 5 days	2002-11-19 17:06:06 +00:00
Luigi Rizzo	9b77fbf0a2	Fix function headers and remove 'register' variable declarations.	2002-11-17 17:04:19 +00:00
Luigi Rizzo	3e372e140c	Move the ip_fragment code from ip_output() to a separate function, so that it can be reused elsewhere (there is a number of places where it can be useful). This also trims some 200 lines from the body of ip_output(), which helps readability a bit. (This change was discussed a few weeks ago on the mailing lists, Julian agreed, silence from others. It is not a functional change, so i expect it to be ok to commit it now but i am happy to back it out if there are objections). While at it, fix some function headers and replace m_copy() with m_copypacket() where applicable. MFC after: 1 week	2002-11-17 16:30:44 +00:00
Luigi Rizzo	20fab86349	Minor documentation changes and indentation fix. Replace m_copy() with m_copypacket() where applicable. While at it, fix some function headers and remove 'register' from variable declarations.	2002-11-17 16:13:08 +00:00
Luigi Rizzo	4e8fe3210d	Cleanup some of the comments, and reformat long lines. Replace m_copy() with m_copypacket() where applicable. Replace "if (a.s_addr ...)" with "if (a.s_addr != INADDR_ANY ...)" to make it clear what the code means. While at it, fix some function headers and remove 'register' from variable declarations. MFC after: 3 days	2002-11-17 16:02:17 +00:00
Luigi Rizzo	bbb4330b61	Massive cleanup of the ip_mroute code. No functional changes, but: + the mrouting module now should behave the same as the compiled-in version (it did not before, some of the rsvp code was not loaded properly); + netinet/ip_mroute.c is now truly optional; + removed some redundant/unused code; + changed many instances of '0' to NULL and INADDR_ANY as appropriate; + removed several static variables to make the code more SMP-friendly; + fixed some minor bugs in the mrouting code (mostly, incorrect return values from functions). This commit is also a prerequisite to the addition of support for PIM, which i would like to put in before DP2 (it does not change any of the existing APIs, anyways). Note, in the process we found out that some device drivers fail to properly handle changes in IFF_ALLMULTI, leading to interesting behaviour when a multicast router is started. This bug is not corrected by this commit, and will be fixed with a separate commit. Detailed changes: -------------------- netinet/ip_mroute.c all the above. conf/files make ip_mroute.c optional net/route.c fix mrt_ioctl hook netinet/ip_input.c fix ip_mforward hook, move rsvp_input() here together with other rsvp code, and a couple of indentation fixes. netinet/ip_output.c fix ip_mforward and ip_mcast_src hooks netinet/ip_var.h rsvp function hooks netinet/raw_ip.c hooks for mrouting and rsvp functions, plus interface cleanup. netinet/ip_mroute.h remove an unused and optional field from a struct Most of the code is from Pavlin Radoslavov and the XORP project Reviewed by: sam MFC after: 1 week	2002-11-15 22:53:53 +00:00
Sam Leffler	eec3a0b17f	track changes to not strip the Ethernet header from input packets Reviewed by: many Approved by: re	2002-11-14 23:46:04 +00:00
Sam Leffler	ccb2acfe1b	track bpf changes Reviewed by: many Approved by: re	2002-11-14 23:45:13 +00:00
Maxim Konovalov	8ef1565d2b	Due to memory alignment, sizeof(struct ipfw_flow_id) is bigger than the combined size of the structure's members, so bcmp(3) may fail to compare two such structures properly. Compare the members of these structures instead. PR: kern/44078 Submitted by: Oleg Bulyzhin <oleg@rinet.ru> Reviewed by: luigi MFC after: 2 weeks	2002-11-13 11:31:44 +00:00
Jeffrey Hsu	e1e1b6e892	Turn off duplicate lock checking for inp locks because udp_input() intentionally locks two inp records simultaneously.	2002-11-12 20:44:38 +00:00
Sam Leffler	6f0d017cf4	a better solution to building FAST_IPSEC w/o INET6 Submitted by: Jeffrey Hsu <hsu@FreeBSD.org>	2002-11-10 17:17:32 +00:00
Alfred Perlstein	29f194457c	Fix instances of macros with improperly parenthesized arguments. Verified by: md5	2002-11-09 12:55:07 +00:00
Sam Leffler	9c0a8ace11	temporarily disallow FAST_IPSEC and INET6 to avoid potential panics; will correct this before 5.0 release	2002-11-08 23:50:32 +00:00
Sam Leffler	e8539d32f0	FAST_IPSEC fixups: o fix #ifdef typo o must use "bounce functions" when dispatched from the protosw table don't know how this stuff was missed in my testing; must've committed the wrong bits Pointy hat: sam Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>	2002-11-08 23:37:50 +00:00
Sam Leffler	58fcadfc0f	fixup FAST_IPSEC build w/o INET6	2002-11-08 23:33:59 +00:00
Sam Leffler	ab94ca3cec	correct fast ipsec logic: compare destination ip address against the contents of the SA, not the SP Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>	2002-11-08 23:11:02 +00:00
John Baldwin	2d4e26522d	Cast a ptrdiff_t to an int to printf.	2002-11-08 14:52:26 +00:00
Jeff Roberson	1645d0903e	- Consistently update snd_wl1, snd_wl2, and rcv_up in the header prediction code. Previously, 2GB worth of header predicted data could leave these variables too far out of sequence which would cause problems after receiving a packet that did not match the header prediction. Submitted by: Bill Baumann <bbaumann@isilon.com> Sponsored by: Isilon Systems, Inc. Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com	2002-10-31 23:24:13 +00:00
Jeffrey Hsu	30613f5610	Don't need to check if SO_OOBINLINE is defined. Don't need to protect isipv6 conditional with INET6. Fix leading indentation in 2 lines.	2002-10-30 08:32:19 +00:00
Bill Fenner	4d3ffc9841	Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers. This allows socket() to return an error when the kernel is not built with IPDIVERT, and doesn't prevent future applications from using the "borrowed" IP protocol number. The sysctl net.inet.raw.olddiverterror controls whether opening a socket with the "borrowed" IP protocol fails with an accompanying kernel printf; this code should last only a couple of releases. Approved by: re	2002-10-29 16:46:13 +00:00
Maxim Konovalov	a98d88ad3e	Lower the priority of "session drop" messages. Requested by: Eugene Grosbein <eugen@kuzbass.ru> MFC after: 3 days	2002-10-29 08:53:14 +00:00
Maxime Henrion	d28e8b3a0d	Oops, forgot to commit this file. This is part of the fix for ipfw2 panics on sparc64.	2002-10-24 22:32:13 +00:00
Maxime Henrion	7c697970f4	Fix ipfw2 panics on 64-bit platforms. Quoting luigi: In order to make the userland code fully 64-bit clean it may be necessary to commit other changes that may or may not cause a minor change in the ABI. Reviewed by: luigi	2002-10-24 18:04:44 +00:00
Luigi Rizzo	18f13da2be	src and dst address were erroneously swapped in SRC_SET and DST_SET commands. Use the correct one. Also affects ipfw2 in -stable.	2002-10-24 18:01:53 +00:00
Maxime Henrion	56e77afa59	Fix kernel build on sparc64 in the IPDIVERT case.	2002-10-24 09:58:50 +00:00
Ian Dowse	efac726eeb	Unbreak the automatic remapping of an INADDR_ANY destination address to the primary local IP address when doing a TCP connect(). The tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr) modifying the passed-in sockaddr, and I failed to notice this in the recent change that added in_pcbconnect_setup(). As a result, tcp_connect() was ending up using the unmodified sockaddr address instead of the munged version. There are two cases to handle: if in_pcbconnect_setup() succeeds, then the PCB has already been updated with the correct destination address as we pass it pointers to inp_faddr and inp_fport directly. If in_pcbconnect_setup() fails due to an existing but dead connection, then copy the destination address from the old connection.	2002-10-24 02:02:34 +00:00
Maxim Konovalov	ba3a9d459c	Kill EOL spaces. Approved by: luigi MFC after: 1 week	2002-10-23 10:07:55 +00:00
Maxim Konovalov	6b6874b20c	Use syslog for messages about dropped sessions, do not flood a console. Suggested by: Eugene Grosbein <eugen@kuzbass.ru> Approved by: luigi MFC after: 1 week	2002-10-23 10:05:19 +00:00
SUZUKI Shinsuke	2754d95d85	fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4" MFC after: 1 week	2002-10-22 22:50:38 +00:00
Ian Dowse	c557ae16ce	Implement a new IP_SENDSRCADDR ancillary message type that permits a server process bound to a wildcard UDP socket to select the IP address from which outgoing packets are sent on a per-datagram basis. When combined with IP_RECVDSTADDR, such a server process can guarantee to reply to an incoming request using the same source IP address as the destination IP address of the request, without having to open one socket per server IP address. Discussed on: -net Approved by: re	2002-10-21 20:40:02 +00:00
Ian Dowse	90162a4e87	Remove the "temporary connection" hack in udp_output(). In order to send datagrams from an unconnected socket, we used to first block input, then connect the socket to the sendmsg/sendto destination, send the datagram, and finally disconnect the socket and unblock input. We now use in_pcbconnect_setup() to check if a connect() would have succeeded, but we never record the connection in the PCB (local anonymous port allocation is still recorded, though). The result from in_pcbconnect_setup() authorises the sending of the datagram and selects the local address and port to use, so we just construct the header and call ip_output(). Discussed on: -net Approved by: re	2002-10-21 20:10:05 +00:00
Ian Dowse	5200e00e72	Replace in_pcbladdr() with a more generic inner subroutine for in_pcbconnect() called in_pcbconnect_setup(). This version performs all of the functions of in_pcbconnect() except for the final committing of changes to the PCB. In the case of an EADDRINUSE error it can also provide to the caller the PCB of the duplicate connection, avoiding an extra in_pcblookup_hash() lookup in tcp_connect(). This change will allow the "temporary connect" hack in udp_output() to be removed and is part of the preparation for adding the IP_SENDSRCADDR control message. Discussed on: -net Approved by: re	2002-10-21 13:55:50 +00:00
Poul-Henning Kamp	53be11f680	Fix two instances of variant struct definitions in sys/netinet: Remove the never completed _IP_VHL version, it has not caught on anywhere and it would make us incompatible with other BSD netstacks to retain this version. Add a CTASSERT protecting sizeof(struct ip) == 20. Don't let the size of struct ipq depend on the IPDIVERT option. This is a functional no-op commit. Approved by: re	2002-10-20 22:52:07 +00:00
Robert Watson	c740509854	When a packet is multicast encapsulated, give labeled policies the opportunity to preserve the label. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-20 21:59:00 +00:00
Ian Dowse	4b932371f4	Split out most of the logic from in_pcbbind() into a new function called in_pcbbind_setup() that does everything except commit the changes to the PCB. There should be no functional change here, but in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR control message implementation to check or allocate the source address and port. Discussed on: -net Approved by: re	2002-10-20 21:44:31 +00:00
Maxime Henrion	d7f4d27a7a	Several malloc() calls were passing the M_DONTWAIT flag which is an mbuf allocation flag. Use the correct M_NOWAIT malloc() flag. Fortunately, both were defined to 1, so this commit is a no-op.	2002-10-19 11:31:50 +00:00
Hajimu UMEMOTO	b6e2845324	last arg of in6?_gif_output() is not used any more. Obtained from: KAME MFC after: 3 weeks	2002-10-17 17:47:55 +00:00
Alfred Perlstein	dde2897f82	de-__P().	2002-10-16 22:27:27 +00:00
Hajimu UMEMOTO	ab94625826	use encapcheck. Obtained from: KAME MFC after: 3 weeks	2002-10-16 20:16:49 +00:00
Hajimu UMEMOTO	9426aedf7f	- after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly. - set IFF_UP on SIOCSIFADDR. be consistent with others. - set if_addrlen explicitly (just in case) - multi destination mode is long gone. - missing break statement - add gif_set_tunnel(), so that we can set tunnel address from within the kernel at ease. - encap_attach/detach dynamically on ioctls - move encap_attach() to dedicated function in in*_gif.c Obtained from: KAME MFC after: 3 weeks	2002-10-16 19:49:37 +00:00
Matthew Dillon	abac41a659	Fix oops in my last commit, I was calculating a new length but then not using it. (The code is already correct in -stable). Found by: silby	2002-10-16 19:16:33 +00:00
Guido van Rooij	2f591ab8fe	Get rid of checking for ip sec history. It is true that packets are not supposed to be checked by the firewall rules twice. However, because the various ipsec handlers never call ip_input(), this never happens anyway. This fixes the situation where a gif tunnel is encrypted with IPsec. In such a case, after IPsec processing, the unencrypted contents from the GIF tunnel are fed back to the ipintrq and subsequently handled by ip_input(). Yet, since there still is IPSec history attached, the packets coming out from the gif device are never fed into the filtering code. This fix was sent to Itojun, and he pointed towards http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction. This patch actually implements what is stated there (specifically: Packet came from tunnel devices (gif(4) and ipip(4)) will still go through ipf(4). You may need to identify these packets by using interface name directive in ipf.conf(5).) Reviewed by: rwatson MFC after: 3 weeks	2002-10-16 09:01:48 +00:00
Sam Leffler	9b65723081	correct PCB locking in broadcast/multicast case that was exposed by change to use udp_append Reviewed by: hsu	2002-10-16 02:33:28 +00:00
Sam Leffler	b9234fafa0	Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implementation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks	2002-10-16 02:25:05 +00:00
Sam Leffler	5d84645305	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month	2002-10-16 01:54:46 +00:00
Sean Chittenden	927a76bb5e	Increase the max dummynet hash size from 1024 to 65536. Default is still 1024. Silence on: -net, -ipfw 4weeks+ Reviewed by: dd Approved by: knu (mentor) MFC after: 3 weeks	2002-10-12 07:45:23 +00:00
Matthew Dillon	c8d50f2414	turn off debugging by default if bandwidth delay product limiting is turned on (it is already off in -stable).	2002-10-10 21:41:30 +00:00
Matthew Dillon	28257b5ccc	Update various comments mainly related to retransmit/FIN that I documented while working on a previous bug. Fix a PERSIST bug. Properly account for a FIN sent during a PERSIST. MFC after: 7 days	2002-10-10 19:21:50 +00:00
Maxim Konovalov	a5428e3a9a	Fix IPOPT_TS processing: do not overwrite IP address by timestamp. PR: misc/42121 Submitted by: Praveen Khurjekar <praveen@codito.com> Reviewed by: silence on -net MFC after: 1 month	2002-10-10 12:03:36 +00:00
Maxim Sobolev	748bb23dcc	Since bpf is no longer an optional component, remove associated ifdef's. Submitted by: don't quite remember - the name of the sender disappeared with the rest of my inbox. :(	2002-10-02 09:38:17 +00:00
Mike Barcroft	c0ec31f93e	Include <sys/cdefs.h> so the visibility conditionals are available. (This should have been included with the previous revision.)	2002-10-02 04:22:34 +00:00
Mike Barcroft	0cd4a9031e	Use visibility conditionals. Only TCP_NODELAY ends up being defined in the standards case.	2002-10-02 04:19:47 +00:00
Matthew Dillon	a84db8f49e	Guido found another bug. There is a situation with timestamped TCP packets where FreeBSD will send DATA+FIN and a W2K box will ack just the DATA portion. If this occurs after FreeBSD has done a (NewReno) fast-retransmit and is recovering it (dupacks > threshold) it triggers a case in tcp_newreno_partial_ack() (tcp_newreno() in stable) where tcp_output() is called with the expectation that the retransmit timer will be reloaded. But tcp_output() falls through and returns without doing anything, causing the persist timer to be loaded instead. This causes the connection to hang until W2K gives up. This occurs because in the case where only the FIN must be acked, the 'len' calculation in tcp_output() will be 0, a lot of checks will be skipped, and the FIN check will also be skipped because it is designed to handle FIN retransmits, not forced transmits from tcp_newreno(). The solution is to simply set TF_ACKNOW before calling tcp_output() to absolutely guarantee that it will run the send code and reset the retransmit timer. TF_ACKNOW is already used for this purpose in other cases. For some unknown reason this patch also seems to greatly reduce the number of duplicate acks received when Guido runs his tests over a lossy network. It is quite possible that there are other tcp_newreno{_partial_ack()} cases which were not generating the expected output which this patch also fixes. X-MFC after: Will be MFC'd after the freeze is over	2002-09-30 18:55:45 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Peter Wemm	224af215a6	Zap now-unused SHLIB_MINOR	2002-09-28 00:25:32 +00:00
Maxim Konovalov	cb7641e85b	Slightly rearrange the code in rev. 1.164: o Move len initialization closer to place of its first usage. o Compare len with 0 to improve readability. o Explicitly zero out phlen in ip_insertoptions() in failure case. Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks	2002-09-23 08:56:24 +00:00
Alfred Perlstein	ebc82cbbf0	s/__attribute__((__packed__))/__packed/g	2002-09-23 06:25:08 +00:00
Mike Silbersack	c1c36a2c68	Fix issue where shutdown(socket, SHUT_RD) was effectively ignored for TCP sockets. NetBSD PR: 18185 Submitted by: Sean Boudreau <seanb@qnx.com> MFC after: 3 days	2002-09-22 02:54:07 +00:00
Poul-Henning Kamp	a5554bf05b	Use m_fixhdr() rather than roll our own.	2002-09-18 19:43:01 +00:00
Matthew Dillon	fa55172bc0	Guido reported an interesting bug where an FTP connection between a Windows 2000 box and a FreeBSD box could stall. The problem turned out to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK causing FreeBSD to calculate an insane SRTT and RTT, resulting in a maximal retransmit timeout (60 seconds). If there is any packet loss on the connection for the first six or so packets the retransmit case may be hit (the window will still be too small for fast-retransmit), causing a 60+ second pause. The W2K box gives up and closes the connection. This commit works around the W2K bug. 15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8] 15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF) Bug reported by: Guido van Rooij <guido@gvr.org>	2002-09-17 22:21:37 +00:00
Maxim Sobolev	563a9b6ecb	Remove __RCSID(). Submitted by: bde	2002-09-17 11:31:41 +00:00
Maxim Konovalov	1cf4349926	Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak ip fragments reassembling for loopback interface. Discussed with: bde, jlemon Reviewed by: silence on -net MFC after: 2 weeks	2002-09-17 11:20:02 +00:00
Maxim Konovalov	e079ba8d93	In rare cases when there is no room for ip options ip_insertoptions() can fail and corrupt a header length. Initialize len and check what ip_insertoptions() returns. Reviewed by: archie, silence on -net MFC after: 5 days	2002-09-17 11:13:04 +00:00
Jennifer Yang	4a03a8a8c7	Temporary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead of pcbhead. It is on the way.	2002-09-17 03:19:43 +00:00
Maxim Sobolev	2b82e3b367	Remove superfluous break.	2002-09-10 09:18:33 +00:00
Maxim Sobolev	565bb857d0	Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach(). MFC after: 28 days (along with other if_gre changes)	2002-09-09 09:36:47 +00:00
Maxim Sobolev	c23d234cce	Reduce namespace pollution by staticizing everything, which doesn't need to be visible from outside of the module.	2002-09-06 18:16:03 +00:00
Maxim Sobolev	8e96e13e6a	Add a new gre(4) driver, which could be used to create GRE (RFC1701) and MOBILE (RFC2004) IP tunnels. Obtained from: NetBSD	2002-09-06 17:12:50 +00:00
Bruce Evans	40545cf5fc	Fixed namespace pollution in uma changes: - use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite. - don't include <sys/uma.h>. Namespace pollution makes "opaque" types like uma_zone_t perfectly non-opaque. Such types should never be used (see style(9)). Fixed subsequently grown dependencies of this header on its own pollution: - include <sys/_mutex.h> and its prerequisite <sys/_lock.h> instead of depending on namespace pollution 2 layers deep in <sys/uma.h>.	2002-09-05 19:48:52 +00:00
Bruce Evans	c74af4fac1	Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 4 layers deep in <netinet/in_pcb.h>. Removed unused includes. Sorted includes.	2002-09-05 15:33:30 +00:00
Maxim Sobolev	386fefa3a0	Add in_hosteq() and in_nullhost() macros to make life of developers porting NetBSD code a little bit easier. Obtained from: NetBSD	2002-09-04 09:55:50 +00:00
Darren Reed	1851791868	some ipfilter files that accidentally got imported here	2002-08-29 13:27:26 +00:00
Darren Reed	070700595d	This commit was generated by cvs2svn to compensate for changes in r102514, which included commits to RCS files with non-trunk default branches.	2002-08-28 13:26:01 +00:00
Philippe Charnier	93b0017f88	Replace various spellings with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Crist J. Clark	784d7650f7	Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and firewall logging on and off when at elevated securelevel(8). It would be nice to be able to only lock these at securelevel >= 3, like rules are, but there is no such functionality at present. I don't see reason to be adding features to securelevel(8) with MAC being merged into 5.0. PR: kern/39396 Reviewed by: luigi MFC after: 1 week	2002-08-25 03:50:29 +00:00
Matthew Dillon	4f1e1f32b6	Correct bug in t_bw_rtttime rollover, #undef USERTT	2002-08-24 17:22:44 +00:00
Archie Cobbs	4a6a94d8d8	Replace (ab)uses of "NULL" where "0" is really meant.	2002-08-22 21:24:01 +00:00
Mike Barcroft	abbd890233	o Merge <machine/ansi.h> and <machine/types.h> into a new header called <machine/_types.h>. o <machine/ansi.h> will continue to live so it can define MD clock macros, which are only MD because of gratuitous differences between architectures. o Change all headers to make use of this. This mainly involves changing: #ifdef _BSD_FOO_T_ typedef _BSD_FOO_T_ foo_t; #undef _BSD_FOO_T_ #endif to: #ifndef _FOO_T_DECLARED typedef __foo_t foo_t; #define _FOO_T_DECLARED #endif Concept by: bde Reviewed by: jake, obrien	2002-08-21 16:20:02 +00:00
Don Lewis	26ef6ac4df	Create new functions in_sockaddr(), in6_sockaddr(), and in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked. Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept(). Reviewed by: suz	2002-08-21 11:57:12 +00:00
Juli Mallett	ded7008a07	Enclose IPv6 addresses in brackets when they are displayed in printable form with a TCP/UDP port separated by a colon. This is for the log_in_vain facility. Pointed out by: Edward J. M. Brocklesby Reviewed by: ume MFC after: 2 weeks	2002-08-19 19:47:13 +00:00
Luigi Rizzo	306fe283a1	Raise limit for port lists to 30 entries/ranges. Remove a duplicate "logging" message, and identify the firewall as ipfw2 in the boot message.	2002-08-19 04:45:01 +00:00
Matthew Dillon	1fcc99b5de	Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week	2002-08-17 18:26:02 +00:00
Jeffrey Hsu	c068736a61	Cosmetic-only changes for readability. Reviewed by: (early form passed by) bde Approved by: itojun (from core@kame.net)	2002-08-17 02:05:25 +00:00
Luigi Rizzo	99e5e64504	sys/netinet/ip_fw2.c: Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops for firewall-generated packets (the constant has to go in sys/mbuf.h). Better comments on keepalive generation, and enforce dyn_rst_lifetime and dyn_fin_lifetime to be less than dyn_keepalive_period. Enforce limits (up to 64k) on the number of dynamic buckets, and retry allocation with smaller sizes. Raise default number of dynamic rules to 4096. Improved handling of set of rules -- now you can atomically enable/disable multiple sets, move rules from one set to another, and swap sets. sbin/ipfw/ipfw2.c: userland support for "noerror" pipe attribute. userland support for sets of rules. minor improvements on rule parsing and printing. sbin/ipfw/ipfw.8: more documentation on ipfw2 extensions, differences from ipfw1 (so we can use the same manpage for both), stateful rules, and some additional examples. Feedback and more examples needed here.	2002-08-16 10:31:47 +00:00
Alfred Perlstein	e88894d39a	make the strings for tcptimers, tanames and prurequests const to silence warnings.	2002-08-16 09:07:59 +00:00
Robert Watson	365433d9b8	Code formatting sync to trustedbsd_mac: don't perform an assignment in an if clause.	2002-08-15 22:04:31 +00:00
Robert Watson	fb95b5d3c3	Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 18:51:27 +00:00
Jeffrey Hsu	b5addd8564	Reset dupack count in header prediction. Follow-on to rev 1.39. Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon	2002-08-15 17:13:18 +00:00
Luigi Rizzo	4bbf3b8b3a	Kernel support for a dummynet option: When a pipe or queue has the "noerror" attribute, do not report drops to the caller (ip_output() and friends). (2 lines to implement it, 2 lines to document it.) This will let you simulate losses on the sender side as if they happened in the middle of the network, i.e. with no explicit feedback to the sender. manpage and ipfw2.c changes to follow shortly, together with other ipfw2 changes. Requested by: silby MFC after: 3 days	2002-08-15 16:53:43 +00:00
Robert Watson	ecd3e8ff5a	It's now sufficient to rely on a nested include of _label.h to make sure all structures in ip_var.h are defined, so remove include of mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:45 +00:00
Robert Watson	9daf40feaa	Perform a nested include of _label.h if #ifdef _KERNEL. This will satisfy consumers of ip_var.h that need a complete definition of struct ipq and don't include mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:02 +00:00
Robert Watson	3b6aad64bf	Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which is no longer present. Pointed out by: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:27:46 +00:00
Poul-Henning Kamp	ae89fdaba7	remove spurious printf	2002-08-13 19:13:23 +00:00
Jennifer Yang	3d6ade3a03	Assert that the inpcb lock is held when calling tcp_output(). Approved by: hsu	2002-08-12 03:22:46 +00:00
Luigi Rizzo	43405724ec	One bugfix and one new feature. The bugfix (ipfw2.c) makes the handling of port numbers with a dash in the name, e.g. ftp-data, consistent with old ipfw: use \\ before the - to consider it as part of the name and not a range separator. The new feature (all this description will go in the manpage): each rule now belongs to one of 32 different sets, which can be optionally specified in the following form: ipfw add 100 set 23 allow ip from any to any If "set N" is not specified, the rule belongs to set 0. Individual sets can be disabled, enabled, and deleted with the commands: ipfw disable set N ipfw enable set N ipfw delete set N Enabling/disabling of a set is atomic. Rules belonging to a disabled set are skipped during packet matching, and they are not listed unless you use the '-S' flag in the show/list commands. Note that dynamic rules, once created, are always active until they expire or their parent rule is deleted. Set 31 is reserved for the default rule and cannot be disabled. All sets are enabled by default. The enable/disable status of the sets can be shown with the command ipfw show sets Hopefully, this feature will make life easier to those who want to have atomic ruleset addition/deletion/tests. Examples: To add a set of rules atomically: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 To delete a set of rules atomically ipfw disable set 18 ipfw delete set 18 ipfw enable set 18 To test a ruleset and disable it and regain control if something goes wrong: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18 here if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g. if you cannot access your box, the ruleset will be disabled after the sleep terminates. 
I think there is only one more thing that one might want, namely a command to assign all rules in set X to set Y, so one can test a ruleset using the above mechanisms, and once it is considered acceptable, make it part of an existing ruleset.	2002-08-10 04:37:32 +00:00
Mike Silbersack	a9ce5e05b5	Handle PMTU discovery in syn-ack packets slightly differently; rely on syncache flags instead of directly accessing the route entry. MFC after: 3 days	2002-08-05 22:34:15 +00:00
Luigi Rizzo	1cbd978e96	bugfix: move check for udp_blackhole before the one for icmp_bandlim. MFC after: 3 days	2002-08-04 20:50:13 +00:00
Luigi Rizzo	ea779ff36c	Fix handling of packets which matched an "ipfw fwd" rule on the input side.	2002-08-03 14:59:45 +00:00
Robert Watson	e316463a86	When preserving the IP header in extra mbuf in the IP forwarding case, also preserve the MAC label. Note that this mbuf allocation is fairly non-optimal, but not my fault. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-02 20:45:27 +00:00
Robert Watson	09a555cbf9	Work to fix LINT build. Reported by: phk	2002-08-02 18:08:14 +00:00
Robert Watson	bdb3fa1832	Introduce support for Mandatory Access Control and extensible kernel access control. Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 21:37:34 +00:00
Robert Watson	d00e44fb4a	Document the undocumented assumption that at least one of the PCB pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:54:43 +00:00
Robert Watson	0070e096d7	Introduce support for Mandatory Access Control and extensible kernel access control. Add support for labeling most out-going ICMP messages using an appropriate MAC entry point. Currently, we do not explicitly label packet reflect (timestamp, echo request) ICMP events, implicitly using the originating packet label since the mbuf is reused. This will be made explicit at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:53:04 +00:00
Robert Watson	c488362e1a	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 19:06:49 +00:00
Robert Watson	4ea889c666	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the raw IP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check the socket and mbuf labels before permitting delivery to a socket, permitting MAC policies to selectively allow delivery of raw IP mbufs to various raw IP sockets that may be open. Restructure the policy checking code to compose IPsec and MAC results in a more readable manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 18:30:34 +00:00
Robert Watson	4ed84624a2	Introduce support for Mandatory Access Control and extensible kernel access control. When fragmenting an IP datagram, invoke an appropriate MAC entry point so that MAC labels may be copied (...) to the individual IP fragment mbufs by MAC policies. When IP options are inserted into an IP datagram when leaving a host, preserve the label if we need to reallocate the mbuf for alignment or size reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:21:01 +00:00
Robert Watson	36b0360b37	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the code managing IP fragment reassembly queues (struct ipq) to invoke appropriate MAC entry points to maintain a MAC label on each queue. Permit MAC policies to associate information with a queue based on the mbuf that caused it to be created, update that information based on further mbufs accepted by the queue, influence the decision making process by which mbufs are accepted to the queue, and set the label of the mbuf holding the reassembled datagram following reassembly completion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:17:51 +00:00
Robert Watson	0ec4b12334	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an IGMP message, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the target interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:46:56 +00:00
Robert Watson	19527d3e22	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an ARP query, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:45:16 +00:00
Robert Watson	d3990b06e1	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the MAC framework to label mbuf created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:42:47 +00:00
Robert Watson	549e4c9e4e	Introduce support for Mandatory Access Control and extensible kernel access control. Label IP fragment reassembly queues, permitting security features to be maintained on those objects. ipq_label will be used to manage the reassembly of fragments into IP datagrams using security properties. This permits policies to deny the reassembly of fragments, as well as influence the resulting label of a datagram following reassembly. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 23:09:20 +00:00
Maxim Konovalov	d46a53126c	Use a common way to release locks before exit. Reviewed by: hsu	2002-07-29 09:01:39 +00:00
Don Lewis	5c38b6dbce	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
Hajimu UMEMOTO	66ef17c4b6	make setsockopt(IPV6_V6ONLY, 0) actually work for tcp6. MFC after: 1 week	2002-07-25 18:10:04 +00:00
Hajimu UMEMOTO	eccb7001ee	cleanup usage of ip6_mapped_addr_on and ip6_v6only. now, ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week	2002-07-25 17:40:45 +00:00
Luigi Rizzo	be1826c354	Only log things if net.inet.ip.fw.verbose is set	2002-07-24 02:41:19 +00:00
Ruslan Ermilov	61a875d706	Don't forget to recalculate the IP checksum of the original IP datagram embedded into ICMP error message. Spotted by: tcpdump 3.7.1 (-vvv) MFC after: 3 days	2002-07-23 00:16:19 +00:00
Ruslan Ermilov	88c39af35f	Don't shrink socket buffers in tcp_mss(), application might have already configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows. PR: kern/11966 MFC after: 1 week	2002-07-22 22:31:09 +00:00
Hajimu UMEMOTO	854d3b19a2	do not refer to IN6P_BINDV6ONLY anymore. Obtained from: KAME MFC after: 1 week	2002-07-22 15:51:02 +00:00
John Polstra	8ea8a6804b	Fix overflows in intermediate calculations in sysctl_msec_to_ticks(). At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days	2002-07-20 23:48:59 +00:00
Robert Watson	69dac2ea47	Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel data structures pick up security and synchronization primitives, it becomes increasingly desirable not to arbitrarily export them via include files to userland, as the userland applications pick up new #include dependencies. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-20 22:46:20 +00:00
Matthew Dillon	d65bf08af3	Add the tcps_sndrexmitbad statistic, keep track of late acks that caused unnecessary retransmissions.	2002-07-19 18:29:38 +00:00
Matthew Dillon	701bec5a38	Introduce two new sysctl's: net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the original code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debatable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.	2002-07-18 19:06:12 +00:00
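
The interaction between the two sysctls in 701bec5a38 might be sketched as follows. This is an illustration only: the function name, the millisecond units, the upper bound, and the choice to add the slop before clamping are all assumptions, not taken from the commit.

```c
/*
 * Hypothetical sketch: add a fixed slop to the computed RTO, then
 * clamp the result to [min_ms, max_ms]. The order (slop first, clamp
 * second) is an assumption made for this example.
 */
static int
rto_with_slop(int rto_ms, int min_ms, int slop_ms, int max_ms)
{
	int t = rto_ms + slop_ms;

	if (t < min_ms)
		t = min_ms;
	else if (t > max_ms)
		t = max_ms;
	return t;
}
```

With a 200 ms slop, a 5 ms computed RTO becomes 205 ms in this sketch, which shows why the slop rather than the minimum dominates on low-latency paths.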
Luigi Rizzo	90780c4b05	Move IPFW2 definition before including ip_fw.h Make indentation of new parts consistent with the style used for this file.	2002-07-18 05:18:41 +00:00
Matthew Dillon	22fd54d461	I don't know how the minimum retransmit timeout managed to get set to one second but it badly breaks throughput on networks with minor packet loss. Complaints by: at least two people tracked down to this. MFC after: 3 days	2002-07-17 23:32:03 +00:00
Luigi Rizzo	318aa87b59	Fix a panic when doing "ipfw add pipe 1 log ..." Also synchronize ip_dummynet.c with the version in RELENG_4 to ease MFC's.	2002-07-17 07:21:42 +00:00
Luigi Rizzo	a8c102a2ec	Implement keepalives for dynamic rules, so they will not expire just because you leave your session idle. Also, put in a fix for 64-bit architectures (to be revised). In detail: ip_fw.h * Reorder fields in struct ip_fw to avoid alignment problems on 64-bit machines. This only masks the problem, I am still not sure whether I am doing something wrong in the code or there is a problem elsewhere (e.g. different alignment of structures between userland and kernel because of pragmas etc.) * added fields in dyn_rule to store ack numbers, so we can generate keepalives when the dynamic rule is about to expire ip_fw2.c * use a local function, send_pkt(), to generate TCP RST for Reset rules; * save about 250 bytes by cleaning up the various snprintf() in ipfw_log() ... * ... and use twice as many bytes to implement keepalives (this seems to be working, but i have not tested it extensively). Keepalives are generated once every 5 seconds for the last 20 seconds of the lifetime of a dynamic rule for an established TCP flow. The packets are sent to both sides, so if at least one of the endpoints is responding, the timeout is refreshed and the rule will not expire. You can disable this feature with sysctl net.inet.ip.fw.dyn_keepalive=0 (the default is 1, to have them enabled). MFC after: 1 day (just kidding... I will supply an updated version of ipfw2 for RELENG_4 tomorrow).	2002-07-14 23:47:18 +00:00
Luigi Rizzo	3956b02345	Avoid dereferencing a null pointer in ro_rt. This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30). So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.	2002-07-12 22:08:47 +00:00
Don Lewis	2d20c83f93	Back out the previous change, since it looks like locking udbinfo provides sufficient protection.	2002-07-12 09:55:48 +00:00
Don Lewis	bb1dd7a45a	Lock inp while we're accessing it.	2002-07-12 08:05:22 +00:00
Don Lewis	0e1eebb846	Defer calling SYSCTL_OUT() until after the locks have been released.	2002-07-11 23:18:43 +00:00
Don Lewis	142b2bd644	Reduce the nesting level of a code block that doesn't need to be in an else clause.	2002-07-11 23:13:31 +00:00
Luigi Rizzo	c7ea683135	Change one variable to make it easier to switch between ipfw and ipfw2	2002-07-09 06:53:38 +00:00
Luigi Rizzo	b3063f064c	Fix a bug caused by dereferencing an invalid pointer when no punch_fw was used. Fix another couple of bugs which prevented rules from being installed properly. In passing, use IPFW2 instead of NEW_IPFW to compile the new code, and slightly simplify the instruction generation code.	2002-07-08 22:57:35 +00:00
Luigi Rizzo	d63b346ab1	No functional changes, but: Following Darren's suggestion, make Dijkstra happy and rewrite the ipfw_chk() main loop removing a lot of goto's and using instead a variable to store match status. Add a lot of comments to explain what instructions are supposed to do and how -- this should ease auditing of the code and make people more confident with it. In terms of code size: the entire file takes about 12700 bytes of text, about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!) for ipfw_log().	2002-07-08 22:46:01 +00:00
Luigi Rizzo	7d4d3e9051	Remove one unused command name.	2002-07-08 22:39:19 +00:00
Luigi Rizzo	5185195169	Forgot to update one field name in one of the latest commits.	2002-07-08 22:37:55 +00:00
Luigi Rizzo	5e43aef891	Implement the last 2-3 missing instructions for ipfw, now it should support all the instructions of the old ipfw. Fix some bugs in the user interface, /sbin/ipfw. Please check this code against your rulesets, so i can fix the remaining bugs (if any, i think they will be mostly in /sbin/ipfw). Once we have done a bit of testing, this code is ready to be MFC'ed, together with a bunch of other changes (glue to ipfw, and also the removal of some global variables) which have been in -current for a couple of weeks now. MFC after: 7 days	2002-07-05 22:43:06 +00:00
Brian Somers	27cc91fbf8	Remove trailing whitespace	2002-07-01 11:19:40 +00:00
Jesper Skriver	eb538bfd64	Extend the effect of the sysctl net.inet.tcp.icmp_may_rst so that, if we receive an ICMP "time to live exceeded in transit" (type 11, code 0) for a TCP connection in SYN-SENT state, close the connection. MFC after: 2 weeks	2002-06-30 20:07:21 +00:00
Jonathan Lemon	0080a004d7	One possible code path for syncache_respond() is: syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B) Which winds up deleting a different entry from the syncache. Handle this by not utilizing the next entry in the timer chain until after syncache_respond() completes. The case of A == B should not be possible. Problem found by: Don Bowman <don@sandvine.com>	2002-06-28 19:12:38 +00:00
Doug Rabson	24f8fd9fd1	Fix warning. Reviewed by: luigi	2002-06-28 08:36:26 +00:00
Luigi Rizzo	9758b77ff1	The new ipfw code. This code makes use of variable-size kernel representation of rules (exactly the same concept of BPF instructions, as used in the BSDI's firewall), which makes firewall operation a lot faster, and the code more readable and easier to extend and debug. The interface with the rest of the system is unchanged, as witnessed by this commit. The only extra kernel files that I am touching are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In userland I only had to touch those programs which manipulate the internal representation of firewall rules. The code is almost entirely new (and I believe I have written the vast majority of those sections which were taken from the former ip_fw.c), so rather than modifying the old ip_fw.c I decided to create a new file, sys/netinet/ip_fw2.c . Same for the user interface, which is in sbin/ipfw/ipfw2.c (it still compiles to /sbin/ipfw). The old files are still there, and will be removed in due time. I have not renamed the header file because it would have required a one-line change to a number of kernel files. In terms of user interface, the new "ipfw" is supposed to accept the old syntax for ipfw rules (and produce the same output with "ipfw show"). Only a couple of the old options (out of some 30 of them) have not been implemented, but they will be soon. On the other hand, the new code has some very powerful extensions. First, you can put "or" connectives between match fields (and soon also between options), and write things like ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any This should make rulesets slightly more compact (and lines longer!), by condensing 2 or more of the old rules into single ones.
Also, as an example of how easy the rules can be extended, I have implemented an 'address set' match pattern, where you can specify an IP address in a format like this: 10.20.30.0/26{18,44,33,22,9} which will match the set of hosts listed in braces belonging to the subnet 10.20.30.0/26 . The match is done using a bitmap, so it is essentially a constant time operation requiring a handful of CPU instructions (and a very small amount of memory -- for a full /24 subnet, the instruction only consumes 40 bytes). Again, in this commit I have focused on functionality and tried to minimize changes to the other parts of the system. Some performance improvement can be achieved with minor changes to the interface of ip_fw_chk_t. This will be done later when this code is settled. The code is meant to compile unmodified on RELENG_4 (once the PACKET_TAG_* changes have been merged), for this reason you will see #ifdef __FreeBSD_version in a couple of places. This should minimize errors when (hopefully soon) it will be time to do the MFC.	2002-06-27 23:02:18 +00:00
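
The 'address set' match described in 9758b77ff1 can be sketched as a bitmap lookup. This is a minimal illustration, not the ip_fw2.c instruction layout: the struct, field names, and helper functions are all hypothetical, and addresses are assumed to be in host byte order.

```c
#include <stdint.h>
#include <string.h>

#define SET_MAX_HOSTS 256	/* enough for a full /24 (assumption) */

/* Hypothetical layout: subnet base, subnet size, one bit per host. */
struct addr_set {
	uint32_t base;		/* network address, host byte order */
	uint32_t size;		/* number of addresses in the subnet */
	uint32_t bits[SET_MAX_HOSTS / 32];
};

/* prefix_len must be in the range 24..32 for this sketch. */
static void
addr_set_init(struct addr_set *s, uint32_t base, unsigned prefix_len)
{
	memset(s, 0, sizeof(*s));
	s->size = 1u << (32 - prefix_len);
	s->base = base & ~(s->size - 1);
}

/* Mark host number `host` (offset within the subnet) as a member. */
static int
addr_set_add(struct addr_set *s, unsigned host)
{
	if (host >= s->size)
		return -1;
	s->bits[host / 32] |= 1u << (host % 32);
	return 0;
}

/* Constant-time membership test: one subtraction, one bit lookup. */
static int
addr_set_match(const struct addr_set *s, uint32_t addr)
{
	uint32_t off = addr - s->base;	/* wraps to a large value if addr < base */

	if (off >= s->size)
		return 0;
	return (int)((s->bits[off / 32] >> (off % 32)) & 1u);
}
```

For the example in the commit message, the set would be initialized with base 10.20.30.0 and prefix length 26, then hosts 18, 44, 33, 22, and 9 added; 10.20.30.18 would match and 10.20.30.19 would not.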
Maxime Henrion	7627c6cbcc	Warning fixes for 64 bits platforms. With this last fix, I can build a GENERIC sparc64 kernel with -Werror. Reviewed by: luigi	2002-06-27 11:02:06 +00:00
Luigi Rizzo	713a6ea063	Just a comment on some additional consistency checks that could be added here.	2002-06-26 21:00:53 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. 
(uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. 
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Jeffrey Hsu	6fd22caf91	Avoid unlocking the inp twice if badport_bandlim() returns -1. Reported by: jlemon	2002-06-24 22:25:00 +00:00
Jeffrey Hsu	f14e4cfe33	Style bug: fix 4 space indentations that should have been tabs. Submitted by: jlemon	2002-06-24 16:47:02 +00:00
Luigi Rizzo	f10e85d797	Slightly restructure the #ifdef INET6 sections to make the code more readable. Remove the six "register" attributes from variables in tcp_output(), the compiler surely knows well how to allocate them.	2002-06-23 21:25:36 +00:00
Luigi Rizzo	410bb1bfe2	Move two global variables to automatic variables within the only function where they are used (they are used with TCPDEBUG only).	2002-06-23 21:22:56 +00:00
Luigi Rizzo	4d2e36928d	Move some global variables in more appropriate places. Add XXX comments to mark places which need to be taken care of if we want to remove this part of the kernel from Giant. Add a comment on a potential performance problem with ip_forward()	2002-06-23 20:48:26 +00:00
Luigi Rizzo	51aed12e52	fix bad indentation and whitespace resulting from cut&paste	2002-06-23 09:15:43 +00:00
Luigi Rizzo	dfd1ae2f86	fix indentation of a comment	2002-06-23 09:14:24 +00:00
Luigi Rizzo	a5924d6100	fix a typo in a comment	2002-06-23 09:13:46 +00:00

1816 Commits