freebsd-dev

Author	SHA1	Message	Date
Darren Reed	1851791868	some ipfilter files that accidentally got imported here	2002-08-29 13:27:26 +00:00
Darren Reed	070700595d	This commit was generated by cvs2svn to compensate for changes in r102514, which included commits to RCS files with non-trunk default branches.	2002-08-28 13:26:01 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Crist J. Clark	784d7650f7	Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and firewall logging on and off when at elevated securelevel(8). It would be nice to be able to only lock these at securelevel >= 3, like rules are, but there is no such functionality at present. I don't see a reason to be adding features to securelevel(8) with MAC being merged into 5.0. PR: kern/39396 Reviewed by: luigi MFC after: 1 week	2002-08-25 03:50:29 +00:00
Matthew Dillon	4f1e1f32b6	Correct bug in t_bw_rtttime rollover, #undef USERTT	2002-08-24 17:22:44 +00:00
Archie Cobbs	4a6a94d8d8	Replace (ab)uses of "NULL" where "0" is really meant.	2002-08-22 21:24:01 +00:00
Mike Barcroft	abbd890233	o Merge <machine/ansi.h> and <machine/types.h> into a new header called <machine/_types.h>. o <machine/ansi.h> will continue to live so it can define MD clock macros, which are only MD because of gratuitous differences between architectures. o Change all headers to make use of this. This mainly involves changing: #ifdef _BSD_FOO_T_ typedef _BSD_FOO_T_ foo_t; #undef _BSD_FOO_T_ #endif to: #ifndef _FOO_T_DECLARED typedef __foo_t foo_t; #define _FOO_T_DECLARED #endif Concept by: bde Reviewed by: jake, obrien	2002-08-21 16:20:02 +00:00
Don Lewis	26ef6ac4df	Create new functions in_sockaddr(), in6_sockaddr(), and in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked. Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept(). Reviewed by: suz	2002-08-21 11:57:12 +00:00
Juli Mallett	ded7008a07	Enclose IPv6 addresses in brackets when they are displayed in printable form with a TCP/UDP port separated by a colon. This is for the log_in_vain facility. Pointed out by: Edward J. M. Brocklesby Reviewed by: ume MFC after: 2 weeks	2002-08-19 19:47:13 +00:00
Luigi Rizzo	306fe283a1	Raise limit for port lists to 30 entries/ranges. Remove a duplicate "logging" message, and identify the firewall as ipfw2 in the boot message.	2002-08-19 04:45:01 +00:00
Matthew Dillon	1fcc99b5de	Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week	2002-08-17 18:26:02 +00:00
Jeffrey Hsu	c068736a61	Cosmetic-only changes for readability. Reviewed by: (early form passed by) bde Approved by: itojun (from core@kame.net)	2002-08-17 02:05:25 +00:00
Luigi Rizzo	99e5e64504	sys/netinet/ip_fw2.c: Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops for firewall-generated packets (the constant has to go in sys/mbuf.h). Better comments on keepalive generation, and enforce dyn_rst_lifetime and dyn_fin_lifetime to be less than dyn_keepalive_period. Enforce limits (up to 64k) on the number of dynamic buckets, and retry allocation with smaller sizes. Raise default number of dynamic rules to 4096. Improved handling of set of rules -- now you can atomically enable/disable multiple sets, move rules from one set to another, and swap sets. sbin/ipfw/ipfw2.c: userland support for "noerror" pipe attribute. userland support for sets of rules. minor improvements on rule parsing and printing. sbin/ipfw/ipfw.8: more documentation on ipfw2 extensions, differences from ipfw1 (so we can use the same manpage for both), stateful rules, and some additional examples. Feedback and more examples needed here.	2002-08-16 10:31:47 +00:00
Alfred Perlstein	e88894d39a	make the strings for tcptimers, tanames and prurequests const to silence warnings.	2002-08-16 09:07:59 +00:00
Robert Watson	365433d9b8	Code formatting sync to trustedbsd_mac: don't perform an assignment in an if clause.	2002-08-15 22:04:31 +00:00
Robert Watson	fb95b5d3c3	Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 18:51:27 +00:00
Jeffrey Hsu	b5addd8564	Reset dupack count in header prediction. Follow-on to rev 1.39. Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon	2002-08-15 17:13:18 +00:00
Luigi Rizzo	4bbf3b8b3a	Kernel support for a dummynet option: When a pipe or queue has the "noerror" attribute, do not report drops to the caller (ip_output() and friends). (2 lines to implement it, 2 lines to document it.) This will let you simulate losses on the sender side as if they happened in the middle of the network, i.e. with no explicit feedback to the sender. manpage and ipfw2.c changes to follow shortly, together with other ipfw2 changes. Requested by: silby MFC after: 3 days	2002-08-15 16:53:43 +00:00
Robert Watson	ecd3e8ff5a	It's now sufficient to rely on a nested include of _label.h to make sure all structures in ip_var.h are defined, so remove include of mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:45 +00:00
Robert Watson	9daf40feaa	Perform a nested include of _label.h if #ifdef _KERNEL. This will satisfy consumers of ip_var.h that need a complete definition of struct ipq and don't include mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:02 +00:00
Robert Watson	3b6aad64bf	Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which is no longer present. Pointed out by: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:27:46 +00:00
Poul-Henning Kamp	ae89fdaba7	remove spurious printf	2002-08-13 19:13:23 +00:00
Jennifer Yang	3d6ade3a03	Assert that the inpcb lock is held when calling tcp_output(). Approved by: hsu	2002-08-12 03:22:46 +00:00
Luigi Rizzo	43405724ec	One bugfix and one new feature. The bugfix (ipfw2.c) makes the handling of port numbers with a dash in the name, e.g. ftp-data, consistent with old ipfw: use \\ before the - to consider it as part of the name and not a range separator. The new feature (all this description will go in the manpage): each rule now belongs to one of 32 different sets, which can be optionally specified in the following form: ipfw add 100 set 23 allow ip from any to any If "set N" is not specified, the rule belongs to set 0. Individual sets can be disabled, enabled, and deleted with the commands: ipfw disable set N ipfw enable set N ipfw delete set N Enabling/disabling of a set is atomic. Rules belonging to a disabled set are skipped during packet matching, and they are not listed unless you use the '-S' flag in the show/list commands. Note that dynamic rules, once created, are always active until they expire or their parent rule is deleted. Set 31 is reserved for the default rule and cannot be disabled. All sets are enabled by default. The enable/disable status of the sets can be shown with the command ipfw show sets Hopefully, this feature will make life easier to those who want to have atomic ruleset addition/deletion/tests. Examples: To add a set of rules atomically: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 To delete a set of rules atomically ipfw disable set 18 ipfw delete set 18 ipfw enable set 18 To test a ruleset and disable it and regain control if something goes wrong: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18 here if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g. if you cannot access your box, the ruleset will be disabled after the sleep terminates. 
I think there is only one more thing that one might want, namely a command to assign all rules in set X to set Y, so one can test a ruleset using the above mechanisms, and once it is considered acceptable, make it part of an existing ruleset.	2002-08-10 04:37:32 +00:00
Mike Silbersack	a9ce5e05b5	Handle PMTU discovery in syn-ack packets slightly differently; rely on syncache flags instead of directly accessing the route entry. MFC after: 3 days	2002-08-05 22:34:15 +00:00
Luigi Rizzo	1cbd978e96	bugfix: move check for udp_blackhole before the one for icmp_bandlim. MFC after: 3 days	2002-08-04 20:50:13 +00:00
Luigi Rizzo	ea779ff36c	Fix handling of packets which matched an "ipfw fwd" rule on the input side.	2002-08-03 14:59:45 +00:00
Robert Watson	e316463a86	When preserving the IP header in extra mbuf in the IP forwarding case, also preserve the MAC label. Note that this mbuf allocation is fairly non-optimal, but not my fault. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-02 20:45:27 +00:00
Robert Watson	09a555cbf9	Work to fix LINT build. Reported by: phk	2002-08-02 18:08:14 +00:00
Robert Watson	bdb3fa1832	Introduce support for Mandatory Access Control and extensible kernel access control. Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 21:37:34 +00:00
Robert Watson	d00e44fb4a	Document the undocumented assumption that at least one of the PCB pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:54:43 +00:00
Robert Watson	0070e096d7	Introduce support for Mandatory Access Control and extensible kernel access control. Add support for labeling most out-going ICMP messages using an appropriate MAC entry point. Currently, we do not explicitly label packet reflect (timestamp, echo request) ICMP events, implicitly using the originating packet label since the mbuf is reused. This will be made explicit at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:53:04 +00:00
Robert Watson	c488362e1a	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 19:06:49 +00:00
Robert Watson	4ea889c666	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the raw IP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check the socket and mbuf labels before permitting delivery to a socket, permitting MAC policies to selectively allow delivery of raw IP mbufs to various raw IP sockets that may be open. Restructure the policy checking code to compose IPsec and MAC results in a more readable manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 18:30:34 +00:00
Robert Watson	4ed84624a2	Introduce support for Mandatory Access Control and extensible kernel access control. When fragmenting an IP datagram, invoke an appropriate MAC entry point so that MAC labels may be copied (...) to the individual IP fragment mbufs by MAC policies. When IP options are inserted into an IP datagram when leaving a host, preserve the label if we need to reallocate the mbuf for alignment or size reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:21:01 +00:00
Robert Watson	36b0360b37	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the code managing IP fragment reassembly queues (struct ipq) to invoke appropriate MAC entry points to maintain a MAC label on each queue. Permit MAC policies to associate information with a queue based on the mbuf that caused it to be created, update that information based on further mbufs accepted by the queue, influence the decision making process by which mbufs are accepted to the queue, and set the label of the mbuf holding the reassembled datagram following reassembly completion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:17:51 +00:00
Robert Watson	0ec4b12334	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an IGMP message, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the target interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:46:56 +00:00
Robert Watson	19527d3e22	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an ARP query, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:45:16 +00:00
Robert Watson	d3990b06e1	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the MAC framework to label mbufs created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:42:47 +00:00
Robert Watson	549e4c9e4e	Introduce support for Mandatory Access Control and extensible kernel access control. Label IP fragment reassembly queues, permitting security features to be maintained on those objects. ipq_label will be used to manage the reassembly of fragments into IP datagrams using security properties. This permits policies to deny the reassembly of fragments, as well as influence the resulting label of a datagram following reassembly. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 23:09:20 +00:00
Maxim Konovalov	d46a53126c	Use a common way to release locks before exit. Reviewed by: hsu	2002-07-29 09:01:39 +00:00
Don Lewis	5c38b6dbce	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
Hajimu UMEMOTO	66ef17c4b6	make setsockopt(IPV6_V6ONLY, 0) actually work for tcp6. MFC after: 1 week	2002-07-25 18:10:04 +00:00
Hajimu UMEMOTO	eccb7001ee	cleanup usage of ip6_mapped_addr_on and ip6_v6only. now, ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week	2002-07-25 17:40:45 +00:00
Luigi Rizzo	be1826c354	Only log things if net.inet.ip.fw.verbose is set	2002-07-24 02:41:19 +00:00
Ruslan Ermilov	61a875d706	Don't forget to recalculate the IP checksum of the original IP datagram embedded into ICMP error message. Spotted by: tcpdump 3.7.1 (-vvv) MFC after: 3 days	2002-07-23 00:16:19 +00:00
Ruslan Ermilov	88c39af35f	Don't shrink socket buffers in tcp_mss(), application might have already configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows. PR: kern/11966 MFC after: 1 week	2002-07-22 22:31:09 +00:00
Hajimu UMEMOTO	854d3b19a2	do not refer to IN6P_BINDV6ONLY anymore. Obtained from: KAME MFC after: 1 week	2002-07-22 15:51:02 +00:00
John Polstra	8ea8a6804b	Fix overflows in intermediate calculations in sysctl_msec_to_ticks(). At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days	2002-07-20 23:48:59 +00:00
Robert Watson	69dac2ea47	Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel data structures pick up security and synchronization primitives, it becomes increasingly desirable not to arbitrarily export them via include files to userland, as the userland applications pick up new #include dependencies. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-20 22:46:20 +00:00
Matthew Dillon	d65bf08af3	Add the tcps_sndrexmitbad statistic, keep track of late acks that caused unnecessary retransmissions.	2002-07-19 18:29:38 +00:00
Matthew Dillon	701bec5a38	Introduce two new sysctl's: net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the original code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debatable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.	2002-07-18 19:06:12 +00:00
Luigi Rizzo	90780c4b05	Move IPFW2 definition before including ip_fw.h Make indentation of new parts consistent with the style used for this file.	2002-07-18 05:18:41 +00:00
Matthew Dillon	22fd54d461	I don't know how the minimum retransmit timeout managed to get set to one second but it badly breaks throughput on networks with minor packet loss. Complaints by: at least two people tracked down to this. MFC after: 3 days	2002-07-17 23:32:03 +00:00
Luigi Rizzo	318aa87b59	Fix a panic when doing "ipfw add pipe 1 log ..." Also synchronize ip_dummynet.c with the version in RELENG_4 to ease MFC's.	2002-07-17 07:21:42 +00:00
Luigi Rizzo	a8c102a2ec	Implement keepalives for dynamic rules, so they will not expire just because you leave your session idle. Also, put in a fix for 64-bit architectures (to be revised). In detail: ip_fw.h * Reorder fields in struct ip_fw to avoid alignment problems on 64-bit machines. This only masks the problem, I am still not sure whether I am doing something wrong in the code or there is a problem elsewhere (e.g. different alignment of structures between userland and kernel because of pragmas etc.) * added fields in dyn_rule to store ack numbers, so we can generate keepalives when the dynamic rule is about to expire ip_fw2.c * use a local function, send_pkt(), to generate TCP RST for Reset rules; * save about 250 bytes by cleaning up the various snprintf() in ipfw_log() ... * ... and use twice as many bytes to implement keepalives (this seems to be working, but I have not tested it extensively). Keepalives are generated once every 5 seconds for the last 20 seconds of the lifetime of a dynamic rule for an established TCP flow. The packets are sent to both sides, so if at least one of the endpoints is responding, the timeout is refreshed and the rule will not expire. You can disable this feature with sysctl net.inet.ip.fw.dyn_keepalive=0 (the default is 1, to have them enabled). MFC after: 1 day (just kidding... I will supply an updated version of ipfw2 for RELENG_4 tomorrow).	2002-07-14 23:47:18 +00:00
Luigi Rizzo	3956b02345	Avoid dereferencing a null pointer in ro_rt. This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD), while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30. So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.	2002-07-12 22:08:47 +00:00
Don Lewis	2d20c83f93	Back out the previous change, since it looks like locking udbinfo provides sufficient protection.	2002-07-12 09:55:48 +00:00
Don Lewis	bb1dd7a45a	Lock inp while we're accessing it.	2002-07-12 08:05:22 +00:00
Don Lewis	0e1eebb846	Defer calling SYSCTL_OUT() until after the locks have been released.	2002-07-11 23:18:43 +00:00
Don Lewis	142b2bd644	Reduce the nesting level of a code block that doesn't need to be in an else clause.	2002-07-11 23:13:31 +00:00
Luigi Rizzo	c7ea683135	Change one variable to make it easier to switch between ipfw and ipfw2	2002-07-09 06:53:38 +00:00
Luigi Rizzo	b3063f064c	Fix a bug caused by dereferencing an invalid pointer when no punch_fw was used. Fix another couple of bugs which prevented rules from being installed properly. On passing, use IPFW2 instead of NEW_IPFW to compile the new code, and slightly simplify the instruction generation code.	2002-07-08 22:57:35 +00:00
Luigi Rizzo	d63b346ab1	No functional changes, but: Following Darren's suggestion, make Dijkstra happy and rewrite the ipfw_chk() main loop removing a lot of goto's and using instead a variable to store match status. Add a lot of comments to explain what instructions are supposed to do and how -- this should ease auditing of the code and make people more confident with it. In terms of code size: the entire file takes about 12700 bytes of text, about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!) for ipfw_log().	2002-07-08 22:46:01 +00:00
Luigi Rizzo	7d4d3e9051	Remove one unused command name.	2002-07-08 22:39:19 +00:00
Luigi Rizzo	5185195169	Forgot to update one field name in one of the latest commits.	2002-07-08 22:37:55 +00:00
Luigi Rizzo	5e43aef891	Implement the last 2-3 missing instructions for ipfw, now it should support all the instructions of the old ipfw. Fix some bugs in the user interface, /sbin/ipfw. Please check this code against your rulesets, so i can fix the remaining bugs (if any, i think they will be mostly in /sbin/ipfw). Once we have done a bit of testing, this code is ready to be MFC'ed, together with a bunch of other changes (glue to ipfw, and also the removal of some global variables) which have been in -current for a couple of weeks now. MFC after: 7 days	2002-07-05 22:43:06 +00:00
Brian Somers	27cc91fbf8	Remove trailing whitespace	2002-07-01 11:19:40 +00:00
Jesper Skriver	eb538bfd64	Extend the effect of the sysctl net.inet.tcp.icmp_may_rst so that, if we receive an ICMP "time to live exceeded in transit" (type 11, code 0) for a TCP connection in SYN-SENT state, close the connection. MFC after: 2 weeks	2002-06-30 20:07:21 +00:00
Jonathan Lemon	0080a004d7	One possible code path for syncache_respond() is: syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B) Which winds up deleting a different entry from the syncache. Handle this by not utilizing the next entry in the timer chain until after syncache_respond() completes. The case of A == B should not be possible. Problem found by: Don Bowman <don@sandvine.com>	2002-06-28 19:12:38 +00:00
Doug Rabson	24f8fd9fd1	Fix warning. Reviewed by: luigi	2002-06-28 08:36:26 +00:00
Luigi Rizzo	9758b77ff1	The new ipfw code. This code makes use of variable-size kernel representation of rules (exactly the same concept as BPF instructions, as used in the BSDI's firewall), which makes firewall operation a lot faster, and the code more readable and easier to extend and debug. The interface with the rest of the system is unchanged, as witnessed by this commit. The only extra kernel files that I am touching are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In userland I only had to touch those programs which manipulate the internal representation of firewall rules. The code is almost entirely new (and I believe I have written the vast majority of those sections which were taken from the former ip_fw.c), so rather than modifying the old ip_fw.c I decided to create a new file, sys/netinet/ip_fw2.c . Same for the user interface, which is in sbin/ipfw/ipfw2.c (it still compiles to /sbin/ipfw). The old files are still there, and will be removed in due time. I have not renamed the header file because it would have required a one-line change to a number of kernel files. In terms of user interface, the new "ipfw" is supposed to accept the old syntax for ipfw rules (and produce the same output with "ipfw show"). Only a couple of the old options (out of some 30 of them) have not been implemented, but they will be soon. On the other hand, the new code has some very powerful extensions. First, you can put "or" connectives between match fields (and soon also between options), and write things like ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any This should make rulesets slightly more compact (and lines longer!), by condensing 2 or more of the old rules into single ones.
Also, as an example of how easy the rules can be extended, I have implemented an 'address set' match pattern, where you can specify an IP address in a format like this: 10.20.30.0/26{18,44,33,22,9} which will match the set of hosts listed in braces belonging to the subnet 10.20.30.0/26 . The match is done using a bitmap, so it is essentially a constant time operation requiring a handful of CPU instructions (and a very small amount of memory -- for a full /24 subnet, the instruction only consumes 40 bytes). Again, in this commit I have focused on functionality and tried to minimize changes to the other parts of the system. Some performance improvement can be achieved with minor changes to the interface of ip_fw_chk_t. This will be done later when this code is settled. The code is meant to compile unmodified on RELENG_4 (once the PACKET_TAG_* changes have been merged), for this reason you will see #ifdef __FreeBSD_version in a couple of places. This should minimize errors when (hopefully soon) it will be time to do the MFC.	2002-06-27 23:02:18 +00:00
Maxime Henrion	7627c6cbcc	Warning fixes for 64-bit platforms. With this last fix, I can build a GENERIC sparc64 kernel with -Werror. Reviewed by: luigi	2002-06-27 11:02:06 +00:00
Luigi Rizzo	713a6ea063	Just a comment on some additional consistency checks that could be added here.	2002-06-26 21:00:53 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. 
(uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. 
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c: Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Jeffrey Hsu	6fd22caf91	Avoid unlocking the inp twice if badport_bandlim() returns -1. Reported by: jlemon	2002-06-24 22:25:00 +00:00
Jeffrey Hsu	f14e4cfe33	Style bug: fix 4 space indentations that should have been tabs. Submitted by: jlemon	2002-06-24 16:47:02 +00:00
Luigi Rizzo	f10e85d797	Slightly restructure the #ifdef INET6 sections to make the code more readable. Remove the six "register" attributes from variables in tcp_output(), the compiler surely knows well how to allocate them.	2002-06-23 21:25:36 +00:00
Luigi Rizzo	410bb1bfe2	Move two global variables to automatic variables within the only function where they are used (they are used with TCPDEBUG only).	2002-06-23 21:22:56 +00:00
Luigi Rizzo	4d2e36928d	Move some global variables in more appropriate places. Add XXX comments to mark places which need to be taken care of if we want to remove this part of the kernel from Giant. Add a comment on a potential performance problem with ip_forward()	2002-06-23 20:48:26 +00:00
Luigi Rizzo	51aed12e52	fix bad indentation and whitespace resulting from cut&paste	2002-06-23 09:15:43 +00:00
Luigi Rizzo	dfd1ae2f86	fix indentation of a comment	2002-06-23 09:14:24 +00:00
Luigi Rizzo	a5924d6100	fix a typo in a comment	2002-06-23 09:13:46 +00:00
Luigi Rizzo	ec3057db9e	Remove ip_fw_fwd_addr (forgotten in previous commit) remove some extra whitespace.	2002-06-23 09:03:42 +00:00
Luigi Rizzo	2b25acc158	Remove (almost all) global variables that were used to hold packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now. The variables removed by this change are: ip_divert_cookie, used by divert sockets; ip_fw_fwd_addr, used for transparent ip redirection; last_pkt, used by dynamic pipes in dummynet. Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output(). On passing, remove a bug in divert handling of fragmented packets. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide. Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). On passing, use the same mechanism for dummynet annotations and for divert/forward annotations. option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code. NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed. * I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor style cleanup will likely be necessary. * this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack. * despite the number of files touched, there are absolutely no APIs or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code). MFC after: 10 days	2002-06-22 11:51:02 +00:00
Jeffrey Hsu	2ded288c88	Fix logic which resulted in missing a call to INP_UNLOCK(). Submitted by: jlemon, mux	2002-06-21 22:54:16 +00:00
Jeffrey Hsu	2d40081d1f	TCP notify functions can change the pcb list.	2002-06-21 22:52:48 +00:00
Peter Wemm	532cf61bcf	Solve the 'unregistered netisr 18' information notice with a sledgehammer. Register the ISR early, but do not actually kick off the timer until we see some activity. This still saves us from running the arp timers on a system with no network cards.	2002-06-20 01:27:40 +00:00
Seigo Tanimura	03e4918190	Remove so*_locked(), which were backed out by mistake.	2002-06-18 07:42:02 +00:00
Jeffrey Hsu	3ce144ea88	Notify functions can destroy the pcb, so they have to return an indication of whether this happened so the calling function knows whether or not to unlock the pcb. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)	2002-06-14 08:35:21 +00:00
Mike Silbersack	eb5afeba22	Re-commit w/fix: Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks This time, make sure that ipv4 specific code (aka all of the above) is only run in the ipv4 case.	2002-06-14 03:08:05 +00:00
Mike Silbersack	70d2b17029	Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :) Pointy-hat to: silby	2002-06-14 02:43:20 +00:00
Mike Silbersack	21c3b2fc69	Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks	2002-06-14 02:36:34 +00:00
Jeffrey Hsu	9c68f33a9d	Because we're holding an exclusive write lock on the head, references to the new inp cannot leak out even though it has been placed on the head list.	2002-06-13 23:14:58 +00:00
Jeffrey Hsu	61ffc0b1a6	The UDP head was unlocked too early in one unicast case. Submitted by: bug reported by arr	2002-06-12 15:21:41 +00:00
Jeffrey Hsu	73dca2078d	Fix logic which resulted in missing a call to INP_UNLOCK().	2002-06-12 03:11:06 +00:00
Jeffrey Hsu	3cfcc388ea	Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK. Submitted by: tegge, jlemon Prefer LIST_FOREACH macro. Submitted by: jlemon	2002-06-12 03:08:08 +00:00
Jeffrey Hsu	7a9378e7f5	Remember to initialize the control block head mutex.	2002-06-11 10:58:57 +00:00
Jeffrey Hsu	3d9baf34c0	Fix typo. Submitted by: Kyunghwan Kim <redjade@atropos.snu.ac.kr>	2002-06-11 10:56:49 +00:00
Jeffrey Hsu	e98d6424af	Every array elt is initialized in the following loop, so remove unnecessary M_ZERO.	2002-06-10 23:48:37 +00:00

1499 Commits