freebsd-nq

Author	SHA1	Message	Date
Qing Li	c7ab66020f	The proxy arp entries could not be added into the system over the IFF_POINTOPOINT link types. The reason was that the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations because multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of loopback route to local end would be either incremented or decremented. The first instantiation would create the entry and the last removal would delete the route entry. MFC after: 5 days	2009-12-30 21:35:34 +00:00
Shteryana Shopova	7c90b0258f	Make sure the multicast forwarding cache entry's stall queue is properly initialized before trying to insert an entry into it. PR: kern/142052 Reviewed by: bms MFC after: now	2009-12-30 08:52:13 +00:00
Luigi Rizzo	bcd3b68dd2	we really need htonl() here, see the comment a few lines above in the code.	2009-12-29 00:02:57 +00:00
Antoine Brodin	13e403fdea	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
Bjoern A. Zeeb	fc74d005d9	Make the compiler happy after r201125: - + remove two unnecessary initializations in ip_output; + + remove one unnecessary initializations in ip_output;	2009-12-28 21:14:18 +00:00
Luigi Rizzo	ec396e61ed	introduce a local variable rte acting as a cache of ro->ro_rt within ip_output, achieving (in random order of importance): - a reduction of the number of 'r's in the source code; - improved legibility; - a reduction of 64 bytes in the .text	2009-12-28 14:48:32 +00:00
Luigi Rizzo	ca8b83b0fa	+ remove an unused #define print_ip; + remove two unnecessary initializations in ip_output; + localize 'len'; + introduce a temporary variable n to count the number of fragments, the compiler seems unable to identify a common subexpression (written 3 times, used twice); + document some assumptions on ip_len and ip_hl	2009-12-28 14:09:46 +00:00
Luigi Rizzo	e59084e086	bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects to find it there. Unfortunately this reintroduces the dependency on ip_fw_pfil.c	2009-12-28 12:29:13 +00:00
Luigi Rizzo	830c6e2b97	bring in several cleanups tested in ipfw3-head branch, namely: r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files. - move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h - document the structure of the packet tags used for dummynet and netgraph; r201049 - merge some common code to attach/detach hooks into a single function. r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need. r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option) No ABI changes in this commit. MFC after: 1 week	2009-12-28 10:47:04 +00:00
Luigi Rizzo	6cc7b9f5d9	readability fixes -- add braces on large blocks, remove unnecessary initializations	2009-12-28 10:19:53 +00:00
Luigi Rizzo	6730dcaec7	explain details of operation of table lookups, and improve portability	2009-12-28 10:12:35 +00:00
Luigi Rizzo	2082ecd966	diverted packet must re-enter _after_ the matching rule, or we create loops. The divert cookie (that can be set from userland too) contains the matching rule nr, so we must start from nr+1. Reported by: Joe Marcus Clarke	2009-12-27 10:19:10 +00:00
Luigi Rizzo	4a3c1bd27f	fix poor indentation resulting from a merge	2009-12-24 17:35:28 +00:00
Luigi Rizzo	84918f5bc8	mostly style changes, such as removal of trailing whitespace, reformatting to avoid unnecessary line breaks, small block restructuring to avoid unnecessary nesting, replace macros with function calls, etc. As a side effect of code restructuring, this commit fixes one bug: previously, if a realloc() failed, memory was leaked. Now, the realloc is not there anymore, as we first count how much memory we need and then do a single malloc.	2009-12-23 18:53:11 +00:00
Luigi Rizzo	3ae19c3ba3	fix build with the new fast lookup structure. Also remove some unnecessary headers	2009-12-23 12:15:21 +00:00
Luigi Rizzo	6aab896346	fix build on 64-bit architectures. Also fix the indentation on a few lines.	2009-12-23 12:00:50 +00:00
Luigi Rizzo	de240d1013	merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw. In detail: 1. introduce an IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK. 2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N). 3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1). 4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id>. All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t. Operation costs are now as follows (old -> now, with the planned cost where it differs): skipto X, non cached: O(N) -> O(log N); skipto X, cached: O(1) -> O(1); dynamic rule lookup (XXX): O(1) -> O(log N), planned O(1); skipto tablearg: O(N) -> O(1); reinject, non cached: O(N) -> O(log N); reinject, cached: O(1) -> O(1); kernel blocked during setsockopt(): O(N) -> O(1). The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI. Supported by: Valeria Paoli MFC after: 1 month	2009-12-22 19:01:47 +00:00
John Baldwin	43d9473499	- Rename the __tcpi_(snd\|rcv)_mss fields of the tcp_info structure to remove the leading underscores since they are now implemented. - Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info structure. Reviewed by: rwatson MFC after: 2 weeks	2009-12-22 15:47:40 +00:00
Luigi Rizzo	46fdc2bf60	some mostly cosmetic changes in preparation for upcoming work: + in many places, replace &V_layer3_chain with a local variable chain; + bring the counter of rules and static_len within ip_fw_chain replacing static variables; + remove some spurious comments and extern declaration; + document which lock protects certain data structures	2009-12-22 13:53:34 +00:00
Ruslan Ermilov	bec5f27f73	Added proper attribution. Requested by: luigi	2009-12-18 17:22:21 +00:00
Luigi Rizzo	1328a38b96	Add some experimental code to log traffic with tcpdump, similar to pflog(4). To use the feature, just put the 'log' options on rules you are interested in, e.g. ipfw add 5000 count log .... and run tcpdump -ni ipfw0 ... net.inet.ip.fw.verbose=0 enables logging to ipfw0, net.inet.ip.fw.verbose=1 sends logging to syslog as before. More features can be added, similar to pflog(), to store in the MAC header metadata such as rule numbers and actions. Manpage to come once features are settled.	2009-12-17 23:11:16 +00:00
Luigi Rizzo	60ab046a41	simplify and document lookup_next_rule()	2009-12-17 17:27:12 +00:00
Luigi Rizzo	59cd9f65f9	simplify the code that finds the next rule after reinjections MFC after: 1 week	2009-12-17 12:27:54 +00:00
Luigi Rizzo	53638988bc	remove a duplicate sysctl entry	2009-12-16 18:03:35 +00:00
Luigi Rizzo	1b5691c61e	bring back a couple of #include that are supplied by nesting, and explain why they are used.	2009-12-16 13:00:37 +00:00
Luigi Rizzo	97219abf05	Various cosmetic cleanup of the files: - move global variables around to reduce the scope and make them static if possible; - add an ipfw_ prefix to all public functions to prevent conflicts (the same should be done for variables); - try to pack variable declaration in a uniform way across files; - clarify some comments; - remove some misspelling of names (#define V_foo VNET(bar)) that slipped in due to cut&paste - remove duplicate static variables in different files; MFC after: 1 month	2009-12-16 10:48:40 +00:00
Warner Losh	26bbc1fc5a	Quick fix to make this compile: Remove redundant extern declarations. If the maintainer has a better fix, then feel free to back this out.	2009-12-16 03:26:37 +00:00
Luigi Rizzo	22f123afad	more splitting of ip_fw2.c, now extract the 'table' routines and the sockopt routines (the upper half of the kernel). Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?) please change the attribution in ip_fw_table.c. I have copied the copyright line from ip_fw2.c but it carries my name and I have neither written nor designed the feature so I don't deserve the credit. MFC after: 1 month	2009-12-15 21:24:12 +00:00
Luigi Rizzo	70228fb346	Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part. Files touched by this commit: conf/files: now references the two new source files; netinet/ip_fw.h: remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h; netinet/ipfw/ip_fw_private.h: new file with kernel-specific ipfw definitions; netinet/ipfw/ip_fw_log.c: ipfw_log and related functions; netinet/ipfw/ip_fw_dynamic.c: code related to dynamic rules; netinet/ipfw/ip_fw2.c: removed the pieces that go in the new files; netinet/ipfw/ip_fw_nat.c: minor rearrangement to remove LOOKUP_NAT from the main headers. This requires a new function pointer. A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them. MFC after: 1 month	2009-12-15 16:15:14 +00:00
Luigi Rizzo	472099c4b0	implement a new match option, lookup {dst-ip\|src-ip\|dst-port\|src-port\|uid\|jail} N which searches the specified field in table N and sets tablearg accordingly. With dst-ip or src-ip the option replicates two existing options. When used with other arguments, the option can be useful to quickly dispatch traffic based on other fields. Work supported by the Onelab project. MFC after: 1 week	2009-12-15 09:46:27 +00:00
Bjoern A. Zeeb	de0bd6f76b	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
Luigi Rizzo	b2089673e5	use div64 when converting back the burst value for userland	2009-12-10 18:37:14 +00:00
Luigi Rizzo	89717f91ef	when draining a flowset free the entire chain, not just one packet.	2009-12-10 18:34:07 +00:00
Luigi Rizzo	478cae8a97	centralize the code to free a packet (or a chain) while in dummynet. Remove an old macro and its stale comment.	2009-12-10 15:17:34 +00:00
Oleg Bulyzhin	22746035ec	Fix burst processing for WF2Q pipes - do not increase available burst size unless pipe is idle. This should fix following issues: - 'dummynet: OUCH! pipe should have been idle!' log messages. - exceeding configured pipe bandwidth. MFC after: 1 week	2009-12-05 23:27:21 +00:00
Luigi Rizzo	f573a0a634	adjust comment in previous commit after Julian's explanation	2009-12-05 11:51:32 +00:00
Luigi Rizzo	bc0d5982e2	remove a dead block of code, document how the ipfw clients are hooked and the difference in handling the 'enable' variable for layer2 and layer3. The latter needs fixing once i figure out how it worked pre-vnet. MFC after: 7 days	2009-12-05 09:13:06 +00:00
Luigi Rizzo	e99816f1eb	fix build with VNET enabled Reported by: David Wolfskill	2009-12-05 08:32:12 +00:00
Hajimu UMEMOTO	2ea64e8ef9	Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard coded number. Spotted by: bz	2009-12-04 15:39:37 +00:00
Luigi Rizzo	4f60c0b97d	preparation work to replace the monster switch in ipfw_chk() with table of functions. This commit (which is heavily based on work done by Marta Carbone in this year's GSOC project), removes the goto's and explicit return from the inner switch(), so we will have an easier time when putting the blocks into individual functions. MFC after: 3 weeks	2009-12-03 14:22:15 +00:00
Hajimu UMEMOTO	a22e82b87b	Teach an IPv6 to the debug prints.	2009-12-03 11:16:53 +00:00
Luigi Rizzo	3c95089ef4	- initialize src_ip in the main loop to prevent a compiler warning (gcc 4.x under linux, not sure how real is the complaint). - rename a macro argument to prevent name clashes. - add the macro name on a couple of #endif - add a blank line for readability. MFC after: 3 days	2009-12-02 17:50:52 +00:00
Luigi Rizzo	3429911d4d	Dispatch sockopt calls to ipfw and dummynet using the new option numbers, IP_FW3 and IP_DUMMYNET3. Right now the modules return an error if called with those arguments so there is no danger of unwanted behaviour. MFC after: 3 days	2009-12-02 15:50:43 +00:00
Luigi Rizzo	0a13f6b1b3	small changes for portability and diff reduction wrt/ FreeBSD 7. No functional differences. - use the div64() macro to wrap 64 bit divisions (which almost always are 64 / 32 bits) so they are easier to handle with compilers or OS that do not have native support for 64bit divisions; - use a local variable for p_numbytes even if not strictly necessary on HEAD, as it reduces diffs with FreeBSD7 - in dummynet_send() check that a tag is present before dereferencing the pointer. - add a couple of blank lines for readability near the end of a function MFC after: 3 days	2009-12-02 15:20:31 +00:00
Hajimu UMEMOTO	fd63c04193	Teach an IPv6 to send_pkt() and ipfw_tick(). It fixes the issue which keep-alive doesn't work for an IPv6. PR: kern/117234 Submitted by: mlaier, Joost Bekkers <joost__at__jodocus.org> MFC after: 1 month	2009-12-02 14:32:01 +00:00
Gleb Smirnoff	e81ab87652	Until this moment carp(4) used a strange logging priority. It used debug priority for such important information as MASTER/BACKUP state change, and used a normal logging priority for such innocent messages as receiving short packet (which is a normal VRRP packet between some other routers) or receiving a CARP packet on non-carp interface (someone else running CARP). This commit shifts message logging priorities to a more sane default.	2009-12-02 13:24:21 +00:00
Luigi Rizzo	de9fc6bcd4	Add new sockopt names for ipfw and dummynet. This commit is just grabbing entries for the new names that will be used in the future, so you don't need to rebuild anything now. MFC after: 3 days	2009-12-02 10:36:41 +00:00
Luigi Rizzo	9565806f16	change the type of the opcode from enum *:8 to u_int8_t so the size and alignment of the ipfw_insn is not compiler dependent. No changes in the code generated by gcc. There was only one instance of this kind in our entire source tree, so i suspect the old definition was a poor choice (which i made). MFC after: 3 days	2009-12-02 08:52:06 +00:00
Michael Tuexen	dec7fa27c6	Use the default stack size for the iterator thread. This fixes a crash reported by Irene Ruengeler. Approved by: rrs (mentor) MFC after: 1 month	2009-11-27 17:25:19 +00:00
Bruce M Simpson	a8cf681de2	Correct a comment. MFC after: 1 day	2009-11-19 13:21:37 +00:00
Michael Tuexen	7e6206af12	Fix a bug where the system panics when a SHUTDOWN is received with an illegal TSN. Approved by: rrs (mentor) MFC after: ASAP	2009-11-18 12:17:06 +00:00
Michael Tuexen	0e891bcdc1	Get rid of unused fields addr_over which is never really used, only copied around. Approved by: rrs (mentor)	2009-11-17 23:03:38 +00:00
Michael Tuexen	83fc1165c5	Use always LIST_EMPTY instead of sometime SCTP_LIST_EMPTY, which is defined as LIST_EMPTY. Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 20:56:14 +00:00
Michael Tuexen	2ab6846a23	Fix a bug where queued ASCONF messages are not sent out. Approved by: rrs (mentor) Obtained from: Irene Ruengeler MFC after: 1 month	2009-11-17 13:36:21 +00:00
Michael Tuexen	b6c5780299	Fix a memory leak when destroying an SCTP stack. Clean up sctp_pcb_finish(). Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 13:13:58 +00:00
Michael Tuexen	87b4fcd323	Do not start the iterator when there are no associations. This fixes a bug found by Irene Ruengeler. Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 13:11:23 +00:00
Michael Tuexen	1e01164145	Disable (temporarily) the thread based iterator. It does not work with vnet. Approved by: rrs (mentor)	2009-11-17 13:09:50 +00:00
Michael Tuexen	cf458c646d	Allow the UMA to free data. This resolves the UMA related bug reported by Julian. Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 13:08:15 +00:00
Michael Tuexen	7a9b5b2040	Do not hold the lock longer than necessary. Approved by: rrs (mentor) MFC after: 1 month	2009-11-17 13:05:51 +00:00
Bruce M Simpson	793c70425a	Fix a functional regression in multicast. Userland daemons need to see IGMP traffic regardless of the group; omit the imo filter check if the proto is IGMP. The kernel part of IGMP will have already filtered appropriately at this point. MFC after: ASAP Submitted by: Franz Struwig Reported by: Ivor Prebeg, Franz Struwig	2009-11-15 11:07:22 +00:00
Attilio Rao	758801232c	Move inet_aton() (specular to inet_ntoa(), already present in libkern) into libkern in order to make it usable by other modules than alias_proxy. Obtained from: Sandvine Incorporated Sponsored by: Sandvine Incorporated MFC: 1 week	2009-11-12 00:46:28 +00:00
Edward Tomasz Napierala	4f7418a09f	Remove ifdefed out part of code, which seems to have originated a decade ago in OpenBSD. As it is now, there is no way for this to be useful, since IPsec is free to forward packets via whatever interface it wants, so checking capabilities of the interface passed from ip_output (fetched from the routing table) serves no purpose. Discussed with: sam@	2009-11-09 19:53:34 +00:00
Oleg Bulyzhin	57edc1bbf3	style(9): add missing parentheses	2009-11-09 09:12:45 +00:00
John Baldwin	c6d9480519	Several years ago a feature was added to TCP that caused soreceive() to send an ACK right away if data was drained from a TCP socket that had previously advertised a zero-sized window. The current code requires the receive window to be exactly zero for this to kick in. If window scaling is enabled and the window is smaller than the scale, then the effective window that is advertised is zero. However, in that case the zero-sized window handling is not enabled because the window is not exactly zero. The fix changes the code to check the raw window value against zero. Reviewed by: bz MFC after: 1 week	2009-11-06 16:55:05 +00:00
Oleg Bulyzhin	5661377e37	Fix two issues that can lead to exceeding configured pipe bandwidth: - do not expire queues which are not ready to be expired. - properly calculate available burst size. MFC after: 3 days	2009-11-03 08:41:14 +00:00
Michael Tuexen	08abf6399a	Improve round robin stream scheduler and cleanup some code. Approved by: rrs (mentor) MFC after: 3 days	2009-10-29 17:40:33 +00:00
Christian Brueffer	621882f0bc	Close a stream file descriptor leak. PR: 138130 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week	2009-10-28 12:10:29 +00:00
Michael Tuexen	d18f7e0a98	Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke. Approved by: rrs (mentor) MFC after: 3 days	2009-10-27 18:17:07 +00:00
Michael Tuexen	ac9bce0f3b	Improve the round robin stream scheduler. Approved by: rrs (mentor) MFC after: 3 days	2009-10-26 19:23:34 +00:00
Robert Watson	99b96cf934	Correct spelling typo in ip_input comment. Pointed out by: N.J. Mann <njm at njm.me.uk>, John Nielsen <john at jnielsen.net>, julian (!), lstewart MFC after: 2 days	2009-10-24 09:18:26 +00:00
Qing Li	6cb2b4e7a8	Use the correct option name in the preprocessor command to enable or disable diagnostic messages. Reviewed by: ru MFC after: 3 days	2009-10-23 18:27:34 +00:00
Robert Watson	0d3d0d74ea	Improve grammar in ip_input comment while attempting to maintain what might be its meaning. MFC after: 3 days	2009-10-23 13:35:00 +00:00
Qing Li	fc02323563	In the ARP callout timer expiration function, the current time_second is compared against the entry expiration time value (that was set based on time_second) to check if the current time is larger than the set expiration time. Due to the +/- timer granularity value, the comparison returns false, causing the alternative code to be executed. The alternative code path freed the memory without removing that entry from the table list, causing a use-after-free bug. Reviewed by: discussed with kmacy MFC after: immediately Verified by: rnoland, yongari	2009-10-20 17:55:42 +00:00
Robert Watson	6426657e9f	Rewrap ip_input() comment so that it prints more nicely. MFC after: 3 days	2009-10-18 11:23:56 +00:00
Qing Li	93704ac5d7	This patch fixes the following issues in the ARP operation: 1. There is a regression issue in the ARP code. The incomplete ARP entry was timing out too quickly (1 second timeout), as such, a new entry is created each time arpresolve() is called. Therefore the maximum attempts made is always 1. Consequently the error code returned to the application is always 0. 2. Set the expiration of each incomplete entry to a 20-second lifetime. 3. Return "incomplete" entries to the application. Reviewed by: kmacy MFC after: 3 days	2009-10-15 06:12:04 +00:00
Bjoern A. Zeeb	852da713c3	Compare pointer to NULL rather than 0. MFC after: 1 month	2009-10-13 20:29:14 +00:00
Michael Tuexen	f71e78a1d9	Fix a race condition where a mutex was destroyed while sleeping on it. Found while analyzing a report from julian. It might fix his bug. Approved by: rrs (mentor) MFC after: 3 days	2009-10-11 12:23:56 +00:00
Julian Elischer	0b4b0b0fee	Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting. Sitting around in tree waiting to commit for: 2 months MFC after: 2 months	2009-10-11 05:59:43 +00:00
Michael Tuexen	45623593fb	Correct include order as indicated by bz. Approved by: re (mentor) MFC after: 3 days	2009-10-10 13:59:18 +00:00
Michael Tuexen	3b1de911e0	Do not include vnet.h twice. Approved by: rrs (mentor) MFC after: 3 days	2009-10-09 19:30:23 +00:00
Michael Tuexen	9dd512290c	Use correct arguments when calling SCTP_RTALLOC(). Approved by: rrs (mentor) MFC after: 0 days	2009-10-08 20:33:12 +00:00
Randall Stewart	806a5b8414	Fix so that round robin stream scheduling works as advertised MFC after: 0 days	2009-10-08 11:36:06 +00:00
Robert Watson	f681a5fdd4	Remove tcp_input lock statistics; these are intended for debugging only and are not intended to ship in 8.0 as they dirty additional cache lines in a performance-critical per-packet path. MFC after: 3 days	2009-10-06 20:35:41 +00:00
Robert Watson	883e9bc41d	In tcp_input(), we acquire a global write lock at first only if a segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN). If we later have to upgrade the lock, we acquire an inpcb reference and drop both global/inpcb locks before reacquiring in-order. In that gap, the connection may transition into TIMEWAIT, so we need to loop back and reevaluate the inpcb after relocking. MFC after: 3 days Reported by: Kamigishi Rei <spambox at haruhiism.net> Reviewed by: bz	2009-10-05 22:24:13 +00:00
Qing Li	b4a22c365c	Remove a log message from production code. This log message can be triggered by a misconfigured host that is sending out gratuitous ARPs. This log message can also be triggered during a network renumbering event when multiple prefixes co-exist on a single network segment. MFC after: immediately	2009-10-02 01:45:11 +00:00
Qing Li	fa3cfd39ff	Previously, if an address alias is configured on an interface, and this address alias has a prefix matching that of another address configured on the same interface, then the ARP entry for the alias is not deleted from the ARP table when that address alias is removed. This patch fixes the aforementioned issue. PR: kern/139113 MFC after: 3 days	2009-10-02 01:34:55 +00:00
Michael Tuexen	4b6492f5ab	Fix handling of sctp_drain(). Approved by: rrs (mentor) MFC after: 2 month	2009-09-20 11:33:39 +00:00
Michael Tuexen	2c19e7fa86	Fix errnos. Approved by: rrs(mentor) MFC after: 3 days.	2009-09-20 11:32:22 +00:00
Michael Tuexen	4af6c75c39	Use appropriate locking when using interface list. Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:55:12 +00:00
Michael Tuexen	30c3a8430c	Fix the disabling of sctp_drain(). Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:18:42 +00:00
Michael Tuexen	8518270e20	Get SCTP working in combination with VIMAGE. Contains code from bz. Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:02:16 +00:00
Bruce M Simpson	99bf30cf01	Return ENOBUFS consistently if user attempts to exceed in_mcast_maxsocksrc resource limit. Submitted by: syrinx MFC after: 3 days	2009-09-18 15:12:31 +00:00
Randall Stewart	482444b4a5	Support for VNET in SCTP (hopefully)	2009-09-17 15:11:12 +00:00
Michael Tuexen	d830c305ea	Fix a bug reported by Daniel Mentz: When authenticating DATA chunks some DATA chunks might get stuck when the MTU gets decreased via an ICMP message. Approved by: rrs (mentor) MFC after: immediately	2009-09-16 14:23:31 +00:00
Mike Silbersack	b8614722ff	Add the ability to see TCP timers via netstat -x. This can be a useful feature when you have a seemingly stuck socket and want to figure out why it has not been closed yet. No plans to MFC this, as it changes the netstat sysctl ABI. Reviewed by: andre, rwatson, Eric Van Gyzen	2009-09-16 05:33:15 +00:00
Andre Oppermann	11c99a6d7b	-Put the optimized soreceive_stream() under a compile time option called TCP_SORECEIVE_STREAM for the time being. Requested by: brooks Once compiled in make it easily switchable for testers by using a tuneable net.inet.tcp.soreceive_stream and a corresponding read-only sysctl to report the current state. Suggested by: rwatson MFC after: 2 days	2009-09-15 22:23:45 +00:00
Qing Li	9bb7d0f47a	Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly. Reviewed by: bz MFC after: immediately	2009-09-15 19:18:34 +00:00
Qing Li	cd29a7797d	This patch enables the node to respond to ARP requests for configured proxy ARP entries. Reviewed by: bz MFC after: immediately	2009-09-15 18:39:27 +00:00
Qing Li	96ed1732bb	The bootp code installs an interface address and the nfs client module tries to install the same address again. This extra code is removed, which was discovered by the removal of a call to in_ifscrub() in r196714. This call to in_ifscrub is put back here because the SIOCAIFADDR command can be used to change the prefix length of an existing alias. Reviewed by: kmacy	2009-09-15 01:01:03 +00:00
Qing Li	f0bb05fca5	Previously local end of point-to-point interface is not reachable within the system that owns the interface. Packets destined to the local end point leak to the wire towards the default gateway if one exists. This behavior is changed as part of the L2/L3 rewrite efforts. The local end point is now reachable within the system. The inpcb code needs to consider this fact during the address selection process. Reviewed by: bz MFC after: immediately	2009-09-14 22:19:47 +00:00
Randall Stewart	f3d06a3c68	Fixes two bugs: 1) A lock issue, if we ever had to try again we would double lock the INP lock. 2) We were allowing (at wrap) associd 0... which really we cannot allow since 0 normally means in most socket API calls that we are wishing to effect something on the INP not TCB. MFC after: 1 week	2009-09-13 17:45:31 +00:00
Bruce M Simpson	6cbbe26f98	In expire_mfc(), add an assert on the multicast forwarding cache mutex. PR: 138666	2009-09-13 01:00:24 +00:00
Bruce M Simpson	fa2eebfce6	Comment some flawed assumptions in inp_join_group() about mixing SSM full-state and delta-based APIs. ENOTIME to fix right now. No functional changes. MFC after: 5 days	2009-09-12 20:37:44 +00:00
Bruce M Simpson	0eebc0d7b4	Don't allow joins w/o source on an existing group. This is almost always pilot error. We don't need to check for group filter UNDEFINED state at t1, because we only ever allocate filters with their groups, so we unconditionally reject such calls with EINVAL. Trying to change the active filter mode w/o going through IP_MSFILTER is also disallowed. Deals with the case described in PR 137164 upfront, cumulative with the fix in svn rev 197132 which only calls imo_match_source() if the source address family was not unspecified. PR: 137164 MFC after: 5 days	2009-09-12 20:18:23 +00:00
Bruce M Simpson	1fc39d5424	Tighten input checking in inp_join_group(): * Don't try to use the source address, when its family is unspecified. * If we get a join without a source, on an existing inclusive mode group, this is an error, as it would change the filter mode. Fix a problem with the handling of in_mfilter for new memberships: * Do not rely on imf being NULL; it is explicitly initialized to a non-NULL pointer when constructing a membership. * Explicitly initialize *imf to EX mode when the source address is unspecified. This fixes a problem with in_mfilter slot recycling in the join path. PR: 138690 Submitted by: Stef Walter MFC after: 5 days	2009-09-12 19:45:55 +00:00
Bruce M Simpson	cc5776b24d	Fix an obvious logic error in the IPv4 multicast leave processing, where the filter mode vector was not updated correctly after the leave. PR: 138691 Submitted by: Stef Walter MFC after: 5 days	2009-09-12 19:07:03 +00:00
Bruce M Simpson	67e89408e5	Fix an API issue in leave processing for IPv4 multicast groups. * Do not assume that the group lookup performed by imo_match_group() is valid when ifp is NULL in this case. * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the membership we are being asked to leave. Caveat user: * The way IPv4 multicast memberships are implemented in the inpcb layer at the moment has the side-effect that struct ip_moptions will still hold the membership, under the old ifp, until ip_freemoptions() is called for the parent inpcb. * The underlying issue is: the inpcb layer does not get notification of the ifp being detached or going away in a thread-safe manner. This is non-trivial to fix. But hey, at least the kernel shouldn't panic when you unplug a card. PR: 138689 Submitted by: Stef Walter MFC after: 5 days	2009-09-12 18:55:15 +00:00
Navdeep Parhar	9a31144537	Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite. The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed. Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month	2009-09-08 21:17:17 +00:00
Poul-Henning Kamp	2ac047d1fe	Move the duplicate definition of struct sockaddr_storage to its own include file, and include this where the previous duplicate definitions were. Static program checkers like FlexeLint rightfully take a dim view of duplicate definitions, even if they currently are identical.	2009-09-08 10:39:38 +00:00
Shteryana Shopova	e72ae6eafd	When joining a multicast group, the inp_lookup_mcast_ifp call does a KASSERT that the group address is multicast, so the check that this is indeed true, returning EINVAL if not, should be done before calling inp_lookup_mcast_ifp. This fixes a kernel crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...) with an invalid group address. Reviewed by: bms Approved by: bms MFC after: 3 days	2009-09-07 16:00:33 +00:00
Pawel Jakub Dawidek	360488410f	Correct comment.	2009-09-06 07:29:22 +00:00
George V. Neville-Neil	54fc657d59	Add ARP statistics to the kernel and netstat. New counters now exist for: requests sent, replies sent, requests received, replies received, packets received, total packets dropped due to no ARP entry, entries timed out, duplicate IPs seen. The new statistics are seen in the netstat command when it is given the -s command line switch. MFC after: 2 weeks In collaboration with: bz	2009-09-03 21:10:57 +00:00
Bjoern A. Zeeb	cc7e9d4325	In case an upper layer protocol tries to send a packet but the L2 code does not have the ethernet address for the destination within the broadcast domain in the table, we remember the original mbuf in `la_hold' in arpresolve() and send out a different packet with an arp request. In case there will be more upper layer packets to send we will free an earlier one held in `la_hold' and queue the new one. Once we get a packet in, with which we can perfect our arp table entry we send out the original 'on hold' packet, should there be any. Rather than continuing to process the packet that we received, we returned without freeing the packet that came in, which basically means that we leaked an mbuf for every arp request we sent. Rather than freeing the received packet and returning, continue to process the incoming arp packet as well. This should (a) improve some setups, also proxy-arp, in case it was an incoming arp request and (b) resembles the behaviour FreeBSD had from day 1, which aligns with RFC826 "Packet reception" (merge case). Rename 'm0' to 'hold' to make the code more understandable as well as diffable to earlier versions more easily. Handle the link-layer entry 'la' lock completely in the block where needed and release it as early as possible, rather than holding it longer, down to the end of the function. Found by: pointyhat, ns1 Bug hunting session with: erwin, simon, rwatson Tested by: simon on cluster machines Reviewed by: ratson, kmacy, julian MFC after: 3 days	2009-09-01 17:53:01 +00:00
Qing Li	1bf38b1292	This patch fixes the following issues: - Routing messages are not generated when adding and removing interface address aliases. - Loopback route installed for an interface address alias is not deleted from the routing table when that address alias is removed from the associated interface. - Function in_ifscrub() is called extraneously. Reviewed by: gnn, kmacy, sam MFC after: 3 days	2009-08-31 21:02:48 +00:00
Michael Tuexen	2b77dd0181	Fix a bug where vlan interfaces are not supported by SCTP. Approved by: rrs (mentor) MFC after: 3 days	2009-08-28 08:41:59 +00:00
Qing Li	0437a93339	Do not try to free the rt_lle entry of the cached route in ip_output() if the cached route was not initialized from the flow-table. The rt_lle entry is invalid unless it has been initialized through the flow-table. Reviewed by: kmacy, rwatson MFC after: immediately	2009-08-28 05:37:31 +00:00
Robert Watson	dc56e98f0d	Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held. Reviewed by: bz, kmacy MFC after: 3 days	2009-08-25 09:52:38 +00:00
Michael Tuexen	24ae5c4a73	This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT was not honored, if the socket buffer size was not 4 times that large. Approved by: rrs (mentor) MFC after: 3 days.	2009-08-24 11:46:40 +00:00
Randall Stewart	0fa753b3fb	This fixes two bugs in the NR-Sack code: 1) When calculating the table offset for sliding the sack array, the two byte values must be "ored" together in order for us to do the correct sliding of the arrays. 2) We were NOT properly doing CC and other changes to things only NR-Sacked. The solution here is to make a separate function that will actually do both CC/updates and free things if its NR sack'd. This actually shrinks out common code from three places (much better). MFC after: 3 days	2009-08-24 11:13:32 +00:00
Marko Zec	2b73aacaf9	Introduce a div_destroy() function which takes over per-vnet cleanup tasks from the existing modevent / MOD_UNLOAD handler, and register div_destroy() in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In nooptions VIMAGE builds, div_destroy() will be invoked from the modevent handler, resulting in effectively identical operation as it was prior to this change. div_destroy() also tears down hashtables used by ipdivert, which were previously left behind on ipdivert kldunloads. For options VIMAGE builds only, temporarily disable kldunloading of ipdivert, because without introducing additional locking logic it is impossible to atomically check whether all ipdivert instances in all vnets are idle, and proceed with cleanup without opening a race window for a vnet to open an ipdivert socket while ipdivert tear-down is in progress. While here, staticize div_init(), because it is not used outside of ip_divert.c. In cooperation with: julian Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:06:02 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Julian Elischer	d3cef1d91e	Fix another typo right next to the previous one, that amazingly, I did not see before. MFC after: 1 week	2009-08-23 08:49:32 +00:00
Julian Elischer	8f26c03fe6	Fix typo in comment that has been bugging me for days. MFC after: 1 week	2009-08-23 07:59:28 +00:00
Julian Elischer	c4b21cbe4a	Fix ipfw's initialization functions to get the correct order of evaluation to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c to ip_fw2.c and move to mostly using the SYSINIT and VNET_SYSINIT handlers instead of the modevent handler. Correct some spelling errors in comments in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when ipfw is unloaded. This patch is a minimal patch for 8.0; I have a much larger patch that actually fixes the underlying problems that will be applied after 8.0. Reviewed by: zec@, rwatson@, bz@(earlier version) Approved by: re (rwatson) MFC after: Immediately	2009-08-21 11:20:10 +00:00
Peter Wemm	b4e7e7a065	Fix signed comparison bug when ticks goes negative after 24 days of uptime. This causes the tcp time_wait state code to fail to expire sockets in timewait state. Approved by: re (kensmith)	2009-08-20 22:53:28 +00:00
Will Andrews	52e12426d1	Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when CARP tries to free them using M_IFADDR after the last address for a virtual host is removed and when detaching from the parent interface. Reviewed by: mlaier Approved by: re (kib), ken (mentor)	2009-08-20 02:33:12 +00:00
Michael Tuexen	2f99457b0c	Fix a bug in the handling of unreliable messages which results in stalled associations. Approved by: re, rrs (mentor) MFC after: immediately	2009-08-19 12:02:28 +00:00
Kip Macy	3ee42584f9	- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@	2009-08-18 20:28:58 +00:00
Michael Tuexen	627dfd6df9	Fix a crash when using one-to-one style socket in non-blocking mode and there is no listening server. PR: 137795 Approved by: re, rrs (mentor) MFC after: immediately.	2009-08-18 19:58:49 +00:00
Michael Tuexen	810ec53688	* Fix a bug where PR-SCTP settings are ignored when using implicit association setup. * Fix a bug where messages with illegal stream ids are not deleted. * Fix a crash when reporting back unsent messages from the send_queue. * Fix a bug related to INIT retransmission when the socket is already closed. * Fix a bug where associations were stalled when partial delivery API was enabled. * Fix a bug where the receive buffer size was smaller than the partial_delivery_point. Approved by: re, rrs (mentor) MFC after: One day.	2009-08-15 21:10:52 +00:00
Qing Li	3ef5e21d01	In function ip_output(), the cached route is flushed when there is a mismatch between the cached entry and the intended destination. The cached rtentry{} is flushed but the associated llentry{} is not. This causes the wrong destination MAC address being used in the output packets. The fix is to flush the llentry{} when rtentry{} is cleared. Reviewed by: kmacy, rwatson Approved by: re	2009-08-14 23:44:59 +00:00
Marko Zec	f92ae4d706	SCTP is not yet compatible with options VIMAGE kernels although it compiles with VIMAGE defined, so explicitly disallow building such kernels. Reviewed by: rrs Approved by: re (rwatson), julian (mentor)	2009-08-14 22:43:25 +00:00
Julian Elischer	72034f5548	Fix ipfw crash on uid or gid check. Receiving any ip packet for which there is no existing socket will crash if ipfw has a uid or gid test rule, as the uid/gid of the non existent owner of said non existent socket is tested. Brooks introduced this error as part of his >16 gids patch. It appears to be a cut-n-paste error from similar code a few lines before. The old code used the 'pcb' variable here, but in the new code that switched the 'inp' variable, which is often NULL and what is tested in the code further up. The rest of the multi-gid patch for ipfw seems solid (and cleaner than previous code). Reviewed by: brooks Approved by: re (rwatson)	2009-08-14 10:09:45 +00:00
Robert Watson	9d2eb78bcb	Add padding to struct inpcb, missed during our padding sweep earlier in the release cycle. Approved by: re (kensmith)	2009-08-02 22:47:08 +00:00
Robert Watson	315e3e38fa	Many network stack subsystems use a single global data structure to hold all pertinent statistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events. Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE. The following modules are affected by this change: if_bridge, if_cxgb, if_gif, ip_mroute, ipdivert, pf. In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too aggressive for this point in the release cycle. Reviewed by: bz Approved by: re (kib)	2009-08-02 19:43:32 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Xin LI	16e324fc56	Show interface name which received short CARP packet (e.g. a VRRP packet), in order to match other codepaths nearby. This makes troubleshooting easier. Approved by: re (kib) MFC after: 1 month	2009-07-30 17:40:47 +00:00
Julian Elischer	3d1001cb11	Startup the vnet part of initialization a bit after the global part. Fixes crash on boot if ipfw compiled in. Submitted by: tegge@ Reviewed by: tegge@ Approved by: re (kib)	2009-07-28 19:58:07 +00:00
Julian Elischer	7973fba3a4	Somewhere along the line accept sockets stopped honoring the FIB selected for them. Fix this. Reviewed by: ambrisko Approved by: re (kib) MFC after: 3 days	2009-07-28 19:43:27 +00:00
Michael Tuexen	bf3d517756	Fix a bug where a wrong initialization value is used for an SCTP specific sysctl variable. Approved by: re, rrs (mentor). MFC after: 2 weeks.	2009-07-28 15:07:41 +00:00
Randall Stewart	cfde3ff70b	Turns out that when a receiver forwards through its TSNs the processing code holds the read lock (when processing a FWD-TSN for pr-sctp). If it finds stranded data that can be given to the application, it calls sctp_add_to_readq(). The readq function also grabs this lock. So if INVAR is on we get a double recurse on a non-recursive lock and panic. This fix will change it so that the readq() function gets a flag to tell if the lock is held; if so, it does not get the lock. Approved by: re@freebsd.org (Kostik Belousov) MFC after: 1 week	2009-07-28 14:09:06 +00:00
Qing Li	df813b7ea2	This patch does the following: - Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface address when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned. Reviewed by: bz Approved by: re	2009-07-27 17:08:06 +00:00
Michael Tuexen	8e71b6947a	Fix the handling of unordered messages when using PR-SCTP. Approved by: re, rrs (mentor) MFC after: 3 weeks.	2009-07-27 13:41:45 +00:00
Michael Tuexen	4420d9a062	Get rid of unused field. This will also be deleted in the official specification of the SCTP socket API. Approved by: re, rrs (mentor)	2009-07-27 12:09:32 +00:00
Michael Tuexen	47a490cbbc	Add a missing unlock for the inp lock when returning early from sctp_add_to_readq(). Approved by: re, rrs (mentor) MFC after: 2 weeks.	2009-07-26 15:06:59 +00:00
Julian Elischer	9d85f50ad5	Catch ipfw up to the rest of the vimage code. It got left behind when it moved to its new location. Approved by: re (kensmith)	2009-07-25 06:42:42 +00:00
Robert Watson	d0728d7174	Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)	2009-07-23 20:46:49 +00:00
Bjoern A. Zeeb	a08362ce46	sysctl_msec_to_ticks is used with both virtualized and non-virtualized sysctls so we cannot use one common function. Add a macro to convert the arg1 in the virtualized case to vnet.h to not expose the maths to all over the code. Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there. Convert the two other places to use the new macro. Reviewed by: rwatson Approved by: re (kib)	2009-07-21 21:58:55 +00:00
Robert Watson	a511354af4	Back out the moving in r195782 of V_ip_id's initialization from the top back to the bottom of ip_init() as found in 7.x. I missed the fact that the bottom half of the init routine only runs in the !VNET case. Submitted by: zec Approved by: re (vimage blanket)	2009-07-20 19:40:09 +00:00
Robert Watson	0a4747d4d0	Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)	2009-07-20 13:55:33 +00:00
Robert Watson	5ee847d3ac	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep lock with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)	2009-07-19 14:20:53 +00:00
Robert Watson	1e77c1056a	Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)	2009-07-16 21:13:04 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Lawrence Stewart	91a5ebde45	Fix a race in the manipulation of the V_tcp_sack_globalholes global variable, which is currently not protected by any type of lock. When triggered, the bug would sometimes cause a panic when the TCP activity to an affected machine eventually slowed during a lull. The panic only occurs if INVARIANTS is compiled into the kernel, and has laid dormant for some time as a result of INVARIANTS being off by default except in FreeBSD-CURRENT. Switch to atomic operations in the locations where the variable is changed. Reads have not been updated to be protected by atomics, so there is a possibility of accounting errors in any given calculation where the variable is read. This is considered unlikely to occur in the wild, and will not cause serious harm on rare occasions where it does. Thanks to Robert Watson for debugging help. Reported by: Kamigishi Rei <spambox at haruhiism dot net> Tested by: Kamigishi Rei <spambox at haruhiism dot net> Reviewed by: silby Approved by: re (rwatson), kensmith (mentor temporarily unavailable)	2009-07-13 11:59:38 +00:00
Lawrence Stewart	237fbe0a1c	Replace struct tcpopt with a proxy toeopt struct in the TOE driver interface to the TCP syncache. This returns struct tcpopt to being private within the TCP implementation, thus allowing it to be modified without ABI concerns. The patch breaks the ABI. Bump __FreeBSD_version to 800103 accordingly. The cxgb driver is the only TOE consumer affected by this change, and needs to be recompiled along with the kernel. Suggested by: rwatson Reviewed by: rwatson, kmacy Approved by: re (kensmith), kensmith (mentor temporarily unavailable)	2009-07-13 11:51:02 +00:00
Lawrence Stewart	962ebef8c0	Pad the following TCP related structs to allow MFCs of upcoming features/fixes back to the 8 branch: tcp_var.h - struct sackhint - struct tcpcb - struct tcpstat The patch breaks the ABI. Bump __FreeBSD_version to 800102 accordingly. User space tools that rely on the size of any of these structs (e.g. sockstat) need to be recompiled. Reviewed by: rpaulo, sam, andre, rwatson Approved by: re & mentor (gnn)	2009-07-12 09:14:28 +00:00
Robert Watson	6c8615603b	Update various IPFW-related modules to use if_addr_rlock()/ if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK(). MFC after: 6 weeks	2009-06-26 00:46:50 +00:00
Robert Watson	d1da0a0672	Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks	2009-06-25 16:35:28 +00:00
Robert Watson	64aeca7b42	Initialize in_ifaddr_lock using RW_SYSINIT() instead of in ip_init(), so that it doesn't run multiple times if VIMAGE is being used. Discussed with: bz MFC after: 6 weeks	2009-06-25 14:44:00 +00:00
Robert Watson	2d9cfabad4	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz	2009-06-25 11:52:33 +00:00
Oleg Bulyzhin	6882bf4d92	- fix dummynet 'fast' mode for WF2Q case. - fix printing of pipe profile data. - introduce new pipe parameter: 'burst' - how much data can be sent through pipe bypassing bandwidth limit.	2009-06-24 22:57:07 +00:00
Robert Watson	32187eb6d9	Fix CARP build. Reported by: bz	2009-06-24 21:34:38 +00:00
Robert Watson	80af0152f3	Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)	2009-06-24 21:00:25 +00:00
Robert Watson	f8574c7a22	Add missing unlock of if_addr_mtx when an unmatched ARP packet is received. Reported by: lstewart MFC after: 6 weeks	2009-06-24 14:49:26 +00:00
Robert Watson	19e5b0a797	Clear 'ia' after iterating if_addrhead for unicast address matching: since 'ifa' was used as the TAILQ_FOREACH() iterator argument, and 'ia' was just derived from it, it could be left non-NULL which confused later conditional freeing code. This could cause kernel panics if multicast IP packets were received. [1] Call 'struct in_ifaddr *' in ip_rtaddr() 'ia', not 'ifa' in keeping with normal conventions. When ip_input returns early because 'ipstealth' is enabled, properly release the 'ia' reference. Reported by: lstewart, sam [1] MFC after: 6 weeks	2009-06-24 14:29:40 +00:00
Robert Watson	09d547787f	In ARP input, more consistently acquire and release ifaddr references. MFC after: 6 weeks	2009-06-24 10:33:35 +00:00
Bjoern A. Zeeb	88d166bf19	Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful. Asked for by: rwatson Reviewed by: rwatson	2009-06-23 22:08:55 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Andre Oppermann	ef760e6ad2	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number of stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved readability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
Marko Zec	fa057b15bd	V_irtualize flowtable state. This change should make options VIMAGE kernel builds usable again, to some extent at least. Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)	2009-06-22 21:19:24 +00:00
Robert Watson	8896f83a58	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks	2009-06-22 10:59:34 +00:00
Bjoern A. Zeeb	173de0f9cc	Remove a hack from r186086 so that IPsec via loopback routes continued working. It was targeted for stable/7 compatibility and actually never did anything in HEAD. Reminded by: rwatson X-MFC after: never	2009-06-22 09:24:46 +00:00
Robert Watson	1099f828b3	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Switch from a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
Roman Divacky	e40bae9a45	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care of the details of allocating the groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for ensuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
Randall Stewart	d50c1d79d0	Changes to the NR-Sack code so that: 1) All bit disappears 2) The two sets of gaps (nr and non-nr) are disjointed, you don't have gaps struck in both places. This adjusts us to correspond to the new draft. Still to-do, cleanup the code so that there is only one set of sack routines (original NR-Sack done by E cloned all sack code).	2009-06-17 12:34:56 +00:00
John Baldwin	6b0c5521b5	Trim extra sets of ()'s. Requested by: bde	2009-06-16 19:00:48 +00:00
John Baldwin	6dfb8b316c	Fix edge cases with ticks wrapping from INT_MAX to INT_MIN in the handling of the per-tcpcb t_badrxtwin. Submitted by: bde	2009-06-16 19:00:12 +00:00
John Baldwin	9f78a87a06	- Change members of tcpcb that cache values of ticks from int to u_int: t_rcvtime, t_starttime, t_rtttime, t_bw_rtttime, ts_recent_age, t_badrxtwin. - Change t_recent in struct timewait from u_long to u_int32_t to match the type of the field it shadows from tcpcb: ts_recent. - Change t_starttime in struct timewait from u_long to u_int to match the t_starttime field in tcpcb. Requested by: bde (1, 3)	2009-06-16 18:58:50 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Oleg Bulyzhin	1917ef996d	Since dn_pipe.numbytes is int64_t now - remove unnecessary overflow detection code in ready_event_wfq().	2009-06-15 17:14:47 +00:00
Bjoern A. Zeeb	53be8fca00	Move the kernel option FLOWTABLE checking from the header file to the actual implementation. Remove the accessor functions for the compiled out case, just returning "unavail" values. Remove the kernel conditional from the header file as it is no longer needed, only leaving the externs. Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size under the kernel option as well. Reviewed by: rwatson	2009-06-12 20:46:36 +00:00
VANHULLEBUS Yvan	7b495c4494	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
John Baldwin	a13c655c64	Correct printf format type mismatches.	2009-06-11 14:37:18 +00:00
John Baldwin	1a0e7cfc42	Trim extra ()'s. Submitted by: bde	2009-06-11 14:36:13 +00:00
John Baldwin	0e8cc7e748	Change a few members of tcpcb that store cached copies of ticks to be ints instead of unsigned longs. This fixes a few overflow edge cases on 64-bit platforms. Specifically, if an idle connection receives a packet shortly before 2^31 clock ticks of uptime (about 25 days with hz=1000) and the keep alive timer fires after 2^31 clock ticks, the keep alive timer will think that the connection has been idle for a very long time and will immediately drop the connection instead of sending a keep alive probe. Reviewed by: silby, gnn, lstewart MFC after: 1 week	2009-06-10 18:27:15 +00:00
Warner Losh	f61c07e12d	These are no longer referenced in the tree, so can be safely removed. Reviewed by: bms@	2009-06-10 18:12:15 +00:00
Luigi Rizzo	6167e6c88f	in ip_dn_ctl(), do not allocate a large structure on the stack, and use malloc() instead if/when it is necessary. The problem is less relevant in previous versions because the variable involved (tmp_pipe) is much smaller there. Still worth fixing though. Submitted by: Marta Carbone (GSOC) MFC after: 3 days	2009-06-10 10:47:31 +00:00
Bjoern A. Zeeb	d93a13cb23	Remove the "The option TCPDEBUG requires option INET." requirement. In case of !INET we will not have a timestamp on the trace for now but that might only affect spx debugging as long as INET6 requires INET. Reviewed by: rwatson (earlier version)	2009-06-10 10:39:41 +00:00
Luigi Rizzo	1a5d0c2bf0	small simplifications to the code in charge of reaping deleted rules: - clear the head pointer immediately before using it, so there is no chance of mistakes; - call reap_rules() unconditionally. The function can handle a NULL argument just fine, and the cost of the extra call is hardly significant given that we do it rarely and outside the lock. MFC after: 3 days	2009-06-10 10:34:59 +00:00
Oleg Bulyzhin	dda10d624c	Close a long-standing race with net.inet.ip.fw.one_pass = 0: If a packet leaves ipfw for another kernel subsystem (dummynet, netgraph, etc.) it carries a pointer to the matching ipfw rule. If this packet is then reinjected back into ipfw, ruleset processing starts from that rule. If the rule was deleted in the meantime, the race condition could cause a panic (as well as other odd effects like parsing rules in the 'reap list'). P.S. this commit changes the ABI, so userland ipfw-related binaries should be recompiled. MFC after: 1 month Tested by: Mikolaj Golub	2009-06-09 21:27:11 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many such issues are already known and will be fixed incrementally over the next weeks in smaller commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should in general be compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Hiroki Sato	dbe5926046	Fix and add a workaround for an issue with EtherIP packets with a reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent incorrect EtherIP packets and discarded the correct ones. This change introduces the following two flags to gif(4): accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default. send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default. These flags are stored in struct gif_softc and can be set by ifconfig(8) on a per-interface basis. Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate between older FreeBSD boxes and new versions after this commit, setting "send_rev_ethip_ver" is needed. Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks	2009-06-07 23:00:40 +00:00
Marko Zec	403f4aa059	Unbreak options VIMAGE build. Submitted by: julian (mentor) Approved by: julian (mentor)	2009-06-06 12:43:13 +00:00
Pawel Jakub Dawidek	42a3613305	Only four out of nine arguments for ip_ipsec_output() are actually used. Kill unused arguments except for 'ifp' as it might be used in the future for detecting IPsec-capable interfaces.	2009-06-05 23:53:17 +00:00
Luigi Rizzo	908e960ea6	move kernel ipfw-related sources to a separate directory, adjust conf/files and modules' Makefiles accordingly. No code or ABI changes so this and most of previous related changes can be easily MFC'ed MFC after: 5 days	2009-06-05 19:22:47 +00:00


3818 Commits