freebsd-nq

Author	SHA1	Message	Date
Qing Li	8eca593c5a	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), a sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" commands follow a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
Kip Macy	5e96c0a13e	Fix missed unlock and reference drop of lle Found by: pho	2008-12-24 05:31:26 +00:00
Bjoern A. Zeeb	f3b28b6bfb	Remove long unused netinet/ipprotosw.h (basically since r82884). Discussed with: rwatson MFC after: 4 weeks	2008-12-23 16:52:03 +00:00
Qing Li	ce9122fd3e	Don't create a bogus ARP entry for 0.0.0.0.	2008-12-23 03:33:32 +00:00
Qing Li	897d75c98e	The proxy-arp code was broken and responds to ARP requests for addresses that are not proxied locally.	2008-12-19 11:07:34 +00:00
Bjoern A. Zeeb	97590249ad	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equivalent INP_* counterparts. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 13:00:18 +00:00
Bjoern A. Zeeb	dcdb4371ca	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 12:52:34 +00:00
Kip Macy	00a46b3122	default to doing lla_lookup with shared afdata lock and returning a shared lock on the lle - thus restoring parallel performance to pre-arpv2 level	2008-12-17 00:14:28 +00:00
Robert Watson	ec313afa3f	IPFW's pfil hook/unhook code ignores the return values of pfil_add_hook() and pfil_remove_hook(), so cast them to (void). MFC after: pretty soon	2008-12-16 15:05:35 +00:00
Kip Macy	848552f31f	ipfw doesn't use the radix node head lock to protect the radix tree - remove acquisition	2008-12-16 11:06:30 +00:00
Kip Macy	3bb87a6c70	check pointer against NULL add new line after declaration for style	2008-12-16 03:18:59 +00:00
Kip Macy	86cd829d64	don't unlock lle if it is NULL	2008-12-16 02:48:12 +00:00
Kip Macy	fbc2ca1bef	unlock and destroy an llentry's lock before freeing Found by: sam	2008-12-16 00:20:49 +00:00
Bjoern A. Zeeb	fc384fa5d6	Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change. Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)	2008-12-15 21:50:54 +00:00
Qing Li	6e6b3f7cbc	The main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as many locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplifying the logic in the routing code. The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle; Kip has also been conducting active functional testing - Sam Leffler has helped me improve/refactor the code, and provided valuable reviews - Julian Elischer set up the perforce tree for me and has helped me maintain that branch before the svn conversion	2008-12-15 06:10:57 +00:00
Bjoern A. Zeeb	03d8b6fd1b	Add a check, that is currently under discussion for 8 but that we need to keep for 7-STABLE when MFCing in_pcbladdr() to not change the behaviour there. With this a destination route via a loopback interface is treated as a valid and reachable thing for IPv4 source address selection, even though nothing of that network is ever directly reachable, but it is more like a blackhole route. With this the source address will be selected and IPsec can grab the packets before we would discard them at a later point, encapsulate them and send them out from a different tunnel endpoint IP. Discussed on: net Reported by: Frank Behrens <frank@harz.behrens.de> Tested by: Frank Behrens <frank@harz.behrens.de> MFC after: 4 weeks (just so that I get the mail)	2008-12-14 17:47:33 +00:00
Bjoern A. Zeeb	bccd413962	De-virtualize the MD5 context for TCP initial seq number generation and make it a function local variable like we do almost everywhere inside the kernel. Discussed with: rwatson, silby MFC after: 4 weeks	2008-12-13 21:59:18 +00:00
Kip Macy	cdacee3468	version that will compile	2008-12-13 20:34:41 +00:00
Kip Macy	fe6320b468	radix node head lock needs to be held when calling rnh_addaddr	2008-12-13 20:18:05 +00:00
Kip Macy	979245af95	don't acquire lock recursively	2008-12-13 20:16:03 +00:00
Bjoern A. Zeeb	1b193af610	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
Bjoern A. Zeeb	86413abf5f	Put global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. While there garbage collect a few dead externs from ip6_var.h. Sponsored by: The FreeBSD Foundation	2008-12-11 16:26:38 +00:00
Bjoern A. Zeeb	0750c2ed96	Use the correct INIT_VNET_INET() as the virtualized variables here are in vinet.h not in vinet6.h Sponsored by: The FreeBSD Foundation	2008-12-11 16:05:07 +00:00
Marko Zec	385195c062	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions. 
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
Robert Watson	cd416355a8	Remove inconsistent white space from in_pcballoc(). MFC after: pretty soon	2008-12-10 13:24:38 +00:00
Robert Watson	5d04565101	Move syncache flag definitions below data structure, compress some vertical whitespace. MFC after: pretty soon	2008-12-10 11:11:43 +00:00
Robert Watson	c3ce7a790c	Move flag definitions for t_flags and t_oobflags below the definition of struct tcpcb so that the structure definition is a bit more vertically compact. Can't yet fit it on one printed page, though. MFC after: pretty soon	2008-12-10 11:03:16 +00:00
Kip Macy	65954fda79	unlock when done	2008-12-10 08:23:47 +00:00
Kip Macy	e08ab8576d	don't reference if_addr_mtx directly	2008-12-10 08:22:51 +00:00
Robert Watson	0ca989b376	Update comment on INP_TIMEWAIT to say what it's about, as we caution regarding the misplacement of flags in inp_vflag in an earlier comment. MFC after: pretty soon	2008-12-09 23:57:09 +00:00
Robert Watson	d15fb96522	Enhance one comment relating to recent TCP locking changes, and fix a typo in another. MFC after: 6 weeks	2008-12-09 15:49:02 +00:00
Robert Watson	a5654bb2ae	Move macros defining flags and shortcuts to nested structure fields in inpcbinfo below the structure definition in order to make inpcbinfo fit on a single printed page; related style tweaks. MFC after: pretty soon	2008-12-09 10:21:38 +00:00
Robert Watson	252ca42863	Move from solely write-locking the global tcbinfo in tcp_input() to read-locking in the TCP input path, allowing greater TCP input parallelism where multiple ithreads or ithread and netisr are able to run in parallel. Previously, most TCP input paths held a write lock on the global tcbinfo lock, effectively serializing TCP input. Before looking up the connection, acquire a write lock if a potentially state-changing flag is set on the TCP segment header (FIN, RST, SYN), and otherwise a read lock. We may later have to upgrade to a write lock in certain cases (ACKs received by the syncache or during TIMEWAIT) in order to support global state transitions, but this is never required for steady-state packets. Upgrading from a read lock to a write lock must be done as a trylock operation to avoid deadlocks, and actually violates the lock order as the tcbinfo lock precedes the inpcb lock held at the time of upgrade. If the trylock fails, we bump the refcount on the inpcb, drop both locks, and re-acquire in-order. If another thread has freed the connection while the locks are dropped, we free the inpcb and repeat the lookup (this should hardly ever or never happen in practice). For now, maintain a number of new counters measuring how many times various cases execute, and in particular whether various optimistic assumptions about when read locks can be used, whether upgrades are done using the fast path, and whether connections close in practice in the above-described race, actually occur. MFC after: 6 weeks Discussed with: kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:27:00 +00:00
Robert Watson	28696211d6	Add a reference count to struct inpcb, which may be explicitly incremented using in_pcbref(), and decremented using in_pcbfree() or in_pcbrele(). Protocols using only current in_pcballoc() and in_pcbfree() calls will see the same semantics, but it is now possible for TCP to call in_pcbref() and in_pcbrele() to prevent an inpcb from being freed when both tcbinfo and per-inpcb locks are released. This makes it possible to safely transition from holding only the inpcb lock to both tcbinfo and inpcb lock without re-looking up a connection in the input path, timer path, etc. Notice that in_pcbrele() does not unlock the connection after decrementing the refcount, if the connection remains, so that the caller can continue to use it; in_pcbrele() returns a flag indicating whether or not the inpcb pointer is still valid, and in_pcbfree() is now a simple wrapper around in_pcbrele(). MFC after: 1 month Discussed with: bz, kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:18:50 +00:00
Christian S.J. Peron	4e57bc3338	in_rtalloc1(9) returns a locked route, so make sure that we use RTFREE_LOCKED() here. This macro makes sure the reference count on the route is being managed properly. This eliminates another case which results in the following message being printed to the console: rtfree: 0xc841ee88 has 1 refs Reviewed by: bz MFC after: 2 weeks	2008-12-06 19:09:38 +00:00
Randall Stewart	830d754d52	Code from the hack-session known as the IETF (and a bit of debugging afterwards): - Fix protection code for notification generation. - Decouple associd from vtag - Allow vtags to have less stringent requirements in non-uniqueness. o don't pre-hash them when you issue one in a cookie. o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup. - Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes. - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon). - Update the support of SCTP-AUTH by Peter Lei. - Use macros for refcounting. - Fix MTU for UDP encapsulation. - Fix reporting back of unsent data. - Update assoc send counter handling to be consistent with endpoint sent counter. - Fix a bug in PR-SCTP. - Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point. - Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet. With the help of: Michael Tuexen and Peter Lei :-) MFC after: 4 weeks	2008-12-06 13:19:54 +00:00
Gleb Smirnoff	0b476f1cce	In a case of CARP status change run through the if_link_state_change() routine, so that devd(8) and others are notified about link state change.	2008-12-05 14:37:14 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the aforementioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. 
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Marko Zec	5c890d3c4f	Add an essential .h file that skipped from the last commit (r185419). Pointy hat #1 on... Pointed out by: bz	2008-11-28 23:39:25 +00:00
Marko Zec	f02493cbbd	Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-28 23:30:51 +00:00
Dag-Erling Smørgrav	3b6fe5fcd9	missing V_	2008-11-28 13:13:44 +00:00
Bjoern A. Zeeb	5cd54324ee	Replace most INP_CHECK_SOCKAF() uses checking if it is an IPv6 socket by comparing a constant inp vflag. This is expected to help to reduce extra locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks	2008-11-27 13:19:42 +00:00
Bjoern A. Zeeb	6aee2fc550	Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additional IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-27 12:04:35 +00:00
Bjoern A. Zeeb	6974bd9e75	Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave stub wrappers in 7 to keep the symbols.	2008-11-27 10:43:08 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
Bjoern A. Zeeb	0206cdb846	Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice. Reviewed by: rwatson MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-26 20:52:26 +00:00
Bjoern A. Zeeb	a7df09e8c9	Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable. No functional changes. Reviewed by: rwatson MFC after: 6 weeks	2008-11-26 12:54:31 +00:00
Julian Elischer	bc97ba5100	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
Marko Zec	44e33a0758	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
Randall Stewart	a1e132720b	-Improvement: Add '\n' on debug output in sctp_lower_sosend(). -Improvement: panic() on INVARIANTS kernels if memory allocation fails for a tagblock in sctp_add_vtag_to_timewait(). -Bugfix: Protect code in sctp_is_in_timewait() by SCTP_INP_INFO_WLOCK/SCTP_INP_INFO_WUNLOCK. -Cleanup: Get rid of unused variable now in sctp_init_asoc(). -Bugfix: Reuse the correct vtag in sctp_add_vtag_to_timewait(). -Cleanup: Get rid of unused constant SCTP_TIME_WAIT_SHORT in sctp_constants.h. -Improvement: Use all hash buckets of the vtag hash table. -Cleanup: Get rid of then unused constant SCTP_STACK_VTAG_HASH_SIZE_A. -Bugfix: Handle SHUTDOWN;SACK packet correctly. -Bugfix: Last TSN in a gap ack block was not being "ack'd" in the internal scoreboard. Obtained from: (with help from Michael Tuexen)	2008-11-12 14:16:39 +00:00
Bjoern A. Zeeb	687a9b4738	For consistency work on the local object passed into the function for the lock operation instead using the global name. Submitted by: ganbold MFC after: 2 months	2008-11-09 14:06:44 +00:00
Bjoern A. Zeeb	8e5c87f4b6	Fix typo and while here another one. Reviewed by: keramida Reported by: keramida MFC after: 2 months (with r184720)	2008-11-06 16:30:20 +00:00
Bjoern A. Zeeb	91d6cfa6b1	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially but if we are called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there, called into tcp_mtudisc() and tcp_mss_update() which then would reenable TSO on the tcpcb based on TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface it was turned back on, which led to an endless loop between tcp_output() and tcp_mtudisc() until we overflowed the stack. Reported by: kmacy MFC after: 2 months (along with r182851)	2008-11-06 13:25:59 +00:00
Bjoern A. Zeeb	4b3f4d3818	Adopt the comment for tcp_maxmtu(); we are returning a number not a pointer. While here update the rest of the comment to better match what we have these days. MFC after: 2 months	2008-11-06 12:59:00 +00:00
Bjoern A. Zeeb	6f01cac68a	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. In case we return early and got a metricptr to pass the hostcache info back to the caller we need to initialize the data to a defined state (zero it) as tcp_hc_get() would do if there was no hit. Without that the caller would check on random stack garbage which could lead to undefined results. This only affected tcp_mss() if there was no routing entry for the peer, tcp_mtudisc() was not affected. MFC after: 2 months (along with r182851)	2008-11-06 12:33:33 +00:00
Oleg Bulyzhin	02d09f7901	Type of q_time (start of queue idle time) has changed: uint32_t -> uint64_t. This should fix q_time overflow, which happens after 2^32/(86400*hz) days of uptime (~50 days for hz = 1000). q_time overflow causes the following: - traffic shaping may not work in 'fast' mode (not enabled by default). - incorrect average queue length calculation in RED/GRED algorithm. NB: due to ABI change this change is not applicable to stable. PR: kern/128401	2008-10-28 14:14:57 +00:00
Randall Stewart	73adc48f49	More issues with pre-blocking: a) Need for EEOR mode to take the min of the socket buffer size and the add more threshold, otherwise if you are so silly as to set a send buf size less than the add-more you could block forever in eeor mode. b) We were incorrectly using the sysctl vs the calculated value. This causes us to block forever if the addmore threshold is larger than the socket buffer size.	2008-10-27 14:49:12 +00:00
Randall Stewart	35e4161b1f	Two inter-related bugs. - If we send EXACTLY the size left in the send buffer and then send again, we end up with exactly 0 bytes and don't hit the pre-block code to wait for more space. - If we fall into the loop with our max_len == 0 (the bug above) we then call in to copy out the data, setup the length of the waiting to transmit data to 0 and call the mbuf copy routine which 0 indicates copy all the data to the mbuf chain.. which it does. This then leaves a "stuck" message on the stream queue with its size exactly 0 bytes but all the data there and thus nothing left in the uio structure. We then reach a stuck forever state never being able to send data.	2008-10-27 14:01:23 +00:00
Randall Stewart	a4c651183e	Get rid of ifdef for vimage on version 8 comparison. Now the scrubbing program properly takes care of this.	2008-10-27 13:54:54 +00:00
Randall Stewart	83416c885d	Invariants changes that make more sense.	2008-10-27 13:53:31 +00:00
Robert Watson	dd8ac7f990	In both dropwithreset paths in tcp_input.c, drop the tcbinfo lock sooner to decomplicate locking and eliminate the need for a rather chatty comment about why we have to handle the global lock in a special way for the benefit of ipfw and pf cred rules. MFC after: 3 days	2008-10-26 22:03:52 +00:00
Robert Watson	4c95fd23d6	Remove endearing but syntactically unnecessary "return;" statements directly before the final closing brackets of some TCP functions. MFC after: 3 days	2008-10-26 19:33:22 +00:00
Bjoern A. Zeeb	460473a071	Style changes only: - Consistently add parentheses to return statements. - Use NULL instead of 0 when comparing pointers, also avoiding unnecessary casts. - Do not use pointers as booleans. Reviewed by: rwatson (earlier version) MFC after: 2 months	2008-10-26 19:17:25 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Bjoern A. Zeeb	7e1bc2729c	Update a comment which to my reading had been misplaced in rev. 1.12 already (but probably had been way above as the code was there twice) and describe what was last changed in rev. 1.199 there (which now is in sync with in6_src.c r184096). Pointed at by: mlaier MFC after: 2 months	2008-10-20 18:56:00 +00:00
Bjoern A. Zeeb	dc3c09c89f	Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199. Reviewed by: silby, rpaulo, mlaier MFC after: 2 months	2008-10-20 18:43:59 +00:00
Randall Stewart	1b9f62a044	The flags value was not always being copied out in the recv routine like it should be. Obtained from: Michael Tuexen	2008-10-18 15:56:52 +00:00
Randall Stewart	ac29704161	New sockets (accepted) were not inheriting the proper snd/rcv buffer value. Obtained from: Michael Tuexen	2008-10-18 15:56:12 +00:00
Randall Stewart	1862b24533	- Peers rwnd is now available for the MIB. Obtained from: Michael Tuexen	2008-10-18 15:55:15 +00:00
Randall Stewart	fc69c30240	- Adapt layer indication was always being given (it should only be given when the user has enabled it). (Michael Tuexen) - Sack Immediately was not being set properly on the actual chunk, it was only put in the rcvd_flags which is incorrect. (Michael Tuexen) - added an ifndef userspace to one of the already present macro's for inet (Brad Penoff) Obtained from: Michael Tuexen and Brad Penoff MFC after: 4 weeks	2008-10-18 15:54:25 +00:00
Randall Stewart	fcea7c2ed3	Reported by Yehuda Weinraub (yehudasa@gamil.com) - CRC32C algorithm uses incorrect init_bytes value. It SHOULD have the number of bytes to get to a 4 byte boundary. PR: 128134 MFC after: 4 weeks	2008-10-18 15:53:31 +00:00
Bjoern A. Zeeb	f08ef6c595	Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)	2008-10-17 16:26:16 +00:00
Marko Zec	3ff0b2135b	Remove a useless global static variable. Approved by: bz (ad-hoc mentor)	2008-10-16 12:31:03 +00:00
Maxim Konovalov	0279bb29a0	o Remove unnecessary parentheses and restore indentation. Prodded by: mlaier	2008-10-14 17:47:29 +00:00
Maxim Konovalov	8e6c0f8cfd	o Reformat ipfw nat get|setsockopt code to make it more style(9) compliant. No functional changes.	2008-10-14 12:26:55 +00:00
Robert Watson	1f6ef666b5	Fix content and spelling of comment on _ipfw_insn.len -- a count of 32-bit words, not 32-byte words. MFC after: 3 days	2008-10-10 14:33:47 +00:00
Robert Watson	6c8286e42d	Don't pass curthread to sbreserve_locked() in tcp_do_segment(), as the netisr or ithread's socket buffer size limit is not the right limit to use. Instead, pass NULL as the other two calls to sbreserve_locked() in the TCP input path (tcp_mss()) do. In practice, this is a no-op, as ithreads and the netisr run without a process limit on socket buffer use, and a NULL thread pointer leads to not using the process's limit, if any. However, if tcp_input() is called in other contexts that do have limits, this may prevent the incorrect limit from being used. MFC after: 3 days	2008-10-07 09:41:07 +00:00
Bjoern A. Zeeb	c6ddb94cf2	Remove an INP_RUNLOCK() missed in SVN r183606, cvs rev. 1.195 raw_ip.c when transitioning from so_cred to inp_cred. MFC after: 6 weeks	2008-10-04 16:48:09 +00:00
Bjoern A. Zeeb	86d02c5c63	Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred	2008-10-04 15:06:34 +00:00
Bjoern A. Zeeb	0895aec30c	Implement IPv4 source address selection for unbound sockets. For the jail case we are already looping over the interface addresses before falling back to the only IP address of a jail in case of no match. This is in preparation for the upcoming multi-IPv4/v6/no-IP jail patch this change was developed with initially. This also changes the semantics of selecting the IP for processes within a jail as it now uses the same logic as outside the jail (with additional checks) but no longer is on a mutually exclusive code path. Benchmarks showed no difference at 95.0% confidence for either the plain or the jail case (even with the additional overhead). See: http://lists.freebsd.org/pipermail/freebsd-net/2008-September/019531.html Inspired by a patch from: Yahoo! (partially) Tested by: latest multi-IP jail patch users (implicitly) Discussed with: rwatson (general things around this) Reviewed by: mostly silence (feedback from bms) Help with benchmarking from: kris MFC after: 2 months	2008-10-03 12:21:21 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Robert Watson	c0a211c51f	Expand comments relating various detach/free/drop inpcb routines. MFC after: 3 days	2008-09-29 13:50:17 +00:00
Robert Watson	fc18af966f	Fix typo in comment. MFC after: 3 days	2008-09-29 13:48:48 +00:00
Robert Watson	47505890d6	When an inpcb doesn't have a socket but the inpcb is passed to ipfw in the transmit path, such as TCPS_TIMEWAIT, fail the credential extraction immediately rather than acquiring locks and looking up the inpcb on the global lists in order to reach the conclusion that the credential extraction has failed. This is more efficient, but more importantly, it avoids lock recursion on the inpcbinfo, which is no longer allowed with rwlocks. This appears to have been responsible for at least two reported panics. MFC after: 3 days Reported by: ganbold	2008-09-27 19:28:28 +00:00
Robert Watson	d83412e791	Rather than shadowing global variable 'lookup' in check_uidgid(), rename it to ugid_lookupp. This should make debugging issues with ipfw uid rules easier. MFC after: 3 days	2008-09-27 10:14:02 +00:00
Ed Maste	d2035ffb7a	Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page. Submitted by: Ryan Stone	2008-09-26 18:30:11 +00:00
Robert Watson	014ea782b1	As a follow-on to r183323, correct another case where ip_output() was called without an inpcb pointer despite holding the tcbinfo global lock, which led to a deadlock or panic when ipfw tried to further acquire it recursively. Reported by: Stefan Ehmann <shoesoft at gmx dot net> MFC after: 3 days	2008-09-25 17:26:54 +00:00
Robert Watson	a0ca087183	When dropping a packet and issuing a reset during TCP segment handling, unconditionally drop the tcbinfo lock (after all, we assert it lines before), but call tcp_dropwithreset() under both inpcb and inpcbinfo locks only if we pass in a tcpcb. Otherwise, if the pointer is NULL, firewall code may later recurse the global tcbinfo lock trying to look up an inpcb. This is an instance where a layering violation leads not only potentially to code reentrance and recursion, but also to lock recursion, and was revealed by the conversion to rwlocks because acquiring a read lock on an rwlock already held with a write lock is forbidden. When these locks were mutexes, they simply recursed. Reported by: Stefan Ehmann <shoesoft at gmx dot net> MFC after: 3 days	2008-09-24 11:07:03 +00:00
Roman Kurakin	f7b5554eb7	Export IPFW_TABLES_MAX value for compiled in defaults.	2008-09-21 20:42:42 +00:00
Roman Kurakin	6b057f1b5e	Export IPFW_TABLES_MAX via sysctl. Part of PR: 127058. PR: 127058	2008-09-14 09:24:12 +00:00
Julian Elischer	de34ad3f4b	oops commit the version that compiles	2008-09-14 08:24:45 +00:00
Julian Elischer	93fcb5a28d	Revert a part of the MRT commit that proved un-needed. rt_check() in its original form proved to be sufficient and rt_check_fib() can go away (as can its evil twin in_rt_check()). I believe this does NOT address the crashes people have been seeing in rt_check. MFC after: 1 week	2008-09-14 08:19:48 +00:00
Roman Kurakin	eb29d14ccb	Make the comment for the default rule number more clear. Submitted by: yar@	2008-09-14 06:14:06 +00:00
Bjoern A. Zeeb	3418daf2f1	Implement IPv6 support for TCP MD5 Signature Option (RFC 2385) the same way it has been implemented for IPv4. Reviewed by: bms (skimmed) Tested by: Nick Hilliard (nick netability.ie) (with more changes) MFC after: 2 months	2008-09-13 17:26:46 +00:00
Bjoern A. Zeeb	c10eb6d10a	Work around an integer division resulting in 0 and thus the congestion window not being incremented, if cwnd > maxseg^2. As suggested in RFC2581 increment the cwnd by 1 in this case. See http://caia.swin.edu.au/reports/080829A/CAIA-TR-080829A.pdf for more details. Submitted by: Alana Huebner, Lawrence Stewart, Grenville Armitage (caia.swin.edu.au) Reviewed by: dwmalone, gnn, rpaulo MFC After: 3 days	2008-09-09 07:35:21 +00:00
Bjoern A. Zeeb	00db174bc2	To my reading there are no real consumers of ip6_plen (IPv6 Payload Length) as set in tcpip_fillheaders(). ip6_output() will calculate it based on the length from the mbuf packet header itself. So initialize the value in tcpip_fillheaders() in correct (network) byte order. With the above change, to my reading, all places calling tcp_trace() pass in the ip6 header via ipgen as serialized in the mbuf and with ip6_plen in network byte order. Thus convert the IPv6 payload length to host byte order before printing. MFC after: 2 months	2008-09-07 20:44:45 +00:00
Bjoern A. Zeeb	3cee92e074	Split tcp_mss() in tcp_mss() and tcp_mss_update() where the former calls the latter. Merge tcp_mss_update() with code from tcp_mtudisc() basically doing the same thing. This gives us one central place where we calculate and check mss values to update t_maxopd (maximum mss + options length) instead of two slightly different but almost equal implementations to maintain. PR: kern/118455 Reviewed by: silby (back in March) MFC after: 2 months	2008-09-07 18:50:25 +00:00
Bjoern A. Zeeb	ebe5426934	V_irtualize SVN r182846 tcp_mssdflt/tcp_v6mssdflt procedure based sysctl implementations for VIMAGE the same way we did elsewhere: update the implementation but leave the globals and the SYSCTL statement untouched.	2008-09-07 15:20:21 +00:00
Bjoern A. Zeeb	4cdf3bedf3	Convert SYSCTL_INTs for tcp_mssdflt and tcp_v6mssdflt to SYSCTL_PROCs and check that the default mss for neither v4 nor v6 goes below the minimum MSS constant (216). This prevents people from shooting themselves in the foot. PR: kern/118455 (remotely related) Reviewed by: silby (as part of a larger patch in March) MFC after: 2 months	2008-09-07 14:44:55 +00:00
Bjoern A. Zeeb	c4982fae59	Add a second KASSERT checking for len >= 0 in the tcp output path. This is different to the first one (as len gets updated between those two) and would have caught various edge cases (read bugs) at a well defined place I had been debugging the last months instead of triggering (random) panics further down the call graph. MFC after: 2 months	2008-09-07 11:38:30 +00:00
Roman Kurakin	8191aa7c0b	Export the IPFW_DEFAULT_RULE outside ip_fw2.c. This number is not only the default rule number but also the maximum rule number. User space software such as ipfw and natd should be aware of its value. The software that already includes ip_fw.h should use the defined value. All others are expected to use sysctl (as discussed on net@). MFC after: 5 days. Discussed on: net@	2008-09-06 16:47:07 +00:00
Giorgos Keramidas	57a5a46e00	Slightly reword comment and remove typos.	2008-09-05 01:36:30 +00:00
Julian Elischer	9c6b07a695	whitespace nit	2008-09-03 18:09:15 +00:00
Brooks Davis	0eb7cf4d5f	Wrap an 81 column SYSCTL_NODE declaration. Obtained from: //depot/projects/vimage-commit2/...	2008-09-01 19:25:27 +00:00
Kip Macy	7c80e4f37f	Don't check if an interface can do tcp offload if there are no offload devices registered on the system. Suggested by: rwatson MFC after: 3 days	2008-09-01 05:30:22 +00:00
Julian Elischer	22b55ba9a0	fix tiny nit in comment	2008-08-31 18:54:35 +00:00
Christian S.J. Peron	8751c5bac8	Improve the entropy of the source port randomization for network address translation. It turns out this is useful for applications which require source port randomization for security (i.e. dns servers). Discussed with: secteam Requested by: mlaier MFC after: 2 weeks	2008-08-30 20:58:34 +00:00
George V. Neville-Neil	e4762f75c3	Fix a bug whereby multicast packets that are looped back locally wind up with the incorrect checksum on the wire when transmitted via devices that do checksum offloading. PR: kern/119635 Reviewed by: rwatson MFC after: 5 days	2008-08-29 20:42:58 +00:00
Rui Paulo	003c7e36b2	Fix typo in comment.	2008-08-28 21:55:40 +00:00
Randall Stewart	df4ad1fd93	OK, make the function non-static and put it in the .h so that when we do an INVARIANTS compile the compiler will not complain about the function that is not used. Hmm, maybe I should have made it ifndef INVARIANTS.	2008-08-28 20:31:24 +00:00
Randall Stewart	e1bfc4d739	Fixes compile error when INVARIANTs is on. Adds an empty goto to keep the compiler happy.	2008-08-28 20:14:07 +00:00
Randall Stewart	df6e0cc37d	- Make strict-sacks be the default. - Change it so that without INVARIANTs there are no panics in SCTP. - sctp_timer changes so that we have a recovery mechanism when the sent list is out of order.	2008-08-28 09:44:07 +00:00
Christian S.J. Peron	f440aeea83	Fix a panic in MAC kernels that was a result of un-initialized label storage. We can safely remove the label copying operations since M_MOVE_PKTHDR will move the mbuf tags (which contain MAC labels) to the destination mbuf. MFC after: 1 week Discussed with: rwatson	2008-08-27 23:52:03 +00:00
Randall Stewart	4a16c2c883	- When we close a socket with pending assoc's that are still shutting down, NULL out the socket pointer so we won't ever refer to a dead socket. Obtained from: Neil Wilson	2008-08-27 13:13:35 +00:00
Julian Elischer	2c0d658fca	Another missed V_ instance	2008-08-25 05:57:56 +00:00
Julian Elischer	b53c8130e5	Another V_ forgotten	2008-08-25 05:49:16 +00:00
Julian Elischer	576c43c844	We left out V_static_len from ip_fw2.c (also a whitespace diff that I'd rather fix here than break in the vimage branch.)	2008-08-25 05:38:18 +00:00
Julian Elischer	e0306e8be7	Move some struct defs around. This is a prep step for Vimage. No real effect of this at this time.	2008-08-25 00:33:30 +00:00
Bjoern A. Zeeb	ad27dca959	Make the kernel compile with SCTP and SCTP_DEBUG but no INET6 defined.	2008-08-24 18:29:22 +00:00
Kip Macy	4570959392	Don't calculate checksum if it has already been validated Obtained from: Chelsio Inc. MFC after: 3 days	2008-08-24 02:31:09 +00:00
Bjoern A. Zeeb	c06f087ccb	Cache the cred locally in _syncache_add() while holding the locks, so we can be sure that it's valid. In case we abort early free it again else put it into the syncache. We need the cred in the syncache to be able to restrict what will be exported by the sysctl helper function syncache_pcblist() (to netstat) within jails. PR: kern/126493 Reviewed by: rwatson (earlier versions) MFC after: 3 days	2008-08-23 14:22:12 +00:00
Bjoern A. Zeeb	bb580846dc	Add an explicit comment why we NULLify the two variables. Reviewed by: rwatson MFC after: 3 days	2008-08-23 12:27:18 +00:00
Robert Watson	5060346d0b	Remove comments and #ifdef notyet'd code relating to directly dispatching the IP multicast input code from the output path; we don't allow reentrance of the input path from the IP output path, it must use the netisr due to potential lock recursion. MFC after: 3 days	2008-08-21 17:24:49 +00:00
Julian Elischer	5ed3800e41	Fix some of the formatting fixes.. It's amazing how some things stand out in a commit message.	2008-08-20 01:24:55 +00:00
Julian Elischer	ac957cd271	A bunch of formatting fixes brought to light by, or created by the Vimage commit a few days ago.	2008-08-20 01:05:56 +00:00
Philip Paeps	80b11ee46a	Fix ARP in bridging scenarios where the bridge shares its MAC address with one of its members (see my r180140). Pointy hat to: philip Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru> MFC after: 3 days	2008-08-18 09:06:11 +00:00
Bjoern A. Zeeb	603724d3ab	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch	2008-08-17 23:27:27 +00:00
Bjoern A. Zeeb	48d48eb980	Fix a regression introduced in r179289 splitting up ip6_savecontrol() into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the mp thus the information will be missing in userland. Instead of going with a * as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument. PR: kern/126349 Reviewed by: rwatson, dwmalone	2008-08-16 06:39:18 +00:00
Dag-Erling Smørgrav	c3a7b734ad	Nit	2008-08-09 11:28:57 +00:00
Robert Watson	5cb2685a59	Minor white space tweaks. MFC after: 1 week	2008-08-07 09:06:04 +00:00
Robert Watson	72bed08287	Correct comment typo. MFC after: 1 week (after inpcb rwlocking)	2008-08-07 09:03:51 +00:00
John Baldwin	aa91bee2dc	Minor style tweaks.	2008-08-05 21:59:20 +00:00
Julian Elischer	711ca7efbb	The IPFW code accepts the use of the tablearg keyword along with the skipto keyword. But it doesn't work. Two options.. make it no longer accept it, or actually make it work.. I chose the 2nd.. Allow the tablearg to be used to specify a skipto destination. This is actually a very powerful construct if used correctly, or a sink of cpu cycles if used badly. Changes to the man page will follow.	2008-08-01 22:21:03 +00:00
Rui Paulo	f2512ba12a	MFp4 (//depot/projects/tcpecn/): TCP ECN support. Merge of my GSoC 2006 work for NetBSD. TCP ECN is defined in RFC 3168. Partly reviewed by: dwmalone, silby Obtained from: NetBSD	2008-07-31 15:10:09 +00:00
Randall Stewart	6d9e8f2b3a	Adds support for the SCTP_PORT_REUSE option Fixes a refcount bug found in the process Obtained from: With the help of Michael Tuexen	2008-07-31 11:08:30 +00:00
Randall Stewart	52baa64a19	Fix build breakage - kthread_exit() in 8 now has no arguments MFC after: 1 week	2008-07-29 09:30:50 +00:00
Randall Stewart	d6af161a34	- Out with some printfs. - Fix a initialization of last_tsn_used - Fix handling of mapped IPv4 addresses Obtained from: Michael Tuexen and I :-) MFC after: 1 week	2008-07-29 09:06:35 +00:00
Alexander Motin	18f401c664	Some style and assertion fixes to the previous commits hinted by rwatson. There are no functional changes.	2008-07-28 06:57:28 +00:00
Alexander Motin	d185578a78	According to in_pcb.h, protocol binding information has double locking. This allows accessing it during list traversal while holding only the global pcbinfo lock.	2008-07-27 20:48:22 +00:00
Alexander Motin	e2ed8f3514	Increase UDBHASHSIZE from 16 to 128 items. Previous value was chosen 10 years ago and is not very effective now. This change gives a speedup of several percent on 1000 L2TP mpd links.	2008-07-26 23:07:34 +00:00
Alexander Motin	0ca3b0967b	According to in_pcb.h, protocol binding information has double locking. This allows accessing it during list traversal while holding only the global pcbinfo lock. This relaxed locking noticeably increases receive socket lookup performance.	2008-07-26 21:12:00 +00:00
Alexander Motin	9ed324c9a5	Add hash table lookup for fully connected raw sockets. This gives significant performance improvements when many raw sockets are used. Benchmarks of mpd handling 1000 simultaneous PPTP connections show up to a 50% performance boost. With a higher number of connections the benefit becomes even bigger. PopTop and others should also get some benefits.	2008-07-26 17:32:15 +00:00
Tai-hwa Liang	df9cf830d1	Trying to fix compilation bustage: - removing 'const' qualifier from an input parameter to conform to the type required by rw_assert(); - using in_addr->s_addr to retrieve the 32-bit address value. Observed by: tinderbox	2008-07-22 04:23:57 +00:00
Kip Macy	9d29c635da	make new accessor functions consistent with existing style	2008-07-21 22:11:39 +00:00
Kip Macy	84330faa64	- Switch to INP_WLOCK macro from inp_wlock - calling sodisconnect after tcp_twstart is both gratuitous and unsafe - remove Submitted by: rwatson	2008-07-21 21:22:56 +00:00
Kip Macy	b1f8bd6464	Add versions of tcp_twstart, tcp_close, and tcp_drop that hide the acquisition the tcbinfo lock. MFC after: 1 week	2008-07-21 02:23:02 +00:00
Kip Macy	409d8ba5c7	add interface for external consumers to syncache_expand - rename syncache_add in a manner consistent with other bits intended for offload	2008-07-21 02:11:06 +00:00
Kip Macy	dd0e6c383a	Add accessor functions for socket fields. MFC after: 1 week	2008-07-21 00:49:34 +00:00
Kip Macy	9378e4377f	add inpcb accessor functions for fields needed by TOE devices	2008-07-21 00:08:34 +00:00
Tom Rhodes	41698ebf5b	Document a few sysctls. Reviewed by: rwatson	2008-07-20 15:29:58 +00:00
Bjoern A. Zeeb	8699ea087e	ia is a pointer thus use NULL rather than 0 for initialization and in comparisons to make this more obvious. MFC after: 5 days	2008-07-20 12:31:36 +00:00
Kip Macy	b1bc0b2a86	remove unused toedev functions and add comments for rest	2008-07-20 02:02:50 +00:00
David Malone	744eaff7e6	Add an accept filter for TCP based DNS requests. It waits until the whole first request is present before returning from accept.	2008-07-18 14:44:51 +00:00
Robert Watson	3b19fa3597	Eliminate use of the global ripsrc which was being used to pass address information from rip_input() to rip_append(). Instead, pass the source address for an IP datagram to rip_append() using a stack-allocated sockaddr_in, similar to udp_input() and udp_append(). Prior to the move to rwlocks for inpcbinfo, this was not a problem, as use of the global was synchronized using the ripcbinfo mutex, but with read-locking there is the potential for a race during concurrent receive. This problem is not present in the IPv6 raw IP socket code, which already used a stack variable for the address. Spotted by: mav MFC after: 1 week (before inpcbinfo rwlock changes)	2008-07-18 10:47:07 +00:00
Robert Watson	ca528788b8	Fix error in comment. MFC after: 3 weeks	2008-07-16 10:55:50 +00:00
Robert Watson	43cc0bc1df	Merge last of a series of rwlock conversion changes to UDP, which completes the move to a fully parallel UDP transmit path by using global read, rather than write, locking of inpcbinfo in further semi-connected cases: - Add macros to allow try-locking of inpcb and inpcbinfo. - Always acquire an inpcb read lock in udp_output(), which stabilizes the local inpcb address and port bindings in order to determine what further locking is required: - If the inpcb is currently not bound (at all) and we are implicitly connecting, we require inpcbinfo and inpcb write locks, so drop the read lock and re-acquire. - If the inpcb is bound for at least one of the port or address, but an explicit source or destination is requested, trylock the inpcbinfo lock, and if that fails, drop the inpcb lock, lock the global lock, and relock the inpcb lock. - Otherwise, no further locking is required (common case). - Update comments. In practice, this means that the vast majority of consumers of UDP sockets will not acquire any exclusive locks at the socket or UDP levels of the network stack. This leads to a marked performance improvement in several important workloads, including BIND, nsd, and memcached over UDP, as well as significant improvements in pps microbenchmarks. The plan is to MFC all of the rwlock changes to RELENG_7 once they have settled for a few weeks in the tree. Tested by: ps, kris (older revision), bde MFC after: 3 weeks	2008-07-15 15:38:47 +00:00
Rui Paulo	b27227029b	Fix typo in comment. M tcp_output.c	2008-07-15 10:32:35 +00:00
Ermal Luçi	7972c979c5	Fix carp(4) panics that can occur during carp interface configuration. Approved by: mlaier (mentor) Reported by: Scott Ullrich MFC after: 1 week	2008-07-14 20:11:51 +00:00
Robert Watson	3144b7d3d3	Slightly rearrange validation of UDP arguments and jail processing in udp_output() so that argument validation occurs before jail processing. Add additional comments explaining what's going on when we process addresses and binding during udp_output(). MFC after: 3 weeks	2008-07-10 16:20:18 +00:00
Bjoern A. Zeeb	078b704233	Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks. Reviewed by: rwatson	2008-07-10 13:31:11 +00:00
Bjoern A. Zeeb	cdcb11b92c	For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in an u_short or u_int16_t. Reviewed by: rwatson	2008-07-10 13:23:22 +00:00
Robert Watson	1175d9d56d	Apply the MAC label to an outgoing UDP packet when other inpcb properties are processed, meaning that we avoid the cost of MAC label assignment if we're going to drop the packet due to mbuf exhaustion, etc. MFC after: 3 weeks	2008-07-10 09:45:28 +00:00
Bjoern A. Zeeb	e5cf427baf	For consistency with the rest of the function use the locally cached pointer pcbinfo rather than inp->inp_pcbinfo. MFC after: 3 weeks	2008-07-09 19:03:06 +00:00
Randall Stewart	fc14de76f4	1) Adds the rest of the VIMAGE change macros 2) Adds some __UserSpace__ on some of the common defines that the user space code needs 3) Fixes a bug when we send up data to a user that failed. We need to a) trim off the data chunk headers, if present, and b) make sure the frag bit is communicated properly for the msgs coming off the stream queues... i.e. we see if some of the msg has been taken. Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julian!	2008-07-09 16:45:30 +00:00
Robert Watson	7b709f8ad4	Provide some initial chicken-scratching annotations of locking for struct inpcb. Prodded by: bz MFC after: 3 days	2008-07-08 17:22:59 +00:00
Robert Watson	ac9ae27991	Allow udp_notify() to accept read, as well as write, locks on the passed inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire only a read lock; we may still see write locks in udp_notify() as the in_pcbnotifyall() routine is shared with TCP and always uses a write lock on the inpcb being notified. MFC after: 1 month	2008-07-07 12:27:55 +00:00
Robert Watson	c4d585aefe	Add additional udbinfo and inpcb locking assertions to udp_output(); for some code paths, global or inpcb write locks are required, but for other code paths, read locks or no locking at all are sufficient for the data structures. MFC after: 1 month	2008-07-07 12:14:10 +00:00
Robert Watson	948d0fc926	First step towards parallel transmit in UDP: if neither a specific source or a specific destination address is requested as part of a send on a UDP socket, read lock the inpcb rather than write lock it. This will allow fully parallel transmit down to the IP layer when sending simultaneously from multiple threads on a connected UDP socket. Parallel transmit for more complex cases, such as when sendto(2) is invoked with an address and there's already a local binding, will follow. MFC after: 1 month	2008-07-07 10:56:55 +00:00
Robert Watson	10cc62b7a6	Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp_append(). MFC after: 1 month	2008-07-07 09:26:52 +00:00
Robert Watson	cec9ffee22	Rename raw_append() to rip_append(): the raw_ prefix is generally used for functions in the generic raw socket library (raw_cb.c, raw_usrreq.c), and they are not used for IPv4 raw sockets. MFC after: 3 days	2008-07-05 18:55:03 +00:00
Robert Watson	0ae76120da	Improve approximation of style(9) in raw socket code.	2008-07-05 18:03:39 +00:00
Oleksandr Tymoshenko	06a37c4203	Enqueue de-capsulated packet instead of performing direct dispatch. It's possible to exhaust and garble stack with a packet that contains a couple of hundreds nested encapsulation levels. Submitted by: Ming Fu <fming@borderware.com> Reviewed by: rwatson PR: kern/85320	2008-07-04 21:01:30 +00:00
Robert Watson	59dd72d040	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
Bjoern A. Zeeb	62ee136457	Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029, generating an RTM_MISS for every IP packet forwarded making user space routing daemons unhappy. PR: kern/123621, kern/124540, kern/122338 Reported by: Paul <paul gtcomm.net>, Mike Tancsa <mike sentex.net> on net@ Tested by: Paul and Mike Reviewed by: andre MFC after: 3 days	2008-07-03 12:44:36 +00:00
Robert Watson	5df3e83946	Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers significant performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps	2008-07-02 23:23:27 +00:00
Robert Watson	119d85f6e0	In udp_append() and udp_input(), make use of read locking on inpcbs rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking. While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc. MFC after: 2 months Tested by: gnn, kris, ps	2008-06-30 18:26:43 +00:00
Oleksandr Tymoshenko	cf77b84879	In case of interface initialization failure remove struct in_ifaddr* from in_ifaddrhashtbl in in_ifinit because error handler in in_control removes entries only for AF_INET addresses. If in_ifinit is called for the cloned interface that has just been created its address family is not AF_INET and therefore LIST_REMOVE is not called for respective LIST_INSERT_HEAD and freed entries remain in in_ifaddrhashtbl and lead to memory corruption. PR: kern/124384	2008-06-24 13:58:28 +00:00
Alexander Motin	48ca67bea6	Partially revert previous commit. DeleteLink() does not delete permanent links so we should be aware of it and try to delete every link only once or we will loop forever.	2008-06-22 11:39:42 +00:00
Alexander Motin	ea29dd9241	Implement UDP transparent proxy support. PR: bin/54274 Submitted by: Nicolai Petri <nicolai@petri.cc>	2008-06-21 20:18:57 +00:00
Alexander Motin	b46d3e21bb	Add support for PORT/EPRT FTP commands in lowercase. Use strncasecmp() instead of huge local implementation to reduce code size. Check space presence after command/code. PR: kern/73034	2008-06-21 16:22:56 +00:00
Stephan Uphoff	606a2669cf	Change incorrect stale cookie detection in syncookie_lookup() that prematurely declared a cookie as expired. Reviewed by: andre@, silby@ Reported by: Yahoo!	2008-06-16 20:08:22 +00:00
Stephan Uphoff	104ac85378	Fix a check in SYN cache expansion (syncache_expand()) to accept packets that arrive in the receive window instead of just on the left edge of the receive window. This is needed for correct behavior when packets are lost or reordered. PR: kern/123950 Reviewed by: andre@, silby@ Reported by: Yahoo!, Wang Jin MFC after: 1 week	2008-06-16 19:56:59 +00:00
Randall Stewart	97a7b90ff3	More prep for Vimage: - only one function to destroy an SCTP stack sctp_finish() - Make it so this function also arranges for any threads created by the image to do a kthread_exit()	2008-06-15 12:31:23 +00:00
Randall Stewart	9b02321796	- Fixes foobar on my part. Some missing virtualization macros from specific logging cases.	2008-06-14 13:24:49 +00:00
Randall Stewart	b3f1ea41fd	- Macro-izes the packed declaration in all headers. - Vimage prep - these are major restructures to move all global variables to be accessed via a macro or two. The variables all go into a single structure. - Asconf address addition tweaks (add_or_del Interfaces) - Fix rwnd calculation to be more conservative. - Support SACK_IMMEDIATE flag to skip delayed sack by demand of peer. - Comment updates in the sack mapping calculations - Invariants panic added. - Pre-support for UDP tunneling (we can do this on MAC but will need added support from UDP to get a "pipe" of UDP packets in). - clear trace buffer sysctl added when local tracing on. Note the majority of this huge patch is all the vimage prep stuff :-)	2008-06-14 07:58:05 +00:00
Jack F Vogel	6c5087a818	Add generic TCP LOR into netinet	2008-06-11 22:12:50 +00:00
Max Laier	1ead26d4e1	Sort IP addresses before hashing them for the signature. Otherwise carp is sensitive to address configuration order. PR: kern/121574 Reported by: Douglas K. Rand, Wouter de Jong Obtained from: OpenBSD (rev 1.114 + fixes) MFC after: 2 weeks	2008-06-02 18:58:07 +00:00
Robert Watson	53640b0e3a	When allocating temporary storage to hold a TCP/IP packet header template, use an M_TEMP malloc(9) allocation rather than an mbuf with mtod(9) and dtom(9). This eliminates the last use of dtom(9) in TCP. MFC after: 3 weeks	2008-06-02 14:20:26 +00:00
Alexander Motin	ef30318ee9	Increase LINK_TABLE_OUT_SIZE from 101 to 4001 like LINK_TABLE_IN_SIZE to reduce performance degradation under heavy outgoing scan/flood. Scalability is now much more important than several kilobytes of RAM. Remove unneeded TCP-specific expiration handling. Before this connected TCP sessions could never expire. Now connected TCP sessions will expire after 24 hours of inactivity. Simplify HouseKeeping() to avoid several mul/div-s per packet. Taking into account increased LINK_TABLE_OUT_SIZE, precision is still much more than required.	2008-06-01 18:34:58 +00:00
Alexander Motin	efc66711f9	Make m_megapullup() more intelligent: - to increase performance do not reallocate mbuf when possible, - to support up to 16K packets (was 2K max) use mbuf cluster of proper size. This change depends on recent ng_nat and ip_fw_nat changes.	2008-06-01 17:52:40 +00:00
Alexander Motin	1913488d10	PKT_ALIAS_FOUND_HEADER_FRAGMENT result is not an error, so pass that packet. This fixes packet fragmentation handling. Pass the actually available buffer size to libalias instead of the MCLBYTES constant. The MCLBYTES constant was used in the belief that m_megapullup() always moves data into a fresh cluster, which is not always the case.	2008-06-01 12:29:23 +00:00
Alexander Motin	aac54f0a70	Fix packet fragmentation support broken by copy/paste error in rev.1.60. ip_id should be u_short, not u_char.	2008-06-01 11:47:04 +00:00
Robert Watson	c28cb4d82f	Read lock rather than write lock TCP inpcbs in monitoring sysctls. In some cases, add explicit inpcb locking rather than relying on the global lock, as we dereference inp_socket, but also allowing us to drop the global lock more quickly. MFC after: 1 week	2008-05-29 14:28:26 +00:00
Robert Watson	9622e84fcf	Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week	2008-05-29 08:27:14 +00:00
Bjoern A. Zeeb	9a38ba8101	Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need a WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days	2008-05-24 15:20:48 +00:00
Robert Watson	22c82719cf	Consistently check IPFW and DUMMYNET privileges in the configuration routines for those modules, rather than in the raw socket code. This causes each privilege check to occur in exactly one place and avoids duplicate checks across layers. MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.	2008-05-22 08:10:31 +00:00
Randall Stewart	d61374e183	- sctputil.c - If debug is on, the INPKILL timer can deref a freed value. Change so that we save off a type field for display and NULL inp just for good measure. - sctp_output.c - Fix it so in sending to the loopback we use the src address of the inbound INIT. We don't want to do this for non-local addresses since otherwise we might be ingress filtered, so we need to use the best src address and list the address sent to. Obtained from: time bug - Neil Wilson MFC after: 1 week	2008-05-21 16:51:21 +00:00
Randall Stewart	c54a18d26b	- Adds support for the multi-asconf (From Kozuka-san) - Adds some prepwork (Not all yet) for vimage in particular support to delete the sctppcbinfo.xx structs. There is still a leak in here if it were to be called plus we still need the regrouping (From Me and Michael Tuexen) - Adds support for UDP tunneling. For BSD there is no socket yet setup so it's disabled, but major argument changes are in here to encompass the passing of the port number (zero when you don't have a udp tunnel, the default for BSD). Will add some hooks in UDP here shortly (discussed with Robert) that will allow easy tunneling. (Mainly from Peter Lei and Michael Tuexen with some BSD work from me :-D) - Some ease for windows, evidently leave is reserved by their compiler; move label leave: -> out: MFC after: 1 week	2008-05-20 13:47:46 +00:00


3433 Commits