freebsd-dev

Author	SHA1	Message	Date
Robert Watson	c3ce7a790c	Move flag definitions for t_flags and t_oobflags below the definition of struct tcpcb so that the structure definition is a bit more vertically compact. Can't yet fit it on one printed page, though. MFC after: pretty soon	2008-12-10 11:03:16 +00:00
Kip Macy	65954fda79	unlock when done	2008-12-10 08:23:47 +00:00
Kip Macy	e08ab8576d	don't reference if_addr_mtx directly	2008-12-10 08:22:51 +00:00
Robert Watson	0ca989b376	Update comment on INP_TIMEWAIT to say what it's about, as we caution regarding the misplacement of flags in inp_vflag in an earlier comment. MFC after: pretty soon	2008-12-09 23:57:09 +00:00
Robert Watson	d15fb96522	Enhance one comment relating to recent TCP locking changes, and fix a typo in another. MFC after: 6 weeks	2008-12-09 15:49:02 +00:00
Robert Watson	a5654bb2ae	Move macros defining flags and shortcuts to nested structure fields in inpcbinfo below the structure definition in order to make inpcbinfo fit on a single printed page; related style tweaks. MFC after: pretty soon	2008-12-09 10:21:38 +00:00
Robert Watson	252ca42863	Move from solely write-locking the global tcbinfo in tcp_input() to read-locking in the TCP input path, allowing greater TCP input parallelism where multiple ithreads or ithread and netisr are able to run in parallel. Previously, most TCP input paths held a write lock on the global tcbinfo lock, effectively serializing TCP input. Before looking up the connection, acquire a write lock if a potentially state-changing flag is set on the TCP segment header (FIN, RST, SYN), and otherwise a read lock. We may later have to upgrade to a write lock in certain cases (ACKs received by the syncache or during TIMEWAIT) in order to support global state transitions, but this is never required for steady-state packets. Upgrading from a read lock to a write lock must be done as a trylock operation to avoid deadlocks, and actually violates the lock order as the tcbinfo lock precedes the inpcb lock held at the time of upgrade. If the trylock fails, we bump the refcount on the inpcb, drop both locks, and re-acquire in-order. If another thread has freed the connection while the locks are dropped, we free the inpcb and repeat the lookup (this should hardly ever or never happen in practice). For now, maintain a number of new counters measuring how many times various cases execute, and in particular whether various optimistic assumptions about when read locks can be used, whether upgrades are done using the fast path, and whether connections close in practice in the above-described race, actually occur. MFC after: 6 weeks Discussed with: kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:27:00 +00:00
Robert Watson	28696211d6	Add a reference count to struct inpcb, which may be explicitly incremented using in_pcbref(), and decremented using in_pcbfree() or in_pcbrele(). Protocols using only current in_pcballoc() and in_pcbfree() calls will see the same semantics, but it is now possible for TCP to call in_pcbref() and in_pcbrele() to prevent an inpcb from being freed when both tcbinfo and per-inpcb locks are released. This makes it possible to safely transition from holding only the inpcb lock to both tcbinfo and inpcb lock without re-looking up a connection in the input path, timer path, etc. Notice that in_pcbrele() does not unlock the connection after decrementing the refcount, if the connection remains, so that the caller can continue to use it; in_pcbrele() returns a flag indicating whether or not the inpcb pointer is still valid, and in_pcbfree() is now a simple wrapper around in_pcbrele(). MFC after: 1 month Discussed with: bz, kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:18:50 +00:00
Christian S.J. Peron	4e57bc3338	in_rtalloc1(9) returns a locked route, so make sure that we use RTFREE_LOCKED() here. This macro makes sure the reference count on the route is being managed properly. This eliminates another case which results in the following message being printed to the console: rtfree: 0xc841ee88 has 1 refs Reviewed by: bz MFC after: 2 weeks	2008-12-06 19:09:38 +00:00
Randall Stewart	830d754d52	Code from the hack-session known as the IETF (and a bit of debugging afterwards): - Fix protection code for notification generation. - Decouple associd from vtag - Allow vtags to have less stringent requirements in non-uniqueness. o don't pre-hash them when you issue one in a cookie. o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup. - Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes. - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon). - Update the support of SCTP-AUTH by Peter Lei. - Use macros for refcounting. - Fix MTU for UDP encapsulation. - Fix reporting back of unsent data. - Update assoc send counter handling to be consistent with endpoint sent counter. - Fix a bug in PR-SCTP. - Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point. - Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet. With the help of: Michael Tuexen and Peter Lei :-) MFC after: 4 weeks	2008-12-06 13:19:54 +00:00
Gleb Smirnoff	0b476f1cce	In a case of CARP status change run through the if_link_state_change() routine, so that devd(8) and others are notified about link state change.	2008-12-05 14:37:14 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the aforementioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Marko Zec	5c890d3c4f	Add an essential .h file that skipped from the last commit (r185419). Pointy hat #1 on... Pointed out by: bz	2008-11-28 23:39:25 +00:00
Marko Zec	f02493cbbd	Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-28 23:30:51 +00:00
Dag-Erling Smørgrav	3b6fe5fcd9	missing V_	2008-11-28 13:13:44 +00:00
Bjoern A. Zeeb	5cd54324ee	Replace most INP_CHECK_SOCKAF() uses checking if it is an IPv6 socket by comparing a constant inp vflag. This is expected to help to reduce extra locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks	2008-11-27 13:19:42 +00:00
Bjoern A. Zeeb	6aee2fc550	Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additional IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-27 12:04:35 +00:00
Bjoern A. Zeeb	6974bd9e75	Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave stub wrappers in 7 to keep the symbols.	2008-11-27 10:43:08 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
Bjoern A. Zeeb	0206cdb846	Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice. Reviewed by: rwatson MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-26 20:52:26 +00:00
Bjoern A. Zeeb	a7df09e8c9	Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable. No functional changes. Reviewed by: rwatson MFC after: 6 weeks	2008-11-26 12:54:31 +00:00
Julian Elischer	bc97ba5100	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
Marko Zec	44e33a0758	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
Randall Stewart	a1e132720b	-Improvement: Add '\n' on debug output in sctp_lower_sosend(). -Improvement: panic() on INVARIANTS kernels if memory allocation fails for a tagblock in sctp_add_vtag_to_timewait(). -Bugfix: Protect code in sctp_is_in_timewait() by SCTP_INP_INFO_WLOCK/SCTP_INP_INFO_WUNLOCK. -Cleanup: Get rid of unused variable now in sctp_init_asoc(). -Bugfix: Reuse the correct vtag in sctp_add_vtag_to_timewait(). -Cleanup: Get rid of unused constant SCTP_TIME_WAIT_SHORT in sctp_constants.h. -Improvement: Use all hash buckets of the vtag hash table. -Cleanup: Get rid of then unused constant SCTP_STACK_VTAG_HASH_SIZE_A. -Bugfix: Handle SHUTDOWN;SACK packet correctly. -Bugfix: Last TSN in a gap ack block was not being "ack'd" in the internal scoreboard. Obtained from: (with help from Michael Tuexen)	2008-11-12 14:16:39 +00:00
Bjoern A. Zeeb	687a9b4738	For consistency work on the local object passed into the function for the lock operation instead using the global name. Submitted by: ganbold MFC after: 2 months	2008-11-09 14:06:44 +00:00
Bjoern A. Zeeb	8e5c87f4b6	Fix typo and while here another one. Reviewed by: keramida Reported by: keramida MFC after: 2 months (with r184720)	2008-11-06 16:30:20 +00:00
Bjoern A. Zeeb	91d6cfa6b1	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially but if we are called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there, called into tcp_mtudisc() and tcp_mss_update() which then would reenable TSO on the tcpcb based on TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface it was turned back on, which led to an endless loop between tcp_output() and tcp_mtudisc() until we overflowed the stack. Reported by: kmacy MFC after: 2 months (along with r182851)	2008-11-06 13:25:59 +00:00
Bjoern A. Zeeb	4b3f4d3818	Adapt the comment for tcp_maxmtu(); we are returning a number not a pointer. While here update the rest of the comment to better match what we have these days. MFC after: 2 months	2008-11-06 12:59:00 +00:00
Bjoern A. Zeeb	6f01cac68a	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. In case we return early and got a metricptr to pass the hostcache info back to the caller we need to initialize the data to a defined state (zero it) as tcp_hc_get() would do if there was no hit. Without that the caller would check on random stack garbage which could lead to undefined results. This only affected tcp_mss() if there was no routing entry for the peer, tcp_mtudisc() was not affected. MFC after: 2 months (along with r182851)	2008-11-06 12:33:33 +00:00
Oleg Bulyzhin	02d09f7901	Type of q_time (start of queue idle time) has changed: uint32_t -> uint64_t. This should fix q_time overflow, which happens after 2^32/(86400*hz) days of uptime (~50 days for hz = 1000). q_time overflow causes the following: - traffic shaping may not work in 'fast' mode (not enabled by default). - incorrect average queue length calculation in RED/GRED algorithm. NB: due to ABI change this change is not applicable to stable. PR: kern/128401	2008-10-28 14:14:57 +00:00
Randall Stewart	73adc48f49	More issues with pre-blocking: a) Need for EEOR mode to take the min of the socket buffer size and the add more threshold, otherwise if you are so silly as to set a send buf size less than the add-more you could block forever in eeor mode. b) We were incorrectly using the sysctl vs the calculated value. This causes us to block forever if the addmore threshold is larger than the socket buffer size.	2008-10-27 14:49:12 +00:00
Randall Stewart	35e4161b1f	Two inter-related bugs. - If we send EXACTLY the size left in the send buffer and then send again, we end up with exactly 0 bytes and don't hit the pre-block code to wait for more space. - If we fall into the loop with our max_len == 0 (the bug above) we then call in to copy out the data, setup the length of the waiting to transmit data to 0 and call the mbuf copy routine which 0 indicates copy all the data to the mbuf chain.. which it does. This then leaves a "stuck" message on the stream queue with its size exactly 0 bytes but all the data there and thus nothing left in the uio structure. We then reach a stuck forever state never being able to send data.	2008-10-27 14:01:23 +00:00
Randall Stewart	a4c651183e	Get rid of ifdef for vimage on version 8 comparison. Now the scrubbing program properly takes care of this.	2008-10-27 13:54:54 +00:00
Randall Stewart	83416c885d	Invariants changes that make more sense.	2008-10-27 13:53:31 +00:00
Robert Watson	dd8ac7f990	In both dropwithreset paths in tcp_input.c, drop the tcbinfo lock sooner to decomplicate locking and eliminate the need for a rather chatty comment about why we have to handle the global lock in a special way for the benefit of ipfw and pf cred rules. MFC after: 3 days	2008-10-26 22:03:52 +00:00
Robert Watson	4c95fd23d6	Remove endearing but syntactically unnecessary "return;" statements directly before the final closing brackets of some TCP functions. MFC after: 3 days	2008-10-26 19:33:22 +00:00
Bjoern A. Zeeb	460473a071	Style changes only: - Consistently add parentheses to return statements. - Use NULL instead of 0 when comparing pointers, also avoiding unnecessary casts. - Do not use pointers as booleans. Reviewed by: rwatson (earlier version) MFC after: 2 months	2008-10-26 19:17:25 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Bjoern A. Zeeb	7e1bc2729c	Update a comment which to my reading had been misplaced in rev. 1.12 already (but probably had been way above as the code was there twice) and describe what was last changed in rev. 1.199 there (which now is in sync with in6_src.c r184096). Pointed at by: mlaier MFC after: 2 months	2008-10-20 18:56:00 +00:00
Bjoern A. Zeeb	dc3c09c89f	Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199. Reviewed by: silby, rpaulo, mlaier MFC after: 2 months	2008-10-20 18:43:59 +00:00
Randall Stewart	1b9f62a044	The flags value was not always being copied out in the recv routine like it should be. Obtained from: Michael Tuexen	2008-10-18 15:56:52 +00:00
Randall Stewart	ac29704161	New sockets (accepted) were not inheriting the proper snd/rcv buffer value. Obtained from: Michael Tuexen	2008-10-18 15:56:12 +00:00
Randall Stewart	1862b24533	- Peers rwnd is now available for the MIB. Obtained from: Michael Tuexen	2008-10-18 15:55:15 +00:00
Randall Stewart	fc69c30240	- Adapt layer indication was always being given (it should only be given when the user has enabled it). (Michael Tuexen) - Sack Immediately was not being set properly on the actual chunk, it was only put in the rcvd_flags which is incorrect. (Michael Tuexen) - added an ifndef userspace to one of the already present macros for inet (Brad Penoff) Obtained from: Michael Tuexen and Brad Penoff MFC after: 4 weeks	2008-10-18 15:54:25 +00:00
Randall Stewart	fcea7c2ed3	Reported by Yehuda Weinraub (yehudasa@gamil.com) - CRC32C algorithm uses incorrect init_bytes value. It SHOULD have the number of bytes to get to a 4 byte boundary. PR: 128134 MFC after: 4 weeks	2008-10-18 15:53:31 +00:00
Bjoern A. Zeeb	f08ef6c595	Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)	2008-10-17 16:26:16 +00:00
Marko Zec	3ff0b2135b	Remove a useless global static variable. Approved by: bz (ad-hoc mentor)	2008-10-16 12:31:03 +00:00
Maxim Konovalov	0279bb29a0	o Remove unnecessary parentheses and restore indentation. Prodded by: mlaier	2008-10-14 17:47:29 +00:00
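
Note on 252ca42863 and 28696211d6 (2008-12-08): the two messages describe taking a lock out of order with a trylock, and falling back to "take a reference, drop everything, re-acquire in order" when the trylock fails. The following is only a rough sketch of that pattern, written in Python rather than kernel C, with a plain mutex standing in for the tcbinfo read/write lock. The names `Conn`, `global_lock`, and `acquire_global_while_holding_conn` are hypothetical and do not correspond to FreeBSD identifiers.

```python
import threading

# Hypothetical stand-in for the global (tcbinfo-like) lock. Lock order in
# this sketch: global_lock first, then the per-connection lock.
global_lock = threading.Lock()


class Conn:
    """Hypothetical stand-in for an inpcb: a lock, a refcount, a freed flag."""

    def __init__(self):
        self.lock = threading.Lock()
        self.refcount = 1  # the owner's reference
        self.freed = False

    def ref(self):
        # Caller must hold self.lock.
        self.refcount += 1

    def rele(self):
        # Caller must hold self.lock. Returns True if this was the last
        # reference, i.e. the object must no longer be used.
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True
        return self.freed


def acquire_global_while_holding_conn(conn):
    """Called with conn.lock held and global_lock not held.

    Returns True with both locks held, or False with neither lock held
    if the connection was freed while the locks were dropped.
    """
    # Fast path: a non-blocking attempt cannot deadlock even though it
    # goes against the lock order.
    if global_lock.acquire(blocking=False):
        return True
    # Slow path: pin the connection, drop its lock, re-acquire in order.
    conn.ref()
    conn.lock.release()
    global_lock.acquire()
    conn.lock.acquire()
    if conn.rele():
        # Someone dropped the owner's reference in the meantime.
        conn.lock.release()
        global_lock.release()
        return False
    return True
```

A caller that gets `False` back would repeat its lookup, which matches the "free the inpcb and repeat the lookup" step in the commit message. The read-to-write upgrade itself is not modeled here.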


3257 Commits
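
Note on 02d09f7901 (2008-10-28): the overflow figure quoted in that message, 2^32/(86400*hz) days, can be checked with a few lines of arithmetic. This is a small sketch in Python, assuming (as the formula implies) that q_time advances by hz ticks per second; the function name `days_until_wrap` is made up for illustration.

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(counter_bits: int, hz: int) -> float:
    """Days of uptime before a tick counter of the given width wraps to zero."""
    return 2 ** counter_bits / (SECONDS_PER_DAY * hz)


if __name__ == "__main__":
    for hz in (100, 1000):
        print(
            f"hz={hz}: 32-bit wraps after {days_until_wrap(32, hz):.1f} days, "
            f"64-bit after {days_until_wrap(64, hz):.3g} days"
        )
```

For hz = 1000 the 32-bit case comes out just under 50 days, in line with the "~50 days" in the message, while the 64-bit counter is far beyond any realistic uptime.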