freebsd-dev

Author	SHA1	Message	Date
Bjoern A. Zeeb	6a9148fe92	Implement UDP control block support. So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2. Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes. Submitted by: jhb (7.x version) Reviewed by: rwatson	2009-05-23 16:51:13 +00:00
Bjoern A. Zeeb	db2e47925e	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
Bjoern A. Zeeb	f81a8a320c	If including vnet.h one has to include opt_route.h as well. This is because struct vnet_net holds the rt_tables[][] for MRT and array size is compile time dependent. If you had ROUTETABLES set to >1 after r192011 V_loif was pointing into nonsense leading to strange results or even panics for some people. Reviewed by: mz	2009-05-22 23:03:15 +00:00
Robert Watson	62e1ba833c	Consolidate and clean up the first section of ip_output.c in light of the last year or two's work on routing: - Combine iproute initialization and flowtable lookup blocks, eliminating unnecessary tests for known-zero'd iproute fields. - Add a comment indicating (a) why the route entry returned by the flowtable is considered stable and (b) that the flowtable lookup must occur after the setup of the mbuf flow ID. - Assert the inpcb lock before any use of inpcb fields. Reviewed by: kmacy	2009-05-21 09:45:47 +00:00
Qing Li	c9d763bf41	When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix. Reviewed by: kmacy	2009-05-20 21:07:15 +00:00
Bjoern A. Zeeb	75ac4f3d32	Revert the logical change of r192341. net.inet.ip.fw.one_pass is a classic ip_input.c variable and is used in the pfil and bridge code as well. As ipfw is loadable we need to always provide it. That is the reason why it lives in struct vnet_inet and not in struct vnet_ipfw.	2009-05-18 22:34:44 +00:00
John Baldwin	10f5c8be92	- Fix typo in description of 'net.inet.ip.fw.autoinc_step'. - Use 'vnet_ipfw' instead of 'vnet_inet' for 'net.inet.ip.fw.one_pass'.	2009-05-18 21:46:46 +00:00
Bjoern A. Zeeb	1600c117d8	Unbreak options VIMAGE builds, in a followup to r192011 which did not introduce INIT_VNET_NET() initializers necessary for accessing V_loif. Submitted by: zec Reviewed by: julian	2009-05-17 20:53:10 +00:00
Robert Watson	6d888973c8	Staticize two functions not used outside of in_pcb.c: in_pcbremlists() and db_print_inpcb(). MFC after: 1 month	2009-05-14 20:59:36 +00:00
Qing Li	92fac99477	Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route to the interface address.	2009-05-14 05:27:09 +00:00
Qing Li	ebc90701ac	This patch adds a host route to an interface address (that is assigned to a non loopback/ppp link types) through the loopback interface. Prior to the new L2/L3 rewrite, this host route is implicitly added by the L2 code during RTM_RESOLVE of that interface address. This host route is deleted when that interface is removed. Reviewed by: kmacy	2009-05-12 07:41:20 +00:00
Warner Losh	573a04c930	Remove bogus comment.	2009-05-09 18:50:01 +00:00
John Baldwin	5f17ebf94d	Convert IPFW_DEFAULT_TO_ACCEPT into a loader tunable 'net.inet.ip.fw.default_to_accept'. The current value can also be queried via a read-only sysctl of the same name. Requested by: plosher MFC after: 1 week	2009-05-09 05:07:36 +00:00
Marko Zec	2114e063f0	A NOP change: style / whitespace cleanup of the noise that slipped into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)	2009-05-08 14:34:25 +00:00
Marko Zec	ddd50c3439	Remove a bogus check that unintentionally slipped in r191816. This change has no functional impact on nooptions VIMAGE builds. Submitted by: bz	2009-05-08 14:28:06 +00:00
Randall Stewart	096ed42dad	repository sync to multi-OS repo ... spacing change	2009-05-07 16:43:49 +00:00
Randall Stewart	892f1c7141	ABI expansions to hopefully future-proof our MIB/netstat code for 8.0	2009-05-07 16:42:45 +00:00
Marko Zec	94e9f5a1c2	Remove unnecessary CURVNET_SET() calls where curvnet context is (i.e. seems to be) already set. This should reduce console noise due to curvnet recursion reports. This change has no impact on nooptions VIMAGE builds. Approved by: julian (mentor)	2009-05-06 13:30:46 +00:00
Marko Zec	743da3bcdb	Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)	2009-05-06 08:49:39 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform across vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Marko Zec	d7fcc52895	Unbreak options VIMAGE + nooptions INVARIANTS kernel builds. Submitted by: julian Approved by: julian (mentor)	2009-05-02 05:02:28 +00:00
Marko Zec	f6dfe47a14	Permit building kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Bruce M Simpson	33cde13046	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
Bruce M Simpson	9efc1a1bbf	Add MLDv2 prototypes and defines.	2009-04-29 10:20:17 +00:00
Bruce M Simpson	5cf93e5d2c	Use KTR_INET for MROUTING CTRs.	2009-04-29 10:17:08 +00:00
Bruce M Simpson	c566b47669	Cut over to KTR_INET for CTR. For clarity, put pointer increment/size decrement on own line when copying out in-mode source filters to userland.	2009-04-29 10:14:16 +00:00
Bruce M Simpson	1096332a4a	Do not assume that ip6_moptions is always set, it is a lazy-allocated structure.	2009-04-29 10:13:22 +00:00
Bruce M Simpson	31a3e65dc2	Fix a problem whereby enqueued IGMPv3 filter list changes would be incorrectly output, if the RB-tree enumeration happened to reuse the same chain for a mode switch: that is, both ALLOW and BLOCK records were appended for the same group, in the same mbuf packet chain. This was introduced during an mbuf chain layout bug fix involving m_getptr(), which obviously cannot count from offset 0 on the second pass through the RB-tree when serializing the IGMPv3 group records into the pending mbuf chain. Cut over to KTR_INET for IGMPv3 CTR usage.	2009-04-29 10:12:01 +00:00
Edward Tomasz Napierala	1a4998162e	Don't require packet to match a route (any route; this information wasn't used anyway, so a typical workaround was to add a dummy route) if it's going to be sent through IPSec tunnel. Reviewed by: bz	2009-04-28 11:10:33 +00:00
Oleg Bulyzhin	a3a981b7f9	Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was processed by ipfw once - avoid second ipfw_chk() call. This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and ip/tcp/udp header parsing. MFC after: 2 month	2009-04-27 17:37:36 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Robert Watson	db091502fb	Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead (colloquially known as if_addrlist). Currently not acquired around interface address loops that call out to the routing code due to potential lock order issues. MFC after: 3 weeks	2009-04-26 19:05:40 +00:00
Robert Watson	588885f2f5	Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial lookup of 'ia' from if_addrhead through most use. Note that we currently have to drop it prematurely in some cases due to calls out to the routing and interface code while using 'ia', but this closes many races. Annotate several potential races that persist after this change. Move to using M_NOWAIT for allocating new interface addresses due to lock(s) being held. MFC after: 3 weeks	2009-04-25 23:02:57 +00:00
Robert Watson	07cde5e92c	In in_purgemaddrs(), remove the inm being freed from the address list before freeing it, rather than vice versa, to avoid potential use after free. Reviewed by: bms	2009-04-24 22:11:53 +00:00
Robert Watson	cf7b18f15e	Relocate permissions checking code in in_control() to before the body of the implementation of ioctls. This makes the mapping of ioctls to specific privileges more explicit, and also simplifies the implementation by reducing the use of FALLTHROUGH handling in switch. While this is not intended to be a functional change, it does mean that certain privilege checks are now performed earlier, so EPERM might be returned in preference to EADDRNOTAVAIL for management ioctls that could have failed for both reasons. MFC after: 3 weeks	2009-04-24 09:54:46 +00:00
Robert Watson	bbb3fb6194	Reorganize in_control() so that invariants are more obvious, and so that it is easier to lock: - Handle the unsupported ioctl case at the beginning of in_control(), handing off to ifp->if_ioctl, rather than looking up interfaces and addresses unnecessarily in this case. - Make it an invariant that ifp is always non-NULL when running in_control()-implemented ioctls, simplifying the code structure. MFC after: 3 weeks	2009-04-23 21:41:37 +00:00
Bruce M Simpson	86979280fc	Bracket struct mfc and struct rtdetq with #ifdef _KERNEL. Match the bracketing in netstat. Since the cleanup of MROUTING, ports have broken because they expect to include <netinet/ip_mroute.h> without including <sys/queue.h>. Fix breakage at source. The real fix, of course, is to fix the MROUTING APIs by blowing them away and replacing them with something else...	2009-04-21 12:47:09 +00:00
Bruce M Simpson	5def3edcad	remove IFF_ASSERTGIANT	2009-04-21 09:43:51 +00:00
Robert Watson	4ed6f8c1f1	Prefer actual field names (if_addrhead, ifa_link) to macros aliasing those field names in FreeBSD code. MFC after: 2 weeks	2009-04-20 22:40:44 +00:00
Robert Watson	0aade26e6d	In ip_input(), cache the received mbuf's network interface in a local variable. Acquire the interface address list lock when iterating over the interface address list searching for a matching received broadcast address. MFC after: 2 weeks	2009-04-20 14:35:42 +00:00
Robert Watson	33c4f96d88	In icmp_reflect(), acquire the interface address list lock when searching for a source address to use. MFC after: 2 weeks Reviewed by: bz	2009-04-20 13:45:39 +00:00
Robert Watson	072b8f8ea7	Lock the interface address list when searching for a matching interface by address, or when implementing 'me' rules on IPv6. Prefer the field name if_addrhead to the macro if_addrlist. MFC after: 2 weeks	2009-04-19 22:34:35 +00:00
Robert Watson	b132600ab2	In divert_packet(), lock the interface address list before iterating over it in search of an address. MFC after: 2 weeks	2009-04-19 22:29:16 +00:00
Robert Watson	9317b04e46	Lock interface address lists in in_pcbladdr() when searching for a source address for a connection and there's no route or no interface for the route. MFC after: 2 weeks	2009-04-19 22:25:09 +00:00
Robert Watson	8021456a24	Protect against some writer-writer races in in_control() by acquiring the interface address list lock around interface address list modifications. More to do here. MFC after: 2 weeks	2009-04-19 22:16:19 +00:00
Bruce M Simpson	b5fbc0b98f	Now that IFF_NEEDSGIANT has been removed from the network stack, catch up with this in IGMPv3 and remove dead code. This has the side-effect of not being back-portable to RELENG_7 w/o further changes.	2009-04-19 08:14:21 +00:00
Kip Macy	65111ec7aa	- Allocate a small flowtable in ip_input.c (changeable by tunable) - Use for accelerating ip_output	2009-04-19 04:44:05 +00:00
Kip Macy	ab25fa3558	s/void/void */	2009-04-16 23:02:56 +00:00
Kip Macy	114f15c686	restore spare pointers for MFCing	2009-04-16 22:47:43 +00:00
Kip Macy	279aa3d419	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
Kip Macy	8b12a7c2a6	- convert pspare pointers in inpcb to an llentry and rtentry cache - add flags to indicate their validity	2009-04-15 22:22:00 +00:00
Kip Macy	773b573a96	- add second flags field to inpcb - update comments in vflag	2009-04-15 22:09:42 +00:00
Kip Macy	82c33e73f2	provide additional convenience macros for inpcb locking (upgrade, downgrade, exclusive)	2009-04-15 21:39:56 +00:00
Kip Macy	582b6122ab	make LLTABLE visible to netinet	2009-04-15 20:49:59 +00:00
Kip Macy	de4ab55e43	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry	2009-04-15 20:34:19 +00:00
Randall Stewart	e261340ef7	Add missing address lock when we look at the ifa list	2009-04-14 19:20:27 +00:00
Randall Stewart	544e35bd97	Move the flight size reduction to right after we recognize it's a retransmit, ahead of the PR-SCTP work. Without this fix, we end up NOT reducing flight size and causing a miscalculation when PR-SCTP is active and data is skipped. Obtained from: Michael Tuexen.	2009-04-14 07:50:29 +00:00
Robert Watson	de231a063a	Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL. MFC after: 3 days	2009-04-12 21:28:35 +00:00
Robert Watson	6bf65bcf3a	Update stats in struct carpstats using two new macros: CARPSTATS_ADD() and CARPSTATS_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:19:37 +00:00
Robert Watson	07cf7ab29c	Update stats in struct pimstat using two new macros: PIMSTAT_ADD() and PIMSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:06:26 +00:00
Robert Watson	fb83a36856	Update stats in struct mrtstat using two new macros: MRTSTAT_ADD() and MRTSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:00:36 +00:00
Robert Watson	bd88cce2ed	Update stats in struct igmpstat using two new macros: IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 13:41:13 +00:00
Robert Watson	e27b0c8775	Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. In one case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats. MFC after: 3 days	2009-04-12 13:22:33 +00:00
Robert Watson	026decb8f3	Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 11:42:40 +00:00
Robert Watson	86425c62a0	Update stats in struct ipstat using four new macros, IPSTAT_ADD(), IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-11 23:35:20 +00:00
Robert Watson	78b5071407	Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-11 22:07:19 +00:00
Paolo Pisati	50d25dda1b	What's the point of adjusting a checksum if we are going to toss the packet? Anticipate the check/return code.	2009-04-11 15:26:31 +00:00
Paolo Pisati	ea80b0ac03	Plug two bugs introduced with modules conversion: -UdpAliasIn(): correctly check return code after modules ran. -alias_nbt: in case of malformed packets (or some other unrecoverable error), toss the packet.	2009-04-11 15:19:09 +00:00
Paolo Pisati	1cd68a24c7	Remove stale comments.	2009-04-11 15:05:19 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Kip Macy	80cb9f211a	Import "flowid" support for serializing flows across transmit queues Reviewed by: rwatson and jeli	2009-04-10 06:16:14 +00:00
Luigi Rizzo	4bb7ae9deb	Add emulation of delay profiles, which lets you model various types of MAC overheads such as preambles, link level retransmissions and more. Note- this commit changes the userland/kernel ABI for pipes (but not for ordinary firewall rules) so you need to rebuild kernel and /sbin/ipfw to use dummynet features. Please check the manpage for details on the new feature. The MFC would be trivial but it breaks the ABI, so it will be postponed until after 7.2 is released. Interested users are welcome to apply the patch manually to their RELENG_7 tree. Work supported by the European Commission, Projects Onelab and Onelab2 (contract 224263).	2009-04-09 12:46:00 +00:00
Randall Stewart	abe15ad66c	Fix a FR bug. When doing PR-SCTP with number rtx set to a low number. The check for skipping was in the incorrect place. Which meant we would FR chunks we should not. MFC after: 1 Month	2009-04-08 12:52:05 +00:00
Randall Stewart	e29d4aa6bd	Add more padding and a new variable. This will help us be able to keep ABI compatibility between 8 and 9. MFC after: Never	2009-04-08 12:49:36 +00:00
Paolo Pisati	43197d291a	-don't pass down, to module's fingerprint function, unused data like a pointer to the ip header. -style -spacing	2009-04-08 11:56:49 +00:00
Bjoern A. Zeeb	970caf60dd	With the right comparison we get a proper wscale value and thus more adequate TCP performance with IPv6. Changes for IPv4, r166403 and r172795, both ignored the IPv6 counterpart and left it in the state of art of year 2000. The same logic in syncache already shares code between v4 and v6 so things do not need to be adapted there. Reported by: Steinar Haug (sthaug nethelp.no) Tested by: Steinar Haug (sthaug nethelp.no) MFC after: 3 days	2009-04-07 14:42:40 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
Alexander Kabaev	024a4bd626	If KTR_SUBSYS is compiled in, it does not necessarily mean that user is interested in being spammed by mcast-related printfs. Use proper check against ktr_mask instead of KTR_COMPILE.	2009-04-05 23:25:06 +00:00
Bruce M Simpson	448895b7fc	Fix mbuf chain layout pessimization: in the case where a single mbuf is allocated due to m_getcl() returning NULL, we already call MH_ALIGN, so do not increment m->m_data in this case. Found during MLDv2 port.	2009-04-04 15:32:23 +00:00
Bruce M Simpson	0fd99912de	Do not obliterate QQI with MAXRESP. Found during MLDv2 port.	2009-04-04 15:26:32 +00:00
Randall Stewart	8933fa13b6	Many bug fixes (from the IETF hack-fest): - PR-SCTP had major issues when skipping through a multi-part message. o Did not look at socket buffer. o Did not properly handle the reassembly queue. o The MARKED segments could interfere and un-skip a chunk causing a problem with the proper FWD-TSN. o No FR of FWD-TSN's was being done. - NR-Sack code was basically disabled. It needed fixes that never got into the real code. - CMT code had issues when the two paths were NOT the same b/w. We found a few small bugs, but also the critical one here was not dividing the rwnd amongst the paths. Obtained from: Michael Tuexen and myself at the IETF hack-fest ;-)	2009-04-04 11:43:32 +00:00
Paolo Pisati	eb2e411915	Implement an ipfw action to reassemble ip packets: reass.	2009-04-01 20:23:47 +00:00
Bruce M Simpson	5b35d05538	Don't call m_freem() after ip_output(), as it always consumes the mbuf chain provided to it. Found by: Pierre Guinoiseau	2009-03-24 01:22:12 +00:00
Juli Mallett	34f27ade44	Remove local in6_addr variables for local and foreign addresses in sysctl_drop, they were passed uninitialized to in6_pcblookup_hash. Instead, do as is done for IPv4 and use the addresses within the sockaddr structure, which are correctly populated. This fixes tcpdrop(8) for IPv6 address pairs. Reviewed by: bz	2009-03-22 00:45:47 +00:00
Bruce M Simpson	545dff6fd1	Fix brainos introduced during mechanical KTR change. Pointy hat to: bms	2009-03-20 13:13:50 +00:00
Bruce M Simpson	98b59af731	Cleanup: Nuke debug.mrtdebug, and replace it with KTR.	2009-03-19 14:14:21 +00:00
Bruce M Simpson	443fc3176d	Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved. Summary: * Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod. * Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do. * Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels. * bandwidth meters however still require libkvm. * Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256). * Remove unused members from struct vif and struct mfc. * Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c. * Don't share locks or initialization between IPv4 and IPv6. * Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being. * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c. v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and have not been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here. Reviewed by: Pavlin Radoslavov	2009-03-19 01:43:03 +00:00
Bruce M Simpson	1975dc405a	Comment IGMP_PIM as being very historic, as in, don't use.	2009-03-19 01:15:26 +00:00
Bruce M Simpson	56663a40eb	Deal with the case where ifma_protospec may be NULL, during any IPv4 multicast operations which reference it. There is a potential race because ifma_protospec is set to NULL when we discover the underlying ifnet has gone away. This write is not covered by the IF_ADDR_LOCK, and it's difficult to widen its scope without making it a recursive lock. It isn't clear why this manifests more quickly with 802.11 interfaces, but does not seem to manifest at all with wired interfaces. With this change, the 802.11 related panics reported by sam@ and cokane@ should go away. It is not the right fix, that requires more thought before 8.0. Idea from: sam Tested by: cokane	2009-03-17 14:41:54 +00:00
Robert Watson	e5adda3d51	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
Robert Watson	ad71fe3c35	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz	2009-03-15 09:58:31 +00:00
Randall Stewart	49633f4b36	Oops.. I missed a file on the commit :-)	2009-03-14 23:13:16 +00:00
David Schultz	b3c11b5b91	Namespace: Defining htonl() and friends here instead of arpa/inet.h is a BSD extension.	2009-03-14 20:16:54 +00:00
Randall Stewart	0c0982b80c	Fixes several PR-SCTP related bugs. - When sending large PR-SCTP messages over a lossy link we would incorrectly calculate the fwd-tsn - When receiving large multipart pr-sctp packets we would incorrectly send back a SACK that would renege improperly on already received packets thus causing unneeded retransmissions.	2009-03-14 13:42:13 +00:00
Robert Watson	111d57a69c	Add INP_INHASHLIST flag for inpcb->inp_flags to indicate whether or not the inpcb is currently on various hash lookup lists, rather than using (lport != 0) to detect this. This means that the full 4-tuple of a connection can be retained after close, which should lead to more sensible netstat output in the window between TCP close and socket close. MFC after: 2 weeks	2009-03-11 00:29:22 +00:00
Robert Watson	4cf172fd65	Remove unused v6 macro aliases for inpcb fields: in6p_ip6_nxt in6p_vflag in6p_flags in6p_socket in6p_lport in6p_fport in6p_ppcb Remove unused v6 macro aliases for inpcb flags: IN6P_HIGHPORT IN6P_LOWPORT IN6P_ANONPORT IN6P_RECVIF IN6P_MTUDISC IN6P_FAITH IN6P_CONTROLOPTS References to in6p_lport and in6_fport in sockstat are also replaced with normal inp_lport and inp_fport references. MFC after: 3 days Reviewed by: bz	2009-03-10 17:57:41 +00:00
Bruce M Simpson	30e239fe64	Don't print inm_print() chatter when KTR_IGMPV3 is not enabled in the KTR_COMPILE mask. Found by: gnn	2009-03-10 17:48:49 +00:00
Robert Watson	b9bbb597b1	Remove now-unused INP_UNMAPPABLEOPTS. MFC after: 3 days Discussed with: bz	2009-03-10 11:04:19 +00:00
Bruce M Simpson	c75aa3548f	Fix uninitialized use of ifp for ii. Found by: Peter Holm	2009-03-09 22:54:17 +00:00
Bruce M Simpson	d10910e6ce	Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD IPv4 stack. Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed. __FreeBSD_version is bumped to 800070.	2009-03-09 17:53:05 +00:00
Marius Strobl	c89c8a1029	On architectures with strict alignment requirements compensate the misalignment of the IP header that prepending the EtherIP header might have caused. PR: 131921 MFC after: 1 week	2009-03-07 19:08:58 +00:00
Randall Stewart	5171328bd6	Fixes for window probes: 1) WP should never be marked unless flight size is 0 2) When recovering from wp if the peer ack's it we don't mark for retran 3) When recovering, we must assure a timer is still running.	2009-03-06 11:03:52 +00:00
Randall Stewart	dfb11ef895	- PR-SCTP bug, where the CUM-ACK was not being updated into the advance_peer_ack point so we would incorrectly send a wrong value in the FWD-TSN - PR-SCTP bug, where a PR packet is used for a window probe which could incorrectly get the packet moved back into the send_queue, which will cause major issues and should not happen. - Fix a trace to use the proper macro.	2009-03-04 20:54:42 +00:00
Bruce M Simpson	8b889dbb9e	In ip_output(), do not acquire the IN_MULTI_LOCK(), and do not attempt to perform a group lookup. This is a socket layer lock, and the bottom half of IP really has no business taking it. Use the value of the in_mcast_loop sysctl to determine if we should loop back by default, in the absence of any multicast socket options. Because the check on group membership is now deferred to the input path, an m_copym() is now required. This should increase multicast send performance where the source has not requested loopback, although this has not been benchmarked or measured. It is also a necessary change for IN_MULTI_LOCK to become non-recursive, which is required in order to implement IGMPv3 in a thread-safe way.	2009-03-04 03:45:34 +00:00
Bruce M Simpson	dd7fd7c07c	Add sysctl net.inet.ip.mcast.loop. This controls whether or not IPv4 multicast sends are looped back to senders by default on a stack-wide basis, rather than relying on the socket option. Note that the sysctl only applies to newly created multicast sockets.	2009-03-04 03:40:02 +00:00
Bruce M Simpson	346e3178ea	Merge header file definitions used by the new IGMPv3 implementation. This is a partial merge. Compatibility defines are retained for the existing IGMPv2 implementation.	2009-03-04 03:22:03 +00:00
Bruce M Simpson	b554b6ca91	Add various defines/macros required by IGMPv3: * MCAST_UNDEFINED state. * in_allhosts() macro (group is 224.0.0.1). This uses a const endian comparison. * IP_MAX_GROUP_SRC_FILTER, IP_MAX_SOCK_SRC_FILTER default resource limits.	2009-03-04 03:01:05 +00:00
Bruce M Simpson	f0dcb78326	Add function ip_checkrouteralert(), which will be used by IGMPv3 to check for the IPv4 Router Alert [RFC2113] option in a pulled-up IP mbuf chain.	2009-03-04 02:51:22 +00:00
Bjoern A. Zeeb	1263305f0c	Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095. While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets. Reviewed by: gnn (a bit back, earlier version)	2009-03-03 13:12:12 +00:00
Luigi Rizzo	ac6bb60e0a	curr_time is a 64 bit variable so SYSCTL_LONG is not appropriate as a handler. The variable was exported only for debugging, but there is little reason to do it now that the timekeeping is supported by various other variables. For the time being just comment out the sysctl, but I think this should go away.	2009-03-02 22:16:50 +00:00
Luigi Rizzo	0906f40fd8	fw_debug has been unused for ages, so remove it from the list of sysctl_variables. I would also remove it from the VNET record but I am unsure if there is any ABI issue -- so for the time being just mark it as unused in ip_fw.h, and then we will collect the garbage at some appropriate time in the future. MFC after: 3 days	2009-03-02 22:11:48 +00:00
Bjoern A. Zeeb	2bebb49117	Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif. Single kernel compiles and universe will fail if the size of the struct changes. The expected values are given in sys/vimage.h. See the comments there on how to handle this. Requested by: peter	2009-03-01 11:01:00 +00:00
Robert Watson	8e5057ed20	Remove unreachable code for generating RST segments from tcp_twcheck(); this code became stale when T/TCP support was removed. Discussed with: bz, sam MFC after: 1 month	2009-02-28 22:58:52 +00:00
Randall Stewart	8aae94933f	Fix the add stream feature of strm-reset to really work: - Fix the copy, we can't do a blind copy but must transfer the data from the old to the new. - Fix the ACK processing so we properly stop retransmitting the thing. - Fix it so if we get a retran we will properly reply with the saved response without doing anything. MFC after: 1 month	2009-02-27 20:54:45 +00:00
Bjoern A. Zeeb	33553d6e99	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.	2009-02-27 14:12:05 +00:00
Roman Divacky	af83f5d77c	Change the functions to ANSI in those cases where it breaks promotion to int rule. See ISO C Standard: SS6.7.5.3:15. Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current	2009-02-24 18:09:31 +00:00
Robert Watson	ce2ae9ab4b	In tcp_usr_shutdown() and tcp_usr_send(), I missed converting NULL checks for the tcpcb, previously used to detect complete disconnection, with INP_DROPPED checks. Correct that, preventing shutdown() from improperly generating a TCP segment with destination IP and port of 0.0.0.0:0. PR: kern/132050 Reported by: david gueluy <david.gueluy at netasq.com> MFC after: 3 weeks	2009-02-24 11:17:50 +00:00
Robert Watson	63d0295c2f	In in_rtqkill(), assert the radix head lock, and pass RTF_RNH_LOCKED to in_rtrequest(); the radix head lock is already acquired before rnh_walktree is called in in_rtqtimo_one(). This avoids a recursive acquisition that is no longer permitted in 8.x due to use of an rwlock for the radix head lock. Reported by: dikshie <dikshie at gmail.com> MFC after: 3 days	2009-02-23 22:57:55 +00:00
Randall Stewart	ea44232b3a	Add the add-stream capability. Still needs more testing.. MFC after: 1 month	2009-02-20 15:03:54 +00:00
Randall Stewart	186414058a	Fix a bug. The sending was being restricted improperly by the max_burst. It should only be gated by cwnd in the lower level send. Obtained from: Michael Tuexen MFC after: 1 week.	2009-02-20 14:33:45 +00:00
Luigi Rizzo	d8d42f3f4e	correct some #include	2009-02-16 15:10:51 +00:00
Luigi Rizzo	35b78b7520	remove dependency on eventhandler.h, we only need a forward declaration	2009-02-16 15:08:41 +00:00
Luigi Rizzo	281c8daea2	remove dependency on net/if.h of this header	2009-02-16 15:07:40 +00:00
Luigi Rizzo	2eef235973	use a const format string in the log message so we can check the arguments (if/when we enable those checks)	2009-02-16 12:09:52 +00:00
Luigi Rizzo	ada55ca0b7	remove unnecessary #include from vnet.h and vinet.h Approved by: Marko Zec	2009-02-15 00:28:28 +00:00
Randall Stewart	eef9e53e55	This commit fixes the issue with alias_sctp.c. No longer do we require SCTP to be in the kernel for the lib to be able to handle SCTP. We do this by moving the CRC32c checksum into libkern/crc32.c and then adjusting all routines to use the common methods. Note that this will improve the performance of iSCSI since they were using the old single 256 bit table lookup versus the slicing 8 algorithm (which gives a 4x speed up in CRC32c calculation :-D) Reviewed by: rwatson, gnn, scottl, paolo MFC after: 4 week? (assuming we MFC the alias_sctp changes)	2009-02-14 11:34:57 +00:00
Randall Stewart	c3b8c73cf1	Have the jail code use the error returned to pass not constant errors. Obtained from: jamie@freebsd.org	2009-02-13 18:44:30 +00:00
Luigi Rizzo	8f2f943e8f	remove unnecessary #include, and document some of the others	2009-02-13 15:37:14 +00:00
Luigi Rizzo	d685b6ee05	Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. Add a note next to fields in network format. The n_* types are not enough for compiler checks on endianness, and their use often requires an otherwise unnecessary #include <netinet/in_systm.h> The typedef in in_systm.h are still there.	2009-02-13 15:14:43 +00:00
Randall Stewart	4f6b49338e	Move the new rwnd field down to the very end of the xsctp structure. This is where all new fields belong (not that we will be ABI compatible with 7.x anyway.. sigh).	2009-02-13 14:43:46 +00:00
Randall Stewart	11b14db397	Add padding to the end of the xsctp_xxx structures to allow future changes to be able to maintain ABI compatibility	2009-02-09 17:37:17 +00:00
Randall Stewart	74246b2734	Fix minor spacing problem found by s9indent from last commit.	2009-02-09 11:42:23 +00:00
Randall Stewart	a1f2f7a5a0	Fix INET only build breakage with SCTP - pointy hat to me :-)	2009-02-09 11:41:54 +00:00
Bjoern A. Zeeb	97aa4a517a	Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions. The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate. Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted. Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers. Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments. Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject(). Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?	2009-02-08 09:27:07 +00:00
Paolo Pisati	e13710afbd	Silence LINT: add 2 stubs (update_crc32 and sctp_finalize_crc32) to fix LIBALIAS + SCTP_NO_CSUM case.	2009-02-08 03:03:55 +00:00
Paolo Pisati	37ce2656ec	Add SCTP NAT support. Submitted by: CAIA (http://caia.swin.edu.au)	2009-02-07 18:49:42 +00:00
Jamie Gritton	7c2f3cb964	Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind. Approved by: bz (mentor)	2009-02-05 14:25:53 +00:00
Jamie Gritton	b89e82dd87	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)	2009-02-05 14:06:09 +00:00
Randall Stewart	be27fdd0c4	LOR fix - Lock only when calling the actual code that is messing with the UDP tunnel. This means that if two users actually tried to change the tunnel port at the same time interesting things COULD result, but it's probably very unlikely to happen :-)	2009-02-03 20:33:28 +00:00
Randall Stewart	a99b67833a	- Cleanup checksum code. - Prepare for CRC offloading, add MIB counters (RS/MT). - Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT). - Bugfix: Handle close() with SO_LINGER correctly when notifications are generated during the close() call(MT). - Bugfix: Generate DRY event when sender is dry during subscription. Only for 1-to-1 style sockets (RS/MT) - Bugfix: Put vtags for the correct amount of time into time-wait (MT). - Bugfix: Clear vtag entries correctly on expiration (MT). - Bugfix: shutdown() indicates ENOTCONN when called for unconnected 1-to-1 style sockets (MT). - Bugfix: In sctp Auth code (PL). - Add support for devices that support SCTP csum offload (igb). - Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS) Obtained from: With help from Peter Lei and Michael Tuexen	2009-02-03 11:04:03 +00:00
Randall Stewart	2f4afd2125	Adds support for SCTP checksum offload. This means we, like TCP and UDP, move the checksum calculation into the IP routines when there is no hardware support we call into the normal SCTP checksum routine. The next round of SCTP updates will use this functionality. Of course the IGB driver needs a few updates to support the new intel controller set that actually does SCTP csum offload too. Reviewed by: gnn, rwatson, kmacy	2009-02-03 11:00:43 +00:00
Luigi Rizzo	6e152a7539	initialize a couple of variables, gcc 4.2.4-4 (linux) reports some possible uninitialized uses and the warning does make sense.	2009-01-28 13:39:01 +00:00
Luigi Rizzo	36cb0db476	For some reason (probably dating ages ago) an #ifdef SYSCTL_NODE / #endif section included a lot of stuff that did not belong there. So split the block in multiple components each around the relevant stuff. This said, I wonder if building a kernel where SYSCTL_NODE is not defined is supported at all. Submitted by: Marta Carbone	2009-01-28 13:11:22 +00:00
Bjoern A. Zeeb	1cecba0fcd	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week	2009-01-25 10:11:58 +00:00
Bjoern A. Zeeb	de4fbddd5b	Add externs to fix build with VIMAGE_GLOBALS after r187289.	2009-01-22 10:29:09 +00:00
Sam Leffler	cbd1844537	remove too noisy DIAGNOSTIC code Reviewed by: qingli	2009-01-18 07:20:02 +00:00
Paolo Pisati	dd14bc5dca	Silence userland warnings about missing prototypes. Submitted by: Roman Divacky <rdivacky@freebsd.org>	2009-01-15 19:35:23 +00:00
Lawrence Stewart	24cb0f2232	Add TCP Appropriate Byte Counting (RFC 3465) support to kernel. The new behaviour is on by default, and can be disabled by setting the net.inet.tcp.rfc3465 sysctl to 0 to obtain previous behaviour. The patch changes struct tcpcb in sys/netinet/tcp_var.h which breaks the ABI. Bump __FreeBSD_version to 800061 accordingly. User space tools that rely on the size of struct tcpcb (e.g. sockstat) need to be recompiled. Reviewed by: rpaulo, gnn Approved by: gnn, kmacy (mentors) Sponsored by: FreeBSD Foundation	2009-01-15 06:44:22 +00:00
Robert Watson	87e0451806	Since we allow conditional allocation of labels on syncache entries, remove historic assertion that labels are always present.	2009-01-11 20:01:43 +00:00
Bjoern A. Zeeb	813dd6ae5e	Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well. While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2] PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 21:57:49 +00:00
Adrian Chadd	8696873dae	Fix fat-fingered comment. Noticed-by: julian	2009-01-09 18:38:57 +00:00
Adrian Chadd	cef2729493	Fix indentation; add FALLTHROUGH. Thanks Max!	2009-01-09 17:21:22 +00:00
Adrian Chadd	4f2e6bfdd8	Better comment what the socket option does. Thanks to Sam Leffler for suggesting this.	2009-01-09 17:18:17 +00:00
Adrian Chadd	4209e01ad7	Comment some potentially confusing logic. Nitpicking by: mlaier MFC after: 2 weeks	2009-01-09 17:16:18 +00:00
Adrian Chadd	be9347e3fe	Implement a new IP option (not compiled/enabled by default) to allow applications to specify a non-local IP address when bind()'ing a socket to a local endpoint. This allows applications to spoof the client IP address of connections if (obviously!) they somehow are able to receive the traffic normally destined to said clients. This patch doesn't include any changes to ipfw or the bridging code to redirect the client traffic through the PCB checks so TCP gets a shot at it. The normal behaviour is that packets with a non-local destination IP address are not handled locally. This can be dealt with using some IPFW hackery; modifications to IPFW to make this less hacky will occur in subsequent commits. Thanks to Julian Elischer and others at Ironport. This work was approved and donated before Cisco acquired them. Obtained from: Julian Elischer and others MFC after: 2 weeks	2009-01-09 16:02:19 +00:00
Bjoern A. Zeeb	5ce0eb7f08	Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail. PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks	2009-01-09 13:06:56 +00:00
Hartmut Brandt	c0e9a8a154	Set a minimum of information in the routing message (like version and type) so that generic routing message parsing code can parse the messages for L2 info that are retrieved via the sysctl interface.	2009-01-09 10:58:59 +00:00
Randall Stewart	bbb0e3d9d5	Addresses Robert's comments on comments. Also adds the KASSERT and checks suggested. Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"	2009-01-06 13:27:56 +00:00
Randall Stewart	c7c7ea4b5a	Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.	2009-01-06 12:13:40 +00:00
Robert Watson	a603c811f8	Allow the IP_MINTTL socket option to be set to 0 so that it can be disabled entirely, which is its default state before set to a non-zero value. PR: 128790 Submitted by: Nick Hilliard <nick at foobar dot org> MFC after: 3 weeks	2009-01-03 11:35:31 +00:00
Qing Li	dc49549713	Some modules such as SCTP supply a valid route entry as an input argument to ip_output(). The destination is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create a L2 entry. Since there exists a L2 table per address family, the L2 lookup function can make address family specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure. Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.	2009-01-03 00:27:28 +00:00
Bjoern A. Zeeb	42d866dd69	For consistency use LLE_IS_VALID() in this 4th place that is actually interested in the (void *)-1 return value hack. This way we can easily identify those special parts of the code.	2008-12-28 21:18:01 +00:00
Qing Li	8eca593c5a	This checkin addresses a couple of issues: 1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), a sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" commands follow a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages. 2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic. Reviewed by: julian	2008-12-26 19:45:24 +00:00
Kip Macy	5e96c0a13e	Fix missed unlock and reference drop of lle Found by: pho	2008-12-24 05:31:26 +00:00
Bjoern A. Zeeb	f3b28b6bfb	Remove long unused netinet/ipprotosw.h (basically since r82884). Discussed with: rwatson MFC after: 4 weeks	2008-12-23 16:52:03 +00:00
Qing Li	ce9122fd3e	Don't create a bogus ARP entry for 0.0.0.0.	2008-12-23 03:33:32 +00:00
Qing Li	897d75c98e	The proxy-arp code was broken and responds to ARP requests for addresses that are not proxied locally.	2008-12-19 11:07:34 +00:00
Bjoern A. Zeeb	97590249ad	Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equivalent INP_* counterpart. Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 13:00:18 +00:00
Bjoern A. Zeeb	dcdb4371ca	Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1] Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks	2008-12-17 12:52:34 +00:00
Kip Macy	00a46b3122	default to doing lla_lookup with shared afdata lock and returning a shared lock on the lle - thus restoring parallel performance to pre-arpv2 level	2008-12-17 00:14:28 +00:00
Robert Watson	ec313afa3f	IPFW's pfil hook/unhook code ignores the return values of pfil_add_hook() and pfil_remove_hook(), so cast them to (void). MFC after: pretty soon	2008-12-16 15:05:35 +00:00
Kip Macy	848552f31f	ipfw doesn't use the radix node head lock to protect the radix tree - remove acquisition	2008-12-16 11:06:30 +00:00
Kip Macy	3bb87a6c70	check pointer against NULL add new line after declaration for style	2008-12-16 03:18:59 +00:00
Kip Macy	86cd829d64	don't unlock lle if it is NULL	2008-12-16 02:48:12 +00:00
Kip Macy	fbc2ca1bef	unlock and destroy an llentry's lock before freeing Found by: sam	2008-12-16 00:20:49 +00:00
Bjoern A. Zeeb	fc384fa5d6	Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change. Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)	2008-12-15 21:50:54 +00:00
Qing Li	6e6b3f7cbc	The main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as many locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplifying the logic in the routing code. The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improve/refactor the code, and provided valuable reviews - Julian Elischer set up the perforce tree for me and has helped me maintain that branch before the svn conversion	2008-12-15 06:10:57 +00:00
Bjoern A. Zeeb	03d8b6fd1b	Add a check, that is currently under discussion for 8 but that we need to keep for 7-STABLE when MFCing in_pcbladdr() to not change the behaviour there. With this a destination route via a loopback interface is treated as a valid and reachable thing for IPv4 source address selection, even though nothing of that network is ever directly reachable, but it is more like a blackhole route. With this the source address will be selected and IPsec can grab the packets before we would discard them at a later point, encapsulate them and send them out from a different tunnel endpoint IP. Discussed on: net Reported by: Frank Behrens <frank@harz.behrens.de> Tested by: Frank Behrens <frank@harz.behrens.de> MFC after: 4 weeks (just so that I get the mail)	2008-12-14 17:47:33 +00:00
Bjoern A. Zeeb	bccd413962	De-virtualize the MD5 context for TCP initial seq number generation and make it a function local variable like we do almost everywhere inside the kernel. Discussed with: rwatson, silby MFC after: 4 weeks	2008-12-13 21:59:18 +00:00
Kip Macy	cdacee3468	version that will compile	2008-12-13 20:34:41 +00:00
Kip Macy	fe6320b468	radix node head lock needs to be held when calling rnh_addaddr	2008-12-13 20:18:05 +00:00
Kip Macy	979245af95	don't acquire lock recursively	2008-12-13 20:16:03 +00:00
Bjoern A. Zeeb	1b193af610	Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. Sponsored by: The FreeBSD Foundation	2008-12-13 19:13:03 +00:00
Bjoern A. Zeeb	86413abf5f	Put global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely. While there garbage collect a few dead externs from ip6_var.h. Sponsored by: The FreeBSD Foundation	2008-12-11 16:26:38 +00:00
Bjoern A. Zeeb	0750c2ed96	Use the correct INIT_VNET_INET() as the virtualized variables here are in vinet.h not in vinet6.h Sponsored by: The FreeBSD Foundation	2008-12-11 16:05:07 +00:00
Marko Zec	385195c062	Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option. Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0. Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively #ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs. Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c. Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS. De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import. Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions.
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-12-10 23:12:39 +00:00
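
The macro switch described in the entry above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel's code: a single `int` stands in for the real `rt_tables`, and the accessors `rt_tables_set()` and `rt_tables_get()` are hypothetical names introduced only for this example.

```c
/*
 * Sketch of a V_ macro that resolves either to a plain global or to a
 * field inside a container structure, depending on VIMAGE_GLOBALS.
 * Assumption: a single int stands in for the real rt_tables.
 */

/* Container structure that holds the virtualized variable. */
struct vnet_net {
	int _rt_tables;
};

#ifdef VIMAGE_GLOBALS
int rt_tables;				/* the original global */
#define	V_rt_tables	rt_tables
#else
struct vnet_net vnet_net_0;		/* the single container instance */
#define	V_rt_tables	vnet_net_0._rt_tables
#endif

/* Hypothetical accessors: callers only ever spell V_rt_tables. */
void
rt_tables_set(int v)
{
	V_rt_tables = v;
}

int
rt_tables_get(void)
{
	return (V_rt_tables);
}
```

Code written against `V_rt_tables` compiles unchanged in both configurations, which is what lets the storage move into the container without touching call sites.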
Robert Watson	cd416355a8	Remove inconsistent white space from in_pcballoc(). MFC after: pretty soon	2008-12-10 13:24:38 +00:00
Robert Watson	5d04565101	Move syncache flag definitions below data structure, compress some vertical whitespace. MFC after: pretty soon	2008-12-10 11:11:43 +00:00
Robert Watson	c3ce7a790c	Move flag definitions for t_flags and t_oobflags below the definition of struct tcpcb so that the structure definition is a bit more vertically compact. Can't yet fit it on one printed page, though. MFC after: pretty soon	2008-12-10 11:03:16 +00:00
Kip Macy	65954fda79	unlock when done	2008-12-10 08:23:47 +00:00
Kip Macy	e08ab8576d	don't reference if_addr_mtx directly	2008-12-10 08:22:51 +00:00
Robert Watson	0ca989b376	Update comment on INP_TIMEWAIT to say what it's about, as we caution regarding the misplacement of flags in inp_vflag in an earlier comment. MFC after: pretty soon	2008-12-09 23:57:09 +00:00
Robert Watson	d15fb96522	Enhance one comment relating to recent TCP locking changes, and fix a typo in another. MFC after: 6 weeks	2008-12-09 15:49:02 +00:00
Robert Watson	a5654bb2ae	Move macros defining flags and shortcuts to nested structure fields in inpcbinfo below the structure definition in order to make inpcbinfo fit on a single printed page; related style tweaks. MFC after: pretty soon	2008-12-09 10:21:38 +00:00
Robert Watson	252ca42863	Move from solely write-locking the global tcbinfo in tcp_input() to read-locking in the TCP input path, allowing greater TCP input parallelism where multiple ithreads or ithread and netisr are able to run in parallel. Previously, most TCP input paths held a write lock on the global tcbinfo lock, effectively serializing TCP input. Before looking up the connection, acquire a write lock if a potentially state-changing flag is set on the TCP segment header (FIN, RST, SYN), and otherwise a read lock. We may later have to upgrade to a write lock in certain cases (ACKs received by the syncache or during TIMEWAIT) in order to support global state transitions, but this is never required for steady-state packets. Upgrading from a read lock to a write lock must be done as a trylock operation to avoid deadlocks, and actually violates the lock order as the tcbinfo lock precedes the inpcb lock held at the time of upgrade. If the trylock fails, we bump the refcount on the inpcb, drop both locks, and re-acquire in-order. If another thread has freed the connection while the locks are dropped, we free the inpcb and repeat the lookup (this should hardly ever or never happen in practice). For now, maintain a number of new counters measuring how many times various cases execute, and in particular whether various optimistic assumptions about when read locks can be used, whether upgrades are done using the fast path, and whether connections close in practice in the above-described race, actually occur. MFC after: 6 weeks Discussed with: kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:27:00 +00:00
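
The initial lock choice described in the entry above (write lock when FIN, RST, or SYN is set, read lock otherwise) can be sketched as a small predicate. This is an illustration only, not the code in tcp_input(): the enum `ti_lock_mode` and the function `ti_lock_mode_for()` are hypothetical names, and the later upgrade path via trylock is not modeled.

```c
#include <stdint.h>

/* TCP header flag bits, with the values used in <netinet/tcp.h>. */
#define	TH_FIN	0x01
#define	TH_SYN	0x02
#define	TH_RST	0x04
#define	TH_ACK	0x10

/* Hypothetical lock modes for the global connection table. */
enum ti_lock_mode {
	TI_RLOCK,	/* shared: steady-state segments */
	TI_WLOCK	/* exclusive: segments that may change global state */
};

/*
 * Pick the lock mode before the connection lookup, based only on the
 * flags in the segment header.
 */
enum ti_lock_mode
ti_lock_mode_for(uint8_t thflags)
{
	if ((thflags & (TH_FIN | TH_SYN | TH_RST)) != 0)
		return (TI_WLOCK);
	return (TI_RLOCK);
}
```

A plain ACK or data segment takes the shared path, so several input threads can proceed in parallel; only connection setup and teardown segments serialize on the exclusive lock.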
Robert Watson	28696211d6	Add a reference count to struct inpcb, which may be explicitly incremented using in_pcbref(), and decremented using in_pcbfree() or in_pcbrele(). Protocols using only current in_pcballoc() and in_pcbfree() calls will see the same semantics, but it is now possible for TCP to call in_pcbref() and in_pcbrele() to prevent an inpcb from being freed when both tcbinfo and per-inpcb locks are released. This makes it possible to safely transition from holding only the inpcb lock to both tcbinfo and inpcb lock without re-looking up a connection in the input path, timer path, etc. Notice that in_pcbrele() does not unlock the connection after decrementing the refcount, if the connection remains, so that the caller can continue to use it; in_pcbrele() returns a flag indicating whether or not the inpcb pointer is still valid, and in_pcbfree() is now a simple wrapper around in_pcbrele(). MFC after: 1 month Discussed with: bz, kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy	2008-12-08 20:18:50 +00:00
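
The reference-count pattern described in the entry above can be sketched in userland C. This is a simplified illustration, not the inpcb code: `struct pcb`, `pcb_alloc()`, `pcb_ref()`, and `pcb_rele()` are hypothetical names, locking is omitted entirely, and the return convention (true means the object was freed) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical control block carrying only a reference count. */
struct pcb {
	unsigned int refcount;
};

/* Allocate a block holding one reference, owned by the caller. */
struct pcb *
pcb_alloc(void)
{
	struct pcb *p = malloc(sizeof(*p));

	if (p != NULL)
		p->refcount = 1;
	return (p);
}

/* Take an extra reference so the block survives while locks are dropped. */
void
pcb_ref(struct pcb *p)
{
	p->refcount++;
}

/*
 * Drop one reference.  Returns true if this was the last reference and
 * the block was freed, in which case the pointer must not be used again.
 */
bool
pcb_rele(struct pcb *p)
{
	if (--p->refcount > 0)
		return (false);
	free(p);
	return (true);
}
```

A caller that needs to drop and re-acquire locks would take a reference first, then check the result of the release to learn whether the block still exists.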
Christian S.J. Peron	4e57bc3338	in_rtalloc1(9) returns a locked route, so make sure that we use RTFREE_LOCKED() here. This macro makes sure the reference count on the route is being managed properly. This eliminates another case which results in the following message being printed to the console: rtfree: 0xc841ee88 has 1 refs Reviewed by: bz MFC after: 2 weeks	2008-12-06 19:09:38 +00:00
Randall Stewart	830d754d52	Code from the hack-session known as the IETF (and a bit of debugging afterwards): - Fix protection code for notification generation. - Decouple associd from vtag - Allow vtags to have less stringent requirements in non-uniqueness. o don't pre-hash them when you issue one in a cookie. o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup. - Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes. - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon). - Update the support of SCTP-AUTH by Peter Lei. - Use macros for refcounting. - Fix MTU for UDP encapsulation. - Fix reporting back of unsent data. - Update assoc send counter handling to be consistent with endpoint sent counter. - Fix a bug in PR-SCTP. - Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point. - Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet. With the help of: Michael Tuexen and Peter Lei :-) MFC after: 4 weeks	2008-12-06 13:19:54 +00:00
Gleb Smirnoff	0b476f1cce	In a case of CARP status change run through the if_link_state_change() routine, so that devd(8) and others are notified about link state change.	2008-12-05 14:37:14 +00:00
Bjoern A. Zeeb	4b79449e2f	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation	2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb	413628a7e3	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the aforementioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this.
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible	2008-11-29 14:32:14 +00:00
Marko Zec	5c890d3c4f	Add an essential .h file that skipped from the last commit (r185419). Pointy hat #1 on... Pointed out by: bz	2008-11-28 23:39:25 +00:00
Marko Zec	f02493cbbd	Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-28 23:30:51 +00:00
Dag-Erling Smørgrav	3b6fe5fcd9	missing V_	2008-11-28 13:13:44 +00:00
Bjoern A. Zeeb	5cd54324ee	Replace most INP_CHECK_SOCKAF() uses checking if it is an IPv6 socket by comparing a constant inp vflag. This is expected to help to reduce extra locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks	2008-11-27 13:19:42 +00:00
Bjoern A. Zeeb	6aee2fc550	Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additional IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-27 12:04:35 +00:00
Bjoern A. Zeeb	6974bd9e75	Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical. Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave stub wrappers in 7 to keep the symbols.	2008-11-27 10:43:08 +00:00
Marko Zec	97021c2464	Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless. Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-26 22:32:07 +00:00
Bjoern A. Zeeb	0206cdb846	Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice. Reviewed by: rwatson MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.	2008-11-26 20:52:26 +00:00
Bjoern A. Zeeb	a7df09e8c9	Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable. No functional changes. Reviewed by: rwatson MFC after: 6 weeks	2008-11-26 12:54:31 +00:00
Julian Elischer	bc97ba5100	Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly. Obtained from: Ironport MFC after: 3 days	2008-11-19 19:19:30 +00:00
Marko Zec	44e33a0758	Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks. Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures. Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-11-19 09:39:34 +00:00
Randall Stewart	a1e132720b	-Improvement: Add '\n' on debug output in sctp_lower_sosend(). -Improvement: panic() on INVARIANTS kernels if memory allocation fails for a tagblock in sctp_add_vtag_to_timewait(). -Bugfix: Protect code in sctp_is_in_timewait() by SCTP_INP_INFO_WLOCK/SCTP_INP_INFO_WUNLOCK. -Cleanup: Get rid of unused variable now in sctp_init_asoc(). -Bugfix: Reuse the correct vtag in sctp_add_vtag_to_timewait(). -Cleanup: Get rid of unused constant SCTP_TIME_WAIT_SHORT in sctp_constants.h. -Improvement: Use all hash buckets of the vtag hash table. -Cleanup: Get rid of then unused constant SCTP_STACK_VTAG_HASH_SIZE_A. -Bugfix: Handle SHUTDOWN;SACK packet correctly. -Bugfix: Last TSN in a gap ack block was not being "ack'd" in the internal scoreboard. Obtained from: (with help from Michael Tuexen)	2008-11-12 14:16:39 +00:00
Bjoern A. Zeeb	687a9b4738	For consistency work on the local object passed into the function for the lock operation instead using the global name. Submitted by: ganbold MFC after: 2 months	2008-11-09 14:06:44 +00:00
Bjoern A. Zeeb	8e5c87f4b6	Fix typo and while here another one. Reviewed by: keramida Reported by: keramida MFC after: 2 months (with r184720)	2008-11-06 16:30:20 +00:00
Bjoern A. Zeeb	91d6cfa6b1	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially but if we are called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there, called into tcp_mtudisc() and tcp_mss_update() which then would reenable TSO on the tcpcb based on TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface it was turned back on, which led to an endless loop between tcp_output() and tcp_mtudisc() until we overflowed the stack. Reported by: kmacy MFC after: 2 months (along with r182851)	2008-11-06 13:25:59 +00:00
Bjoern A. Zeeb	4b3f4d3818	Adopt the comment for tcp_maxmtu(); we are returning a number not a pointer. While here update the rest of the comment to better match what we have these days. MFC after: 2 months	2008-11-06 12:59:00 +00:00
Bjoern A. Zeeb	6f01cac68a	Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code. In case we return early and got a metricptr to pass the hostcache info back to the caller we need to initialize the data to a defined state (zero it) as tcp_hc_get() would do if there was no hit. Without that the caller would check on random stack garbage which could lead to undefined results. This only affected tcp_mss() if there was no routing entry for the peer, tcp_mtudisc() was not affected. MFC after: 2 months (along with r182851)	2008-11-06 12:33:33 +00:00
Oleg Bulyzhin	02d09f7901	Type of q_time (start of queue idle time) has changed: uint32_t -> uint64_t. This should fix q_time overflow, which happens after 2^32/(86400*hz) days of uptime (~50days for hz = 1000). q_time overflow cause following: - traffic shaping may not work in 'fast' mode (not enabled by default). - incorrect average queue length calculation in RED/GRED algorithm. NB: due to ABI change this change is not applicable to stable. PR: kern/128401	2008-10-28 14:14:57 +00:00
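
The overflow figure quoted in the entry above, 2^32/(86400*hz) days, can be checked with a few lines of C. This is a minimal sketch: the helper name `counter_wrap_days()` is hypothetical, and it assumes the counter advances by one per tick at `hz` ticks per second.

```c
#include <stdint.h>

/*
 * Days of uptime until a tick counter of the given width wraps,
 * assuming it advances hz times per second (86400 seconds per day).
 */
double
counter_wrap_days(unsigned int bits, unsigned int hz)
{
	/* 2^bits as a double; 2^64 does not fit in uint64_t. */
	double ticks = (bits >= 64) ? 18446744073709551616.0 :
	    (double)(UINT64_C(1) << bits);

	return (ticks / (86400.0 * (double)hz));
}
```

For a 32-bit counter at hz = 1000 this gives roughly 49.7 days, matching the "~50 days" in the commit message; widening the field to 64 bits pushes the wrap point far beyond any realistic uptime.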
Randall Stewart	73adc48f49	More issues with pre-blocking: a) Need for EEOR mode to take the min of the socket buffer size and the add more threshold, otherwise if you are so silly as to set a send buf size less than the add-more you could block forever in eeor mode. b) We were incorrectly using the sysctl vs the calculated value. This causes us to block forever if the addmore threshold is larger than the socket buffer size.	2008-10-27 14:49:12 +00:00
Randall Stewart	35e4161b1f	Two inter-related bugs. - If we send EXACTLY the size left in the send buffer and then send again, we end up with exactly 0 bytes and don't hit the pre-block code to wait for more space. - If we fall into the loop with our max_len == 0 (the bug above) we then call in to copy out the data, setup the length of the waiting to transmit data to 0 and call the mbuf copy routine which 0 indicates copy all the data to the mbuf chain.. which it does. This then leaves a "stuck" message on the stream queue with its size exactly 0 bytes but all the data there and thus nothing left in the uio structure. We then reach a stuck forever state never being able to send data.	2008-10-27 14:01:23 +00:00
Randall Stewart	a4c651183e	Get rid of ifdef for vimage on version 8 comparison. Now the scrubbing program properly takes care of this.	2008-10-27 13:54:54 +00:00
Randall Stewart	83416c885d	Invariants changes that make more sense.	2008-10-27 13:53:31 +00:00
Robert Watson	dd8ac7f990	In both dropwithreset paths in tcp_input.c, drop the tcbinfo lock sooner to decomplicate locking and eliminate the need for a rather chatty comment about why we have to handle the global lock in a special way for the benefit of ipfw and pf cred rules. MFC after: 3 days	2008-10-26 22:03:52 +00:00
Robert Watson	4c95fd23d6	Remove endearing but syntactically unnecessary "return;" statements directly before the final closing brackets of some TCP functions. MFC after: 3 days	2008-10-26 19:33:22 +00:00
Bjoern A. Zeeb	460473a071	Style changes only: - Consistently add parentheses to return statements. - Use NULL instead of 0 when comparing pointers, also avoiding unnecessary casts. - Do not use pointers as booleans. Reviewed by: rwatson (earlier version) MFC after: 2 months	2008-10-26 19:17:25 +00:00
Dag-Erling Smørgrav	e11e3f187d	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.	2008-10-23 20:26:15 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Bjoern A. Zeeb	7e1bc2729c	Update a comment which to my reading had been misplaced in rev. 1.12 already (but probably had been way above as the code was there twice) and describe what was last changed in rev. 1.199 there (which now is in sync with in6_src.c r184096). Pointed at by: mlaier MFC after: 2 months	2008-10-20 18:56:00 +00:00
Bjoern A. Zeeb	dc3c09c89f	Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199. Reviewed by: silby, rpaulo, mlaier MFC after: 2 months	2008-10-20 18:43:59 +00:00
Randall Stewart	1b9f62a044	The flags value was not always being copied out in the recv routine like it should be. Obtained from: Michael Tuexen	2008-10-18 15:56:52 +00:00
Randall Stewart	ac29704161	New sockets (accepted) were not inheriting the proper snd/rcv buffer value. Obtained from: Michael Tuexen	2008-10-18 15:56:12 +00:00
Randall Stewart	1862b24533	- Peers rwnd is now available for the MIB. Obtained from: Michael Tuexen	2008-10-18 15:55:15 +00:00
Randall Stewart	fc69c30240	- Adapt layer indication was always being given (it should only be given when the user has enabled it). (Michael Tuexen) - Sack Immediately was not being set properly on the actual chunk, it was only put in the rcvd_flags which is incorrect. (Michael Tuexen) - added an ifndef userspace to one of the already present macro's for inet (Brad Penoff) Obtained from: Michael Tuexen and Brad Penoff MFC after: 4 weeks	2008-10-18 15:54:25 +00:00
Randall Stewart	fcea7c2ed3	Reported by Yehuda Weinraub (yehudasa@gamil.com) - CRC32C algorithm uses incorrect init_bytes value. It SHOULD have the number of bytes to get to a 4 byte boundary. PR: 128134 MFC after: 4 weeks	2008-10-18 15:53:31 +00:00
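
The entry above says init_bytes should be the number of bytes needed to reach a 4-byte boundary. That quantity can be sketched as follows. This illustrates the alignment arithmetic only and is not the SCTP CRC32C code; the function name `bytes_to_align4()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of leading bytes to consume one at a time before the address
 * is 4-byte aligned: 0 when already aligned, otherwise 1 to 3.
 */
size_t
bytes_to_align4(uintptr_t addr)
{
	return ((size_t)((4 - (addr & 3)) & 3));
}
```

The final `& 3` maps the already-aligned case from 4 back to 0, so an aligned buffer skips the byte-wise prologue entirely.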
Bjoern A. Zeeb	f08ef6c595	Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock. Reviewed by: rwatson MFC after: 3 months (set timer; decide then)	2008-10-17 16:26:16 +00:00
Marko Zec	3ff0b2135b	Remove a useless global static variable. Approved by: bz (ad-hoc mentor)	2008-10-16 12:31:03 +00:00
Maxim Konovalov	0279bb29a0	o Remove unnecessary parentheses and restore indentation. Prodded by: mlaier	2008-10-14 17:47:29 +00:00
Maxim Konovalov	8e6c0f8cfd	o Reformat ipfw nat get|setsockopt code to look more style(9) compliant. No functional changes.	2008-10-14 12:26:55 +00:00
Robert Watson	1f6ef666b5	Fix content and spelling of comment on _ipfw_insn.len -- a count of 32-bit words, not 32-byte words. MFC after: 3 days	2008-10-10 14:33:47 +00:00
Robert Watson	6c8286e42d	Don't pass curthread to sbreserve_locked() in tcp_do_segment(), as the netisr or ithread's socket buffer size limit is not the right limit to use. Instead, pass NULL as the other two calls to sbreserve_locked() in the TCP input path (tcp_mss()) do. In practice, this is a no-op, as ithreads and the netisr run without a process limit on socket buffer use, and a NULL thread pointer leads to not using the process's limit, if any. However, if tcp_input() is called in other contexts that do have limits, this may prevent the incorrect limit from being used. MFC after: 3 days	2008-10-07 09:41:07 +00:00
Bjoern A. Zeeb	c6ddb94cf2	Remove an INP_RUNLOCK() missed in SVN r183606, cvs rev. 1.195 raw_ip.c when transitioning from so_cred to inp_cred. MFC after: 6 weeks	2008-10-04 16:48:09 +00:00
Bjoern A. Zeeb	86d02c5c63	Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking. Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred	2008-10-04 15:06:34 +00:00
Bjoern A. Zeeb	0895aec30c	Implement IPv4 source address selection for unbound sockets. For the jail case we are already looping over the interface addresses before falling back to the only IP address of a jail in case of no match. This is in preparation for the upcoming multi-IPv4/v6/no-IP jail patch this change was developed with initially. This also changes the semantics of selecting the IP for processes within a jail as it now uses the same logic as outside the jail (with additional checks) but no longer is on a mutually exclusive code path. Benchmarks had shown no difference at 95.0% confidence for either the plain or the jail case (even with the additional overhead). See: http://lists.freebsd.org/pipermail/freebsd-net/2008-September/019531.html Inspired by a patch from: Yahoo! (partially) Tested by: latest multi-IP jail patch users (implicitly) Discussed with: rwatson (general things around this) Reviewed by: mostly silence (feedback from bms) Help with benchmarking from: kris MFC after: 2 months	2008-10-03 12:21:21 +00:00
Marko Zec	8b615593fc	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation	2008-10-02 15:37:58 +00:00
Robert Watson	c0a211c51f	Expand comments relating various detach/free/drop inpcb routines. MFC after: 3 days	2008-09-29 13:50:17 +00:00
Robert Watson	fc18af966f	Fix typo in comment. MFC after: 3 days	2008-09-29 13:48:48 +00:00
Robert Watson	47505890d6	When an inpcb doesn't have a socket but the inpcb is passed to ipfw in the transmit path, such as TCPS_TIMEWAIT, fail the credential extraction immediately rather than acquiring locks and looking up the inpcb on the global lists in order to reach the conclusion that the credential extraction has failed. This is more efficient, but more importantly, it avoids lock recursion on the inpcbinfo, which is no longer allowed with rwlocks. This appears to have been responsible for at least two reported panics. MFC after: 3 days Reported by: ganbold	2008-09-27 19:28:28 +00:00
Robert Watson	d83412e791	Rather than shadowing global variable 'lookup' in check_uidgid(), rename it to ugid_lookupp. This should make debugging issues with ipfw uid rules easier. MFC after: 3 days	2008-09-27 10:14:02 +00:00


3646 Commits