freebsd-dev

Author	SHA1	Message	Date
Robert Watson	09d547787f	In ARP input, more consistently acquire and release ifaddr references. MFC after: 6 weeks	2009-06-24 10:33:35 +00:00
Bjoern A. Zeeb	88d166bf19	Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful. Asked for by: rwatson Reviewed by: rwatson	2009-06-23 22:08:55 +00:00
Robert Watson	8c0fec805f	Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)	2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Andre Oppermann	ef760e6ad2	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved reability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
Marko Zec	fa057b15bd	V_irtualize flowtable state. This change should make options VIMAGE kernel builds usable again, to some extent at least. Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)	2009-06-22 21:19:24 +00:00
Robert Watson	8896f83a58	Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks	2009-06-22 10:59:34 +00:00
Bjoern A. Zeeb	173de0f9cc	Remove a hack from r186086 so that IPsec via loopback routes continued working. It was targeted for stable/7 compatibility and actually never did anything in HEAD. Reminded by: rwatson X-MFC after: never	2009-06-22 09:24:46 +00:00
Robert Watson	1099f828b3	Clean up common ifaddr management: - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks	2009-06-21 19:30:33 +00:00
Roman Divacky	e40bae9a45	Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang. Reviewed by: rwatson Approved by: ed (mentor)	2009-06-21 10:29:31 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Bjoern A. Zeeb	7654a365db	Add the explicit include of vimage.h to another five .c files still missing it. Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.	2009-06-17 12:44:11 +00:00
Randall Stewart	d50c1d79d0	Changes to the NR-Sack code so that: 1) All bit disappears 2) The two sets of gaps (nr and non-nr) are disjointed, you don't have gaps struck in both places. This adjusts us to coorespond to the new draft. Still to-do, cleanup the code so that there are only one set of sack routines (original NR-Sack done by E cloned all sack code).	2009-06-17 12:34:56 +00:00
John Baldwin	6b0c5521b5	Trim extra sets of ()'s. Requested by: bde	2009-06-16 19:00:48 +00:00
John Baldwin	6dfb8b316c	Fix edge cases with ticks wrapping from INT_MAX to INT_MIN in the handling of the per-tcpcb t_badtrxtwin. Submitted by: bde	2009-06-16 19:00:12 +00:00
John Baldwin	9f78a87a06	- Change members of tcpcb that cache values of ticks from int to u_int: t_rcvtime, t_starttime, t_rtttime, t_bw_rtttime, ts_recent_age, t_badrxtwin. - Change t_recent in struct timewait from u_long to u_int32_t to match the type of the field it shadows from tcpcb: ts_recent. - Change t_starttime in struct timewait from u_long to u_int to match the t_starttime field in tcpcb. Requested by: bde (1, 3)	2009-06-16 18:58:50 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Oleg Bulyzhin	1917ef996d	Since dn_pipe.numbytes is int64_t now - remove unnecessary overflow detection code in ready_event_wfq().	2009-06-15 17:14:47 +00:00
Bjoern A. Zeeb	53be8fca00	Move the kernel option FLOWTABLE chacking from the header file to the actual implementation. Remove the accessor functions for the compiled out case, just returning "unavail" values. Remove the kernel conditional from the header file as it is no longer needed, only leaving the externs. Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size under the kernel option as well. Reviewed by: rwatson	2009-06-12 20:46:36 +00:00
VANHULLEBUS Yvan	7b495c4494	Added support for NAT-Traversal (RFC 3948) in IPsec stack. Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc... X-MFC: never Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ	2009-06-12 15:44:35 +00:00
John Baldwin	a13c655c64	Correct printf format type mismatches.	2009-06-11 14:37:18 +00:00
John Baldwin	1a0e7cfc42	Trim extra ()'s. Submitted by: bde	2009-06-11 14:36:13 +00:00
John Baldwin	0e8cc7e748	Change a few members of tcpcb that store cached copies of ticks to be ints instead of unsigned longs. This fixes a few overflow edge cases on 64-bit platforms. Specifically, if an idle connection receives a packet shortly before 2^31 clock ticks of uptime (about 25 days with hz=1000) and the keep alive timer fires after 2^31 clock ticks, the keep alive timer will think that the connection has been idle for a very long time and will immediately drop the connection instead of sending a keep alive probe. Reviewed by: silby, gnn, lstewart MFC after: 1 week	2009-06-10 18:27:15 +00:00
Warner Losh	f61c07e12d	These are no longer referenced in the tree, so can be safely removed. Reviewed by: bms@	2009-06-10 18:12:15 +00:00
Luigi Rizzo	6167e6c88f	in ip_dn_ctl(), do not allocate a large structure on the stack, and use malloc() instead if/when it is necessary. The problem is less relevant in previous versions because the variable involved (tmp_pipe) is much smaller there. Still worth fixing though. Submitted by: Marta Carbone (GSOC) MFC after: 3 days	2009-06-10 10:47:31 +00:00
Bjoern A. Zeeb	d93a13cb23	Remove the "The option TCPDEBUG requires option INET." requirement. In case of !INET we will not have a timestamp on the trace for now but that might only affect spx debugging as long as INET6 requires INET. Reviewed by: rwatson (earlier version)	2009-06-10 10:39:41 +00:00
Luigi Rizzo	1a5d0c2bf0	small simplifications to the code in charge of reaping deleted rules: - clear the head pointer immediately before using it, so there is no chance of mistakes; - call reap_rules() unconditionally. The function can handle a NULL argument just fine, and the cost of the extra call is hardly significant given that we do it rarely and outside the lock. MFC after: 3 days	2009-06-10 10:34:59 +00:00
Oleg Bulyzhin	dda10d624c	Close long existed race with net.inet.ip.fw.one_pass = 0: If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc) it carries pointer to matching ipfw rule. If this packet then reinjected back to ipfw, ruleset processing starts from that rule. If rule was deleted meanwhile, due to existed race condition panic was possible (as well as other odd effects like parsing rules in 'reap list'). P.S. this commit changes ABI so userland ipfw related binaries should be recompiled. MFC after: 1 month Tested by: Mikolaj Golub	2009-06-09 21:27:11 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Hiroki Sato	dbe5926046	Fix and add a workaround on an issue of EtherIP packet with reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent the incorrect EtherIP packets and discarded the correct ones. This change introduces the following two flags to gif(4): accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default. send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default. These flags are stored in struct gif_softc and can be set by ifconfig(8) on per-interface basis. Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate older FreeBSD boxes and new versions after this commit, setting "send_rev_ethip_ver" is needed. Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks	2009-06-07 23:00:40 +00:00
Marko Zec	403f4aa059	Unbreak options VIMAGE build. Submitted by: julian (mentor) Approved by: julian (mentor)	2009-06-06 12:43:13 +00:00
Pawel Jakub Dawidek	42a3613305	Only four out of nine arguments for ip_ipsec_output() are actually used. Kill unused arguments except for 'ifp' as it might be used in the future for detecting IPsec-capable interfaces.	2009-06-05 23:53:17 +00:00
Luigi Rizzo	908e960ea6	move kernel ipfw-related sources to a separate directory, adjust conf/files and modules' Makefiles accordingly. No code or ABI changes so this and most of previous related changes can be easily MFC'ed MFC after: 5 days	2009-06-05 19:22:47 +00:00
Luigi Rizzo	b87ce5545b	Several ipfw options and actions use a 16-bit argument to indicate pipes, queues, tags, rule numbers and so on. These are all different namespaces, and the only thing they have in common is the fact they use a 16-bit slot to represent the argument. There is some confusion in the code, mostly for historical reasons, on how the values 0 and 65535 should be used. At the moment, 0 is forbidden almost everywhere, while 65535 is used to represent a 'tablearg' argument, i.e. the result of the most recent table() lookup. For now, try to use explicit constants for the min and max allowed values, and do not overload the default rule number for that. Also, make the MTAG_IPFW declaration only visible to the kernel. NOTE: I think the issue needs to be revisited before 8.0 is out: the 2^16 namespace limit for rule numbers and pipe/queue is annoying, and we can easily bump the limit to 2^32 which gives a lot more flexibility in partitioning the namespace. MFC after: 5 days	2009-06-05 16:16:07 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Robert Watson	88a9a9a61c	Unifdef MAC label pointer in syncache entries -- in general, ifdef'd structure contents are a bad idea in the kernel for binary compatibility reasons, and this is a single pointer that is now included in compiles by default anyway due to options MAC being in GENERIC.	2009-06-05 14:31:03 +00:00
Luigi Rizzo	115a40c7bf	More cleanup in preparation of ipfw relocation (no actual code change): + move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c; + remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly; + remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly; + move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it; To be merged together with rev 193497 MFC after: 5 days	2009-06-05 13:44:30 +00:00
Luigi Rizzo	b4043122fa	Small changes (no actual code changes) in preparation of moving ipfw-related stuff to its own directory, and cleaning headers and dependencies: In this commit: + remove one use of a typedef; + document dn_rule_delete(); + replace one usage of the DUMMYNET_LOADED macro with its value; No MFC planned until the cleanup is complete.	2009-06-05 12:49:54 +00:00
Luigi Rizzo	c421791e77	fix a bug introduced in rev.190865 related to the signedness of the credit of a pipe. On passing, also use explicit signed/unsigned types for two other fields. Noticed by Oleg Bulyzhin and Maxim Ignatenko long ago, i forgot to commit the fix. Does not affect RELENG_7.	2009-06-04 12:27:57 +00:00
Robert Watson	3de4046939	Continue work to optimize performance of "options MAC" when no MAC policy modules are loaded by avoiding mbuf label lookups when policies aren't loaded, pushing further socket locking into MAC policy modules, and avoiding locking MAC ifnet locks when no policies are loaded: - Check mac_policies_count before looking for mbuf MAC label m_tags in MAC Framework entry points. We will still pay label lookup costs if MAC policies are present but don't require labels (typically a single mbuf header field read, but perhaps further indirection if IPSEC or other m_tag consumers are in use). - Further push socket locking for socket-related access control checks and events into MAC policies from the MAC Framework, so that sockets are only locked if a policy specifically requires a lock to protect a label. This resolves lock order issues during sonewconn() and also in local domain socket cross-connect where multiple socket locks could not be held at once for the purposes of propagatig MAC labels across multiple sockets. Eliminate mac_policy_count check in some entry points where it no longer avoids locking. - Add mac_policy_count checking in some entry points relating to network interfaces that otherwise lock a global MAC ifnet lock used to protect ifnet labels. Obtained from: TrustedBSD Project	2009-06-03 18:46:28 +00:00
Robert Watson	f93bfb23dc	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00
John Baldwin	74fb0ba732	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
Bjoern A. Zeeb	c2c2a7c11e	Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible	2009-06-01 15:49:42 +00:00
Bruce M Simpson	b01c90a31a	Merge fixes from p4: * Tighten v1 query input processing. * Borrow changes from MLDv2 for how general queries are processed. * Do address field validation upfront before accepting input. * Do NOT switch protocol version if old querier present timer active. * Always clear IGMPv3 state in igmp_v3_cancel_link_timers(). * Update comments. Tested by: deeptech71 at gmail dot com	2009-06-01 15:30:18 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
Pawel Jakub Dawidek	f44270e764	- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems. - Implement this functionality also for IPv6 and RAW IP sockets. - Always compile it in (don't use additional kernel options). - Remove sysctl to turn this functionality on and off. - Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this functionality (currently only unjail root can use it). Discussed with: julian, adrian, jhb, rwatson, kmacy	2009-06-01 10:30:00 +00:00
Randall Stewart	a16ccdcead	Adds missing sysctl to manage the vtag_time_wait time. This will even allow disabling time-wait all together if you set the value to 0 (not advisable actually). The default remains the same i.e. 60 seconds.	2009-05-30 11:14:41 +00:00
Randall Stewart	bf1be57101	Fix a small memory leak from the nr-sack code - the mapping array was not being freed at term of association. Also get rid of the MICHAELS_EXP code.	2009-05-30 10:56:27 +00:00
Randall Stewart	667d2d3cbb	Make sctp_uio user to kernel structure match the socket-api draft. Two fields were uint32_t when they should have been uint16_t. Reported by Jonathan Leighton at U-del.	2009-05-30 10:50:40 +00:00
Zachary Loafman	81ad7eb017	Correct handling of SYN packets that are to the left of the current window of an ESTABLISHED connection. Reviewed by: net@, gnn Approved by: dfr (mentor)	2009-05-27 17:02:10 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Edward Tomasz Napierala	efbad25934	Don't discard packets with 'Destination Unreachable' at the beginning of ip_forward(), if the IPSEC is compiled in. It is possible that there is an SPD that this packets will go through, even if there is no matching route. If not, ICMP will be sent anyway, after ip_output(). This is somewhat similar in purpose to r191621, except that one was for the packets sent from the host, while this one is for packets being forwarded by the host. Reviewed by: bz@ Sponsored by: Wheel Sp. z o.o. (http://www.wheel.pl)	2009-05-27 12:44:36 +00:00
John Baldwin	256d7a8a16	Correct the sense of a test so that this filter always waits for the full request to arrive. Previously it would end up returning as soon as the request length stored in the first two bytes had arrived. Reviewed by: dwmalone MFC after: 1 week	2009-05-26 20:00:30 +00:00
Robert Watson	cce6e15556	Remove comment about moving tcp_reass() to its own file named tcp_reass.c, that happened a while ago. MFC after: 3 days	2009-05-25 14:51:47 +00:00
Bjoern A. Zeeb	6d453b1001	For UDP with introducing the UDP control block, the uma zone had to be named "udp_inpcb" to avoid a naming conflict with tcp[1]. For consistency rename the uma zone for TCP from "inpcb" to "tcp_inpcb". Found by: rwatson [1] Discussed with: rwatson	2009-05-23 17:02:30 +00:00
Bjoern A. Zeeb	6a9148fe92	Implement UDP control block support. So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2. Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes. Submitted by: jhb (7.x version) Reviewed by: rwatson	2009-05-23 16:51:13 +00:00
Bjoern A. Zeeb	db2e47925e	Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6. The kernel option is left for a transition period, marked deprecated, and will be removed soon. Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks	2009-05-23 16:42:38 +00:00
Bjoern A. Zeeb	f81a8a320c	If including vnet.h one has to include opt_route.h as well. This is because struct vnet_net holds the rt_tables[][] for MRT and array size is compile time dependent. If you had ROUTETABLES set to >1 after r192011 V_loif was pointing into nonsense leading to strange results or even panics for some people. Reviewed by: mz	2009-05-22 23:03:15 +00:00
Robert Watson	62e1ba833c	Consolidate and clean up the first section of ip_output.c in light of the last year or two's work on routing: - Combine iproute initialization and flowtable lookup blocks, eliminating unnecessary tests for known-zero'd iproute fields. - Add a comment indicating (a) why the route entry returned by the flowtable is considered stable and (b) that the flowtable lookup must occur after the setup of the mbuf flow ID. - Assert the inpcb lock before any use of inpcb fields. Reviewed by: kmacy	2009-05-21 09:45:47 +00:00
Qing Li	c9d763bf41	When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix. Reviewed by: kmacy	2009-05-20 21:07:15 +00:00
Bjoern A. Zeeb	75ac4f3d32	Revert the logical change of r192341. net.inet.ip.fw.one_pass is a classic ip_input.c variable and is used in the pfil and bridge code as well. As ipfw is loadable we need to always provide it. That is the reason why it lives in struct vnet_inet and not in struct vnet_ipfw.	2009-05-18 22:34:44 +00:00
John Baldwin	10f5c8be92	- Fix typo in description of 'net.inet.ip.fw.autoinc_step'. - Use 'vnet_ipfw' instead of 'vnet_inet' for 'net.inet.ip.fw.one_pass'.	2009-05-18 21:46:46 +00:00
Bjoern A. Zeeb	1600c117d8	Unbreak options VIMAGE builds, in a followup to r192011 which did not introduce INIT_VNET_NET() initializers necessary for accessing V_loif. Submitted by: zec Reviewed by: julian	2009-05-17 20:53:10 +00:00
Robert Watson	6d888973c8	Staticize two functions not used outside of in_pcb.c: in_pcbremlists() and db_print_inpcb(). MFC after: 1 month	2009-05-14 20:59:36 +00:00
Qing Li	92fac99477	Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route to the interface address.	2009-05-14 05:27:09 +00:00
Qing Li	ebc90701ac	This patch adds a host route to an interface address (that is assigned to a non loopback/ppp link types) through the loopback interface. Prior to the new L2/L3 rewrite, this host route is implicitly added by the L2 code during RTM_RESOLVE of that interface address. This host route is deleted when that interface is removed. Reviewed by: kmacy	2009-05-12 07:41:20 +00:00
Warner Losh	573a04c930	Remove bogus comment.	2009-05-09 18:50:01 +00:00
John Baldwin	5f17ebf94d	Convert IPFW_DEFAULT_TO_ACCEPT into a loader tunable 'net.inet.ip.fw.default_to_accept'. The current value can also be queried via a read-only sysctl of the same name. Requested by: plosher MFC after: 1 week	2009-05-09 05:07:36 +00:00
Marko Zec	2114e063f0	A NOP change: style / whitespace cleanup of the noise that slipped into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)	2009-05-08 14:34:25 +00:00
Marko Zec	ddd50c3439	Remove a bogus check that unintentionally slipped in r191816. This change has no functional impact on nooptions VIMAGE builds. Submitted by: bz	2009-05-08 14:28:06 +00:00
Randall Stewart	096ed42dad	repository sync to multi-OS repo ... spaceing change	2009-05-07 16:43:49 +00:00
Randall Stewart	892f1c7141	ABI expansions to hopefully future-proof our MIB/netstat code for 8.0	2009-05-07 16:42:45 +00:00
Marko Zec	94e9f5a1c2	Remove unnecessary CURVNET_SET() calls where curvnet context is (i.e. seems to be) already set. This should reduce console noise due to curvnet recursion reports. This change has no impact on nooptions VIMAGE builds. Approved by: julian (mentor)	2009-05-06 13:30:46 +00:00
Marko Zec	743da3bcdb	Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)	2009-05-06 08:49:39 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform accross vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Marko Zec	d7fcc52895	Unbreak options VIMAGE + nooptions INVARIANTS kernel builds. Submitted by: julian Approved by: julian (mentor)	2009-05-02 05:02:28 +00:00
Marko Zec	f6dfe47a14	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Bruce M Simpson	33cde13046	Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.	2009-04-29 19:19:13 +00:00
Bruce M Simpson	9efc1a1bbf	Add MLDv2 prototypes and defines.	2009-04-29 10:20:17 +00:00
Bruce M Simpson	5cf93e5d2c	Use KTR_INET for MROUTING CTRs.	2009-04-29 10:17:08 +00:00
Bruce M Simpson	c566b47669	Cut over to KTR_INET for CTR. For clarity, put pointer incremement/size decrement on own line when copying out in-mode source filters to userland.	2009-04-29 10:14:16 +00:00
Bruce M Simpson	1096332a4a	Do not assume that ip6_moptions is always set, it is a lazy-allocated structure.	2009-04-29 10:13:22 +00:00
Bruce M Simpson	31a3e65dc2	Fix a problem whereby enqueued IGMPv3 filter list changes would be incorrectly output, if the RB-tree enumeration happened to reuse the same chain for a mode switch: that is, both ALLOW and BLOCK records were appended for the same group, in the same mbuf packet chain. This was introduced during an mbuf chain layout bug fix involving m_getptr(), which obviously cannot count from offset 0 on the second pass through the RB-tree when serializing the IGMPv3 group records into the pending mbuf chain. Cut over to KTR_INET for IGMPv3 CTR usage.	2009-04-29 10:12:01 +00:00
Edward Tomasz Napierala	1a4998162e	Don't require packet to match a route (any route; this information wasn't used anyway, so a typical workaround was to add a dummy route) if it's going to be sent through IPSec tunnel. Reviewed by: bz	2009-04-28 11:10:33 +00:00
Oleg Bulyzhin	a3a981b7f9	Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was processed by ipfw once - avoid second ipfw_chk() call. This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and ip/tcp/udp header parsing. MFC after: 2 month	2009-04-27 17:37:36 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Robert Watson	db091502fb	Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead (colloquially known as if_addrlist). Currently not acquired around interface address loops that call out to the routing code due to potential lock order issues. MFC after: 3 weeks	2009-04-26 19:05:40 +00:00
Robert Watson	588885f2f5	Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial lookup of 'ia' from if_addrhead through most use. Note that we currently have to drop it prematurely in some cases due to calls out to the routing and interface code while using 'ia', but this closes many races. Annotate several potential races that persist after this change. Move to using M_NOWAIT for allocating new interface addresses due to lock(s) being held. MFC after: 3 weeks	2009-04-25 23:02:57 +00:00
Robert Watson	07cde5e92c	In in_purgemaddrs(), remove the inm being freed from the address list before freeing it, rather than vice version, to avoid potential use after free. Reviewed by: bms	2009-04-24 22:11:53 +00:00
Robert Watson	cf7b18f15e	Relocate permissions checking code in in_control() to before the body of the implementation of ioctls. This makes the mapping of ioctls to specific privileges more explicit, and also simplifies the implementation by reducing the use of FALLTHROUGH handling in switch. While this is not intended to be a functional change, it does mean that certain privilege checks are now performed earlier, so EPERM might be returned in preference to EADDRNOTAVAIL for management ioctls that could have failed for both reasons. MFC after: 3 weeks	2009-04-24 09:54:46 +00:00
Robert Watson	bbb3fb6194	Reorganize in_control() so that invariants are more obvious, and so that it is easier to lock: - Handle the unsupported ioctl case at the beginning of in_control(), handing off to ifp->if_ioctl, rather than looking up interfaces and addresses unnecessarily in this case. - Make it an invariant that ifp is always non-NULL when running in_control()-implemented ioctls, simplifying the code structure. MFC after: 3 weeks	2009-04-23 21:41:37 +00:00
Bruce M Simpson	86979280fc	Bracket struct mfc and struct rtdetq with #ifdef _KERNEL. Match the bracketing in netstat. Since the cleanup of MROUTING, ports have broken because they expect to include <netinet/ip_mroute.h> without including <sys/queue.h>. Fix breakage at source. The real fix, of course, is to fix the MROUTING APIs by blowing them away and replacing them with something else...	2009-04-21 12:47:09 +00:00
Bruce M Simpson	5def3edcad	remove IFF_ASSERTGIANT	2009-04-21 09:43:51 +00:00
Robert Watson	4ed6f8c1f1	Prefer actual field names (if_addrhead, ifa_link) to macros aliasing those field names in FreeBSD code. MFC after: 2 weeks	2009-04-20 22:40:44 +00:00
Robert Watson	0aade26e6d	In ip_input(), cache the received mbuf's network interface in a local variable. Acquire the interface address list lock when iterating over the interface address list searching for a matching received broadcast address. MFC after: 2 weeks	2009-04-20 14:35:42 +00:00
Robert Watson	33c4f96d88	In icmp_reflect(), acquire the inteface address list lock when searching for a source address to use. MFC after: 2 weeks Reviewed by: bz	2009-04-20 13:45:39 +00:00
Robert Watson	072b8f8ea7	Lock the interface address list when searching for a matching interface by address, or when implementing 'me' rules on IPv6. Prefer the field name if_addrhead to the macro if_addrlist. MFC after: 2 weeks	2009-04-19 22:34:35 +00:00
Robert Watson	b132600ab2	In divert_packet(), lock the interface address list before iterating over it in search of an address. MFC after: 2 weeks	2009-04-19 22:29:16 +00:00
Robert Watson	9317b04e46	Lock interface address lists in in_pcbladdr() when searching for a source address for a connection and there's no route or now interface for the route. MFC after: 2 weeks	2009-04-19 22:25:09 +00:00
Robert Watson	8021456a24	Protect against some writer-writer races in in_control() by acquiring the interface address list lock around interface address list modifications. More to do here. MFC after: 2 weeks	2009-04-19 22:16:19 +00:00
Bruce M Simpson	b5fbc0b98f	Now that IFF_NEEDSGIANT has been removed from the network stack, catch up with this in IGMPv3 and remove dead code. This has the side-effect of not being back-portable to RELENG_7 w/o further changes.	2009-04-19 08:14:21 +00:00
Kip Macy	65111ec7aa	- Allocate a small flowtable in ip_input.c (changeable by tuneable) - Use for accelerating ip_output	2009-04-19 04:44:05 +00:00
Kip Macy	ab25fa3558	s/void/void */	2009-04-16 23:02:56 +00:00
Kip Macy	114f15c686	restore spare pointers for MFCing	2009-04-16 22:47:43 +00:00
Kip Macy	279aa3d419	Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson	2009-04-16 20:30:28 +00:00
Kip Macy	8b12a7c2a6	- convert pspare pointers in inpcb to an llentry and rtentry cache - add flags to indicate their validity	2009-04-15 22:22:00 +00:00
Kip Macy	773b573a96	- add second flags field to to inpcb - update comments in vflag	2009-04-15 22:09:42 +00:00
Kip Macy	82c33e73f2	provide additional convenience macros for inpcb locking (upgrade, downgrade, exclusive)	2009-04-15 21:39:56 +00:00
Kip Macy	582b6122ab	make LLTABLE visible to netinet	2009-04-15 20:49:59 +00:00
Kip Macy	de4ab55e43	add an llentry to struct route{_in6} to allow it to be passed around with the rtentry	2009-04-15 20:34:19 +00:00
Randall Stewart	e261340ef7	Add missing address lock when we look at the ifa list	2009-04-14 19:20:27 +00:00
Randall Stewart	544e35bd97	Move the flight size reduction to right after we recognize its a retransmit, ahead of the PR-SCTP work. Without this fix, we end up NOT reducing flight size and causing an miscalculation when PR-SCTP is active and data is skipped. Obtained from: Michael Tuexen.	2009-04-14 07:50:29 +00:00
Robert Watson	de231a063a	Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL. MFC after: 3 days	2009-04-12 21:28:35 +00:00
Robert Watson	6bf65bcf3a	Update stats in struct carpstats using two new macros: CARPSTATS_ADD() and CARPSTATS_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:19:37 +00:00
Robert Watson	07cf7ab29c	Update stats in struct pimstat using two new macros: PIMSTAT_ADD() and PIMSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:06:26 +00:00
Robert Watson	fb83a36856	Update stats in struct mrtstat using two new macros: MRTSTAT_ADD() and MRTSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days	2009-04-12 14:00:36 +00:00
Robert Watson	bd88cce2ed	Update stats in struct igmpstat using two new macros: IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 13:41:13 +00:00
Robert Watson	e27b0c8775	Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. In on case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats. MFC after: 3 days	2009-04-12 13:22:33 +00:00
Robert Watson	026decb8f3	Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-12 11:42:40 +00:00
Robert Watson	86425c62a0	Update stats in struct ipstat using four new macros, IPSTAT_ADD(), IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-11 23:35:20 +00:00
Robert Watson	78b5071407	Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days	2009-04-11 22:07:19 +00:00
Paolo Pisati	50d25dda1b	What's the point of adjusting a checksum if we are going to toss the packet? Anticipate the check/return code.	2009-04-11 15:26:31 +00:00
Paolo Pisati	ea80b0ac03	Plug two bugs introduced with modules conversion: -UdpAliasIn(): correctly check return code after modules ran. -alias_nbt: in case of malformed packets (or some other unrecoverable error), toss the packet.	2009-04-11 15:19:09 +00:00
Paolo Pisati	1cd68a24c7	Remove stale comments.	2009-04-11 15:05:19 +00:00
Marko Zec	bfe1aba468	Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement. With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which cc in most cases inlined to pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order. The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time. The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework. For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet. A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided. Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality. The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option. Reviewed by: bz Approved by: julian (mentor)	2009-04-11 05:58:58 +00:00
Kip Macy	80cb9f211a	Import "flowid" support for serializing flows across transmit queues Reviewed by: rwatson and jeli	2009-04-10 06:16:14 +00:00
Luigi Rizzo	4bb7ae9deb	Add emulation of delay profiles, which lets you model various types of MAC overheads such as preambles, link level retransmissions and more. Note- this commit changes the userland/kernel ABI for pipes (but not for ordinary firewall rules) so you need to rebuild kernel and /sbin/ipfw to use dummynet features. Please check the manpage for details on the new feature. The MFC would be trivial but it breaks the ABI, so it will be postponed until after 7.2 is released. Interested users are welcome to apply the patch manually to their RELENG_7 tree. Work supported by the European Commission, Projects Onelab and Onelab2 (contract 224263).	2009-04-09 12:46:00 +00:00
Randall Stewart	abe15ad66c	Fix a FR bug. When doing PR-SCTP with number rtx set to a low number. The check for skipping was in the incorrect place. Which meant we would FR chunks we should not. MFC after: 1 Month	2009-04-08 12:52:05 +00:00
Randall Stewart	e29d4aa6bd	Add more padding and a new variable. This will help us be able to keep ABI compatibility between 8 and 9. MFC after: Never	2009-04-08 12:49:36 +00:00
Paolo Pisati	43197d291a	-don't pass down, to module's fingerprint function, unused data like a pointer to the ip header. -style -spacing	2009-04-08 11:56:49 +00:00
Bjoern A. Zeeb	970caf60dd	With the right comparison we get a proper wscale value and thus more adequate TCP performance with IPv6. Changes for IPv4, r166403 and r172795, both ignored the IPv6 counterpart and left it in the state of art of year 2000. The same logic in syncache already shares code between v4 and v6 so things do not need to be adapted there. Reported by: Steinar Haug (sthaug nethelp.no) Tested by: Steinar Haug (sthaug nethelp.no) MFC after: 3 days	2009-04-07 14:42:40 +00:00
Marko Zec	1ed81b739e	First pass at separating per-vnet initializer functions from existing functions for initializing global state. At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change. Modify the existing initializer functions which are invoked via protosw, like ip_init() et. al., to allow them to be invoked multiple times, i.e. per each vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true). While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered. Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net. Approved by: julian (mentor)	2009-04-06 22:29:41 +00:00
Alexander Kabaev	024a4bd626	If KTR_SUBSYS is compiled in, it does not necessarily mean that user is interested in being spammed by mcast-related printfs. Use proper check against ktr_mask instead KTR_COMPILE.	2009-04-05 23:25:06 +00:00
Bruce M Simpson	448895b7fc	Fix mbuf chain layout pessimization: in the case where a single mbuf is allocated due to m_getcl() returning NULL, we already call MH_ALIGN, so do not increment m->m_data in this case. Found during MLDv2 port.	2009-04-04 15:32:23 +00:00
Bruce M Simpson	0fd99912de	Do not obliterate QQI with MAXRESP. Found during MLDv2 port.	2009-04-04 15:26:32 +00:00
Randall Stewart	8933fa13b6	Many bug fixes (from the IETF hack-fest): - PR-SCTP had major issues when skipping through a multi-part message. o Did not look at socket buffer. o Did not properly handle the reassmebly queue. o The MARKED segments could interfere and un-skip a chunk causing a problem with the proper FWD-TSN. o No FR of FWD-TSN's was being done. - NR-Sack code was basically disabled. It needed fixes that never got into the real code. - CMT code had issues when the two paths were NOT the same b/w. We found a few small bugs, but also the critcal one here was not dividing the rwnd amongst the paths. Obtained from: Michael Tuexen and myself at the IETF hack-fest ;-)	2009-04-04 11:43:32 +00:00
Paolo Pisati	eb2e411915	Implement an ipfw action to reassemble ip packets: reass.	2009-04-01 20:23:47 +00:00
Bruce M Simpson	5b35d05538	Don't call m_freem() after ip_output(), as it always consumes the mbuf chain provided to it. Found by: Pierre Guinoiseau	2009-03-24 01:22:12 +00:00
Juli Mallett	34f27ade44	Remove local in6_addr variables for local and foreign addresses in sysctl_drop, they were passed uninitialized to in6_pcblookup_hash. Instead, do as is done for IPv4 and use the addresses within the sockaddr structure, which are correctly populated. This fixes tcpdrop(8) for IPv6 address pairs. Reviewed by: bz	2009-03-22 00:45:47 +00:00
Bruce M Simpson	545dff6fd1	Fix brainos introduced during mechanical KTR change. Pointy hat to: bms	2009-03-20 13:13:50 +00:00
Bruce M Simpson	98b59af731	Cleanup: Nuke debug.mrtdebug, and replace it with KTR.	2009-03-19 14:14:21 +00:00
Bruce M Simpson	443fc3176d	Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved. Summary: * Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod. * Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do. * Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels. * bandwidth meters however still require libkvm. * Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256). * Remove unused members from struct vif and struct mfc. * Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c. * Don't share locks or initialization between IPv4 and IPv6. * Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being. * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c. v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and have not been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here. Reviewed by: Pavlin Radoslavov	2009-03-19 01:43:03 +00:00
Bruce M Simpson	1975dc405a	Comment IGMP_PIM as being very historic, as in, don't use.	2009-03-19 01:15:26 +00:00
Bruce M Simpson	56663a40eb	Deal with the case where ifma_protospec may be NULL, during any IPv4 multicast operations which reference it. There is a potential race because ifma_protospec is set to NULL when we discover the underlying ifnet has gone away. This write is not covered by the IF_ADDR_LOCK, and it's difficult to widen its scope without making it a recursive lock. It isn't clear why this manifests more quickly with 802.11 interfaces, but does not seem to manifest at all with wired interfaces. With this change, the 802.11 related panics reported by sam@ and cokane@ should go away. It is not the right fix, that requires more thought before 8.0. Idea from: sam Tested by: cokane	2009-03-17 14:41:54 +00:00
Robert Watson	e5adda3d51	Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines. Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers: if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd Drivers that were already disabled because of tty changes: if_ppp if_sl Discussed on: arch@	2009-03-15 14:21:05 +00:00
Robert Watson	ad71fe3c35	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz	2009-03-15 09:58:31 +00:00
Randall Stewart	49633f4b36	Opps.. I missed a file on the commit :-)	2009-03-14 23:13:16 +00:00

1 2 3 4 5 ...

3603 Commits