freebsd-dev

Author	SHA1	Message	Date
Robert Watson	938448cd87	Changes to support crashdump analysis of netisr: - Rename the netisr protocol registration array, 'np' to 'netisr_proto', in order to reduce the chances of symbol name collisions. It remains statically defined, but it will be looked up by netstat(1). - Move certain internal structure definitions from netisr.c to netisr_internal.h so that netstat(1) can find them. They remain private, and should not be used for any other purpose (for example, they should not be used by kernel modules, which must instead use the public interfaces in netisr.h). - Store a kernel-compiled version of NETISR_MAXPROT in the global variable netisr_maxprot, and export via a sysctl, so that it is available for use by netstat(1). This is especially important for crashdump interpretation, where the size of the workstream structure is determined by the maximum number of protocols compiled into the kernel. MFC after: 1 week Sponsored by: Juniper Networks	2010-03-01 00:42:36 +00:00
Robert Watson	7f450feb07	Fix edge cases in several KASSERTs: use <= rather than < when testing that counters have not gone about MAXCPU or NETISR_MAXPROT. These problems caused panics on UP kernels with INVARIANTS when using sysctl -a, but would also have caused problems for 32-core boxes or if the netisr protocol vector was fully populated. Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com> MFC after: 4 days	2010-02-25 09:51:14 +00:00
Robert Watson	2d22f334ea	Export netisr configuration and statistics to userspace via sysctl(9). MFC after: 1 week Sponsored by: Juniper Networks	2010-02-22 15:03:16 +00:00
Pawel Jakub Dawidek	784949026c	Mark various sysctls also as tunables. Reviewed by: rwatson MFC after: 1 week	2010-02-15 09:19:07 +00:00
Robert Watson	912f6323cd	When warning about possible netisr configuration problems during boot, report using "netisr_init" rather than "netisr2", which was the development name for the project. MFC after: 3 days	2009-12-23 12:33:59 +00:00
Robert Watson	0a32e29f59	Refine netisr.c comments a bit.	2009-12-23 12:31:27 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Bjoern A. Zeeb	ba3b25b35a	In case we cannot queue a packet reaching the queue limit, retain the semantics netisr_queue() always had and free the mbuf along with returning the error. Reviewed by: rwatson Approved by: re (kensmith)	2009-06-30 05:21:00 +00:00
Robert Watson	9e6e01ebf6	In light of DPCPU use by netisr, revise various for loops from using MAXCPU to mp_maxid, and handling and reporting of requests to use more threads than we have CPUs to run them on. Reviewed by: bz Approved by: re (kib) MFC after: 6 weeks	2009-06-26 20:39:36 +00:00
Robert Watson	534027673b	Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing arrays to [MAXCPU], offering moderate memory savings. In some places, this requires using CPU_ABSENT() to handle less common platforms with sparse CPU IDs. In several places, assert that the selected CPUID for work placement or statistics is not CPU_ABSENT() to be on the safe side. Discussed with: bz, jeff	2009-06-26 00:19:25 +00:00
Bjoern A. Zeeb	ed655c8c07	Add an optional callback function that will be invoked when a per-CPU queue was drained. It will never fire for a directly dispatched packet. You will most likely never want to use this for any ordinary netisr usage and you will never blame netisr in case you try to use it and it does not work as expected. Reviewed by: rwatson	2009-06-14 17:15:18 +00:00
Robert Watson	d363c61766	Revert a recent netisr2 change: when billing packets to the current CPU, don't lock the workstream, as its mutexes may not have been initialized if there are fewer workstreams than CPUs. Run into by: hps, ps	2009-06-01 18:38:36 +00:00
Robert Watson	ed54411c19	Garbage collect NETISR_POLL and NETISR_POLLMORE, which are no longer required for options DEVICE_POLLING. De-fragment the NETISR_ constant space and lower NETISR_MAXPROT from 32 to 16 -- when sizing queue arrays using this compile-time constant, significant amounts of memory are saved. Warn on the console when tunable values for netisr are automatically adjusted during boot due to exceeding limits, invalid values, or as a result of DEVICE_POLLING.	2009-06-01 15:03:58 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
Robert Watson	2f120c90a7	Garbage collect now-unused NETISR_FORCEQUEUE, which overrode the global direct dispatch policy for specific protocols (NETISR_USB). We leave the additional 'flags' argument to netisr_register() for the time being, even though it is no longer required.	2009-05-13 17:22:33 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Robert Watson	59dd72d040	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acqusition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Robert Watson	33d2bb9ca3	First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols: - Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into no-op's, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE. Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits. Reviewed by: bz, jhb Approved by: re (kensmith)	2007-07-27 11:59:57 +00:00
Robert Watson	1f87450e8b	Change net.isr.direct from defaulting to 0 to 1 in 7-CURRENT. This enables direct dispatch of the network stack from the device driver ithread, enabling input path parallelism by default when multiple interfaces are present. The strategy for network stack parallelism is something being actively discussed, and this is just one of several possible (and perfectly reasonable) strategies, but has the distinct advantage of reducing the number of context switches and preemptions significantly, resulting in higher efficiency in many cases. In some caes, this may reduce network stack parallelism due to work not being deferred from the ithread to the netisr. Therefore, the strategy may change in the future, but this offers a reasonable first pass and enabling parallelism while maintaining strong ordering. Hopefully this will trigger lots of nice new bugs. This change is not intended for MFC.	2006-11-28 11:19:36 +00:00
Gleb Smirnoff	f0796cd26c	- Don't pollute opt_global.h with DEVICE_POLLING and introduce opt_device_polling.h - Include opt_device_polling.h into appropriate files. - Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that can be compiled as loadable modules. Reviewed by: bde	2005-10-05 10:09:17 +00:00
Robert Watson	cea2165b10	Rename net.isr.enable to net.isr.dispatch. No compatibility code is provided, as this will be the production name as of 6.0. MFC after: 3 days Requested by: scottl	2005-10-04 07:59:28 +00:00
Andre Oppermann	de10fe70e1	Correctly unregister a netisr by clearing the ni->ni_queue field to NULL as well. This field is actually used by various netisr functions to determine the availablility of the specified netisr. This uncomplete unregister leads directly to a crash when the KLD unregistering the netisr is unloaded. Submitted by: Sam <sah@softcardsystems.com> MFC after: 3 days	2004-10-11 20:01:43 +00:00
Robert Watson	ccaae37ab1	Correct a comment typo: s/Note/Not/. Pointed out by: kensmith	2004-09-03 01:37:02 +00:00
Robert Watson	ace437c3c6	Correct typo in printf() warning. Submitted by: Pawel Worach <pawel.worach at telia.com>	2004-08-28 19:27:25 +00:00
Robert Watson	1d8cd39e71	Change the default disposition of debug.mpsafenet from 0 to 1, which will cause the network stack to operate without the Giant lock by default. This change has the potential to improve performance by increasing parallelism and decreasing latency in network processing. Due to the potential exposure of existing or new bugs, the following compatibility functionality is maintained: - It is still possible to disable Giant-free operation by setting debug.mpsafenet to 0 in loader.conf. - Add "options NET_WITH_GIANT", which will restore the default value of debug.mpsafenet to 0, and is intended for use on systems compiled with known unsafe components, or where a more conservative configuration is desired. - Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits kernel components to declare dependence on Giant over the network stack. If the declaration is made by a preloaded module or a compiled in component, the disposition of debug.mpsafenet will be set to 0 and a warning concerning performance degraded operation printed to the console. If it is declared by a loadable kernel module after boot, a warning is displayed but the disposition cannot be changed. This is implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which is intended for the processing of configuration choices after tunables are read in and the console is available to generate errors, but before much else gets going. This compatibility behavior will go away when we've finished the last of the locking work and are confident that operation is correct.	2004-08-28 15:11:13 +00:00
Andre Oppermann	3161f583ca	Apply error and success logic consistently to the function netisr_queue() and its users. netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported. Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets. PR: kern/70988 Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 3 days	2004-08-27 18:33:08 +00:00
Robert Watson	08f85b089e	Comment clarifying debug_mpsafenet.	2004-07-18 21:50:22 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
Sam Leffler	d3be1471c7	o make debug_mpsafenet globally visible o move it from subr_bus.c to netisr.c where it more properly belongs o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to grab Giant as needed when MPSAFE operation is enabled Supported by: FreeBSD Foundation	2003-11-05 23:42:51 +00:00
Robert Watson	5fd04e380f	When direct dispatching an netisr (net.isr.enable=1), if there are already any queued packets for the isr, process those packets before the newly submitted packet, maintaining ordering of all packets being delivered to the netisr. Remove the bypass counter since we don't bypass anymore. Leave the comment about possible problems and options since later performance optimization may change the strategy for addressing ordering problems here. Specifically, this maintains the strong isr ordering guarantee; additional parallelism and lower latency may be possible by moving to weaker guarantees (per-interface, for example). We will probably at some point also want to remove the one instance netisr dispatch limit currently enforced by a mutex, but it's not clear that's 100% safe yet, even in the netperf branch. Reviewed by: sam, others	2003-10-03 18:27:24 +00:00
Robert Watson	e590eca2ad	Create a tunable for net.isr.enable so that it may be set from inception, rather than having to wait for the boot to finish.	2003-10-02 02:54:10 +00:00
Robert Watson	3164565d39	Temporarily turn net.isr.enable back off again until patches to correct potential nits in packet ordering are resolved.	2003-10-01 22:15:16 +00:00
Robert Watson	19288f738a	Enable net.isr.enable by default, causing "delivery to completion" (direct dispatch) in interrupt threads when the netisr in question isn't already active. If a netisr is already active, or direct dispatch is already in progress, we queue the packet for later delivery. Previously, this option was disabled by default. I have measured 20%+ performance improvements in IP packet forwarding with this enabled. Please report any problems ASAP, especially relating to stack depth or out-of-order packet processing. Discussed with: jlemon, peter Sponsored by: DARPA, Network Associates Laboratories	2003-10-01 21:31:09 +00:00
Jonathan Lemon	fb68148f4a	Discard the packet if the netisr queue is null instead of panicing, for the benefit of modules which are compiled differently than the kernel.	2003-03-08 22:12:32 +00:00
Jonathan Lemon	1cafed3941	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs	2003-03-04 23:19:55 +00:00
Jake Burkholder	e3b6e33c07	Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a comment.	2002-09-22 05:56:41 +00:00

38 Commits