freebsd-skq

Author	SHA1	Message	Date
Bjoern A. Zeeb	974524bf59	ip_gif_ttl/GIF_TTL are only used by the inet part in in_gif.c, so put the initialization under #ifdef INET.	2009-06-10 13:39:51 +00:00
Bjoern A. Zeeb	74900cdf7f	The llentry *lle is only used in cases of INET or INET6. Put the variable declaration under proper #ifdefs. In case variables are only needed for one of the two AFs more them into proper scope.	2009-06-10 09:07:05 +00:00
Kip Macy	3576e2f4a2	revert to opt-in flowtable	2009-06-09 21:55:28 +00:00
Oleg Bulyzhin	dda10d624c	Close long existed race with net.inet.ip.fw.one_pass = 0: If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc) it carries pointer to matching ipfw rule. If this packet then reinjected back to ipfw, ruleset processing starts from that rule. If rule was deleted meanwhile, due to existed race condition panic was possible (as well as other odd effects like parsing rules in 'reap list'). P.S. this commit changes ABI so userland ipfw related binaries should be recompiled. MFC after: 1 month Tested by: Mikolaj Golub	2009-06-09 21:27:11 +00:00
Kip Macy	15d13a59a3	make flowtable opt-out	2009-06-09 20:27:30 +00:00
Kip Macy	ee117d034c	move jenkins hash to its own header in libkern	2009-06-09 20:21:40 +00:00
Kip Macy	a913be0917	- add drbr routines for accessing #qentries and conditionally dequeueing - track bytes enqueued in buf_ring	2009-06-09 19:19:16 +00:00
Bjoern A. Zeeb	c2f16e371e	Remove one INET dependency by calling the general AF agnostic version for doing the routing lookup. Reviewed by: kmacy	2009-06-09 09:50:43 +00:00
Hiroki Sato	4cd5f57d6b	Style fix. Submitted by: bz	2009-06-09 08:09:30 +00:00
Hiroki Sato	fb70e72b0c	- Fix sanity check of GIFSOPTS ioctl. - Rename option mask s/GIF_FULLOPTS/GIF_OPTMASK/ Spotted by: Eygene Ryabinkin, delphij	2009-06-09 02:27:59 +00:00
Bjoern A. Zeeb	5c40c5e989	Remove two unneeded, hidden includes.	2009-06-08 20:04:46 +00:00
Bjoern A. Zeeb	8d8bc0182e	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.	2009-06-08 19:57:35 +00:00
Marko Zec	bc29160df3	Introduce an infrastructure for dismantling vnet instances. Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework. While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of such issues are already known and will be incrementaly fixed over the next weeks in smaller incremental commits. Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time. Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)	2009-06-08 17:15:40 +00:00
Hiroki Sato	dbe5926046	Fix and add a workaround on an issue of EtherIP packet with reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent the incorrect EtherIP packets and discarded the correct ones. This change introduces the following two flags to gif(4): accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default. send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default. These flags are stored in struct gif_softc and can be set by ifconfig(8) on per-interface basis. Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate older FreeBSD boxes and new versions after this commit, setting "send_rev_ethip_ver" is needed. Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks	2009-06-07 23:00:40 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Luigi Rizzo	115a40c7bf	More cleanup in preparation of ipfw relocation (no actual code change): + move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c; + remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly; + remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly; + move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it; To be merged together with rev 193497 MFC after: 5 days	2009-06-05 13:44:30 +00:00
Sam Leffler	c9dd371765	move ifq_detach from if_detach to if_free; this permits callers to reference if_snd in the period between detach+free which helps simplify detach code Reviewed by: jhb, rwatson	2009-06-02 18:53:21 +00:00
Robert Watson	d363c61766	Revert a recent netisr2 change: when billing packets to the current CPU, don't lock the workstream, as its mutexes may not have been initialized if there are fewer workstreams than CPUs. Run into by: hps, ps	2009-06-01 18:38:36 +00:00
Bjoern A. Zeeb	c2c2a7c11e	Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible	2009-06-01 15:49:42 +00:00
Robert Watson	ed54411c19	Garbage collect NETISR_POLL and NETISR_POLLMORE, which are no longer required for options DEVICE_POLLING. De-fragment the NETISR_ constant space and lower NETISR_MAXPROT from 32 to 16 -- when sizing queue arrays using this compile-time constant, significant amounts of memory are saved. Warn on the console when tunable values for netisr are automatically adjusted during boot due to exceeding limits, invalid values, or as a result of DEVICE_POLLING.	2009-06-01 15:03:58 +00:00
Robert Watson	d4b5cae49b	Reimplement the netisr framework in order to support parallel netisr threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz	2009-06-01 10:41:38 +00:00
Marko Zec	feb08d06b9	Introduce an interm userland-kernel API for creating vnets and assigning ifnets from one vnet to another. Deletion of vnets is not yet supported. The interface is implemented as an ioctl extension so that no syscalls had to be introduced. This should be acceptable given that the new interface will be used for a short / interim period only, until the new jail management framwork gains the capability of managing vnets. This method for managing vimages / vnets has been in use for the past 7 years without any observable issues. The userland tool to be used in conjunction with the interim API can be found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and will most probably never get commited to svn. While here, bump copyright notices in kern_vimage.c and vimage.h to cover work done in year 2009. Approved by: julian (mentor) Discussed with: bz, rwatson	2009-05-31 12:10:04 +00:00
Attilio Rao	1abcdbd127	When user_frac in the polling subsystem is low it is going to busy the CPU for too long period than necessary. Additively, interfaces are kept polled (in the tick) even if no more packets are available. In order to avoid such situations a new generic mechanism can be implemented in proactive way, keeping track of the time spent on any packet and fragmenting the time for any tick, stopping the processing as soon as possible. In order to implement such mechanism, the polling handler needs to change, returning the number of packets processed. While the intended logic is not part of this patch, the polling KPI is broken by this commit, adding an int return value and the new flag IFCAP_POLLING_NOCOUNT (which will signal that the return value is meaningless for the installed handler and checking should be skipped). Bump __FreeBSD_version in order to signal such situation. Reviewed by: emaste Sponsored by: Sandvine Incorporated	2009-05-30 15:14:44 +00:00
Robert Watson	1a109c1cb0	Make the rmlock(9) interface a bit more like the rwlock(9) interface: - Add rm_init_flags() and accept extended options only for that variation. - Add a flags space specifically for rm_init_flags(), rather than borrowing the lock_init() flag space. - Define flag RM_RECURSE to use instead of LO_RECURSABLE. - Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS checking; this wasn't possible previously as rm_init() always passed LO_WITNESS when initializing an rmlock's struct lock. - Add RM_SYSINIT_FLAGS(). - Rename embedded mutex in rmlocks to make it more obvious what it is. - Update consumers. - Update man page.	2009-05-29 10:52:37 +00:00
Jamie Gritton	0304c73163	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
Sam Leffler	5ce8d9708c	rev bpf attach/detach event api to include the dlt	2009-05-25 16:34:35 +00:00
Marko Zec	37f17770e0	V_irtualize the if_clone framework, thus allowing for clonable ifnets to optionally have overlapping unit numbers if attached in different vnets. At this stage if_loop is the only clonable ifnet class that has been extended to allow for such overlapping allocation of unit numbers, i.e. in each vnet it is possible to have a lo0 interface. Other clonable ifnet classes remain to operate with traditional semantics, i.e. each instance of a clonable ifnet will be assigned a globally unique unit number, regardless in which vnet such an ifnet becomes instantiated. While here, garbage collect unused _lo_list field in struct vnet_net, as well as improve indentation for #defines in sys/net/vnet.h. The layout of struct vnet_net has changed, therefore bump __FreeBSD_version. This change has no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, brooks Approved by: julian (mentor)	2009-05-23 21:43:44 +00:00
Marko Zec	67da1f3d8d	Set ifp->if_afdata_initialized to 0 while holding IF_AFDATA_LOCK on ifp, not after the lock has been released. Reviewed by: bz Discussed with: rwatson	2009-05-22 22:22:21 +00:00
Marko Zec	e0c14af9b3	Introduce the if_vmove() function, which will be used in the future for reassigning ifnets from one vnet to another. if_vmove() works by calling a restricted subset of actions normally executed by if_detach() on an ifnet in the current vnet, and then switches to the target vnet and executes an appropriate subset of if_attach() actions there. if_attach() and if_detach() have become wrapper functions around if_attach_internal() and if_detach_internal(), where the later variants have an additional argument, a flag indicating whether a full attach or detach sequence is to be executed, or only a restricted subset suitable for moving an ifnet from one vnet to another. Hence, if_vmove() will not call if_detach() and if_attach() directly, but will call the if_detach_internal() and if_attach_internal() variants instead, with the vmove flag set. While here, staticize ifnet_setbyindex() since it is not referenced from outside of sys/net/if.c. Also rename ifccnt field in struct vimage to ifcnt, and do some minor whitespace garbage collection where appropriate. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz, rwatson, brooks? Approved by: julian (mentor)	2009-05-22 22:09:00 +00:00
Qing Li	c9d763bf41	When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix. Reviewed by: kmacy	2009-05-20 21:07:15 +00:00
Sam Leffler	b743c31009	add bpf_track eventhandler for monitoring bpf taps attached/detached Reviewed by: csjp	2009-05-18 17:18:40 +00:00
Robert Watson	a765f96051	Garbage collect unused NETISR_{ATM,NETGRAPH,PPP} netisr constants.	2009-05-18 10:33:23 +00:00
Robert Watson	2f120c90a7	Garbage collect now-unused NETISR_FORCEQUEUE, which overrode the global direct dispatch policy for specific protocols (NETISR_USB). We leave the additional 'flags' argument to netisr_register() for the time being, even though it is no longer required.	2009-05-13 17:22:33 +00:00
Robert Watson	270b609935	Remove now-unused NETISR_USB.	2009-05-13 17:17:05 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Marko Zec	5f416f8e84	Make indentation more uniform accross vnet container structs. This is a purely cosmetic / NOP change. Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output	2009-05-02 08:16:26 +00:00
Marko Zec	d7fcc52895	Unbreak options VIMAGE + nooptions INVARIANTS kernel builds. Submitted by: julian Approved by: julian (mentor)	2009-05-02 05:02:28 +00:00
Andrew Thompson	3f11aba75f	Reorder the bridge add and delete routines to avoid calling ifpromisc() with the bridge lock held.	2009-05-01 19:46:42 +00:00
Andrew Thompson	5c6026e91f	Use the flowid if its available for selecting the tx port.	2009-04-30 14:25:44 +00:00
Marko Zec	f6dfe47a14	Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build: 1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes: options VIMAGE: ((struct vnet_net ) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet 2) INIT_VNET_ macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace: INIT_VNET_NET(ifp->if_vnet); becomes struct vnet_net vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET]; 3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures. 4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required. 5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet. 6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_ family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing. Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted. Reviewed by: bz, rwatson Approved by: julian (mentor)	2009-04-30 13:36:26 +00:00
Kip Macy	6e3513f523	replace IFQ_ENQUEUE + if_start with if_transmit	2009-04-27 22:46:26 +00:00
Kip Macy	2635206d3a	replace IFQ_HANDOFF with if_transmit	2009-04-27 22:45:56 +00:00
Kip Macy	68bd638360	remove gratuitous memory barrier, a remnant of unified L2 / L3	2009-04-27 22:45:19 +00:00
Kip Macy	1bc27a1ce3	remove call to IFQ_HANDOFF is it called by if_transmit in the default case and doing so allows the ifnet driver to define its own queueing mechanism	2009-04-27 22:44:26 +00:00
Sam Leffler	5d3220403f	use if_transmit intead of direct frobbing of the if_snd q; this is no longer allowed Identified by: rwatson Reviewed by: kmacy	2009-04-27 22:06:49 +00:00
Marko Zec	093f25f8c8	In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace. Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)	2009-04-26 22:06:42 +00:00
Robert Watson	8bd015a1ca	As with ifnet_byindex_ref(), don't return IFF_DYING interfaces from ifunit_ref(). ifunit() continues to return them. MFC after: 3 weeks	2009-04-23 15:56:01 +00:00
Robert Watson	6064c5d362	Add ifunit_ref(), a version of ifunit(), that returns not just an interface pointer, but also a reference to it. Modify ifioctl() to use ifunit_ref(), holding the reference until all ioctls, etc, have completed. This closes a class of reader-writer races in which interfaces could be removed during long-running ioctls, leading to crashes. Many other consumers of ifunit() should now use ifunit_ref() to avoid similar races. MFC after: 3 weeks	2009-04-23 13:08:47 +00:00
Robert Watson	111c6b617b	During if_detach(), invoke if_dead() to set the ifnet's function pointers to "dead" implementations that no-op rather than invoking the device driver. This would generally be unexpected and possibly quite badly handled by most device drivers after if_detach() has completed. Reviewed by: bms MFC after: 3 weeks	2009-04-23 11:51:53 +00:00
Robert Watson	d6f157ea9a	Move portions of data structure initialization from if_attach() to if_alloc(), and portions of data structure destruction from if_detach() to if_free(). These changes leave more of the struct ifnet in a safe-to-access condition between alloc and attach, and between detach and free, and focus on attach/detach as stack usage events rather than data structure initialization. Affected fields include the linkstate task queue, if_afdata lock, address lists, kqueue state, and MAC labels. ifq_attach() ifq_detach() are not moved as ifq_attach() may use a queue length set by the device driver between if_alloc() and if_attach(). MFC after: 3 weeks	2009-04-23 10:59:40 +00:00

1 2 3 4 5 ...

2445 Commits