freebsd-dev

Author	SHA1	Message	Date
Nathan Whitehorn	6415e9aafb	Add variable declaration missing in r302372. Submitted by: andrew Approved by: re (gjb, kib)	2016-07-06 17:46:49 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Sepherosa Ziehau	36ad8372d4	net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties Reviewed by: hps, erj, tuexen Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6688	2016-06-07 04:51:50 +00:00
George V. Neville-Neil	6d76822688	This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate. Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262	2016-06-02 17:51:29 +00:00
Alan Somers	8907f744ff	Improve performance and functionality of the bitstring(3) api Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow for efficient searching of set or cleared bits starting from any bit offset within the bit string. Performance is improved by operating on longs instead of bytes and using ffsl() for searches within a long. ffsl() is a compiler builtin in both clang and gcc for most architectures, converting what was a brute force while loop search into a couple of instructions. All of the bitstring(3) API continues to be contained in the header file. Some of the functions are large enough that perhaps they should be uninlined and moved to a library, but that is beyond the scope of this commit. sys/sys/bitstring.h: Convert the majority of the existing bit string implementation from macros to inline functions. Properly protect the implementation from inadvertant macro expansion when included in a user's program by prefixing all private macros/functions and local variables with '_'. Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and bit_ffc() in terms of their "at" counterparts. Provide a kernel implementation of bit_alloc(), making the full API usable in the kernel. Improve code documenation. share/man/man3/bitstring.3: Add pre-exisiting API bit_ffc() to the synopsis. Document new APIs. Document the initialization state of the bit strings allocated/declared by bit_alloc() and bit_decl(). Correct documentation for bitstr_size(). The original code comments indicate the size is in bytes, not "elements of bitstr_t". The new implementation follows this lead. Only hastd assumed "elements" rather than bytes and it has been corrected. etc/mtree/BSD.tests.dist: tests/sys/Makefile: tests/sys/sys/Makefile: tests/sys/sys/bitstring.c: Add tests for all existing and new functionality. include/bitstring.h Include all headers needed by sys/bitstring.h lib/libbluetooth/bluetooth.h: usr.sbin/bluetooth/hccontrol/le.c: Include bitstring.h instead of sys/bitstring.h. sbin/hastd/activemap.c: Correct usage of bitstr_size(). sys/dev/xen/blkback/blkback.c Use new bit_alloc. sys/kern/subr_unit.c: Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of unrb.busy, which caches the number of bits set in unrb.map. When INVARIANTS are disabled, nothing needs to know that information. callapse_unr can be adapted to use bit_ffs and bit_ffc instead. Eliminating unrb.busy saves memory, simplifies the code, and provides a slight speedup when INVARIANTS are disabled. sys/net/flowtable.c: Use the new kernel implementation of bit-alloc, instead of hacking the old libc-dependent macro. sys/sys/param.h Update __FreeBSD_version to indicate availability of new API Submitted by: gibbs, asomers Reviewed by: gibbs, ngie MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D6004	2016-05-04 22:34:11 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Randall Stewart	d1a6f62c45	Fix three flowtable bugs, a) one lookup issue, b) a two cleaner issue. MFC after: 3 days Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D4014	2015-11-02 21:21:00 +00:00
Hans Petter Selasky	c25290420e	Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file. This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows. "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before. Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped. MFC after: 1 month Sponsored by: Mellanox Technologies	2014-12-01 11:45:24 +00:00
Gleb Smirnoff	6df8a71067	Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.	2014-11-07 09:39:05 +00:00
Gleb Smirnoff	f788ab5f0b	Pacify gcc.	2014-03-05 02:35:15 +00:00
Gleb Smirnoff	ec0ad11ed2	Fix incorrect assertions.	2014-02-18 14:21:26 +00:00
Gleb Smirnoff	044ae9dbef	Add my copyright to flowtable.	2014-02-17 12:07:17 +00:00
Gleb Smirnoff	47c8550576	Whitespace.	2014-02-17 12:02:44 +00:00
Gleb Smirnoff	5ec03c2bff	Bring copyright notice to standard style.	2014-02-17 12:01:50 +00:00
Gleb Smirnoff	0ff96b4f55	o Remove at compile time the HASH_ALL code, that was never tested and is unfinished. However, I've tested my version, it works okay. As before it is unfinished: timeout aren't driven by TCP session state. To enable the HASH_ALL mode, one needs in kernel config: options FLOWTABLE_HASH_ALL o Reduce the alignment on flentry to 64 bytes. Without the FLOWTABLE_HASH_ALL option, twice less memory would be consumed by flows. o API to ip_output()/ip6_output() got even more thin: 1 liner. o Remove unused unions. Simply use fle->f_key[]. o Merge all IPv4 code into flowtable_lookup_ipv4(), and do same flowtable_lookup_ipv6(). Stop copying data to on stack sockaddr structures, simply use key[] on stack. o Move code from flowtable_lookup_common() that actually works on insertion into flowtable_insert(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-17 11:50:56 +00:00
Adrian Chadd	a2c5809961	Make sure that the flowtable flowid is only set to m_flowid if there isn't one already supplied. The previous flowtable code also did this. Reviewed by: glebius Sponsored by: Netflix, Inc.	2014-02-15 07:57:01 +00:00
Gleb Smirnoff	f0e49f6631	Whenever flowtable lookup fails, we do route lookup and then try to insert flow entry. During the route lookup the critical section is exited. It may happen, that after route lookup we will be executed on an other CPU that already has such flowentry. Before this change we simply freed the flowentry and returned to ip_output() with failure. Actually there is nothing wrong with using previously allocated flow entry, updating it properly. Thus, make flowentry_insert() return the new either old fle, and make use of it. Count reuses as "collisions" and real inserts as "inserts". Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-14 10:56:26 +00:00
Adrian Chadd	0e778c88c9	Don't insert a flowtable entry if the lle isn't yet valid. Some of the collisions that are occuring are due to flowtable lookups that succeed but have an invalid lle - typically because the L2 adjacency lookup hasn't completed. This would lead to a follow-up insert which would then fail (ie, collision) and the code would fall through to doing a slow-path L2/L3 lookup in the netinet/netinet6 code. This patch simply aborts storing a new flowtable entry if the lle isn't yet valid. Whilst I'm here, add a new pcpu counter for the item so the number of failures can be tracked separately from generic "collisions." Reviewed by: glebius MFC after: 10 days Sponsored by: Netflix, Inc.	2014-02-14 00:05:09 +00:00
Gleb Smirnoff	25c03b9100	Remove unused FL_NOAUTO.	2014-02-13 05:19:09 +00:00
Gleb Smirnoff	4343b5fa49	o Axe non-pcpu flowtable implementation. It wasn't enabled or used, and probably is a leftover from first prototyping by Kip. The non-pcpu implementation used mutexes, so it doubtfully worked better than simple routing lookup. o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays, use zpcpu_get() to access data in there. o Substitute own single list implementation with SLIST(). This has two functional side effects: - new flows go into head of a list, before they went to tail. - a bug when incorrect flow was deleted in flow cleaner is fixed. o Due to cache line alignment, there is no reason to keep different zones for IPv4 and IPv6 flows. Both consume one cache line, real size of allocation is equal. o Rely on that f_hash, f_rt, f_lle are stable during fle lifetime, remove useless volatile quilifiers. o More INET/INET6 splitting. Reviewed by: adrian Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-13 04:59:18 +00:00
Gleb Smirnoff	0d3009b37c	Remove ft_rtalloc and choose rtalloc function at compile time.	2014-02-08 22:12:00 +00:00
Gleb Smirnoff	2333893af3	Spacing.	2014-02-08 22:10:53 +00:00
Gleb Smirnoff	603819bc74	Remove never set flag FL_OVERWRITE. The only place where it was checked led to lock/critnest leak.	2014-02-08 09:56:26 +00:00
Gleb Smirnoff	f83f97fcbc	Remove unused defines.	2014-02-07 21:56:16 +00:00
Gleb Smirnoff	5d6d7e756b	o Revamp API between flowtable and netinet, netinet6. - ip_output() and ip_output6() simply call flowtable_lookup(), passing mbuf and address family. That's the only code under #ifdef FLOWTABLE in the protocols code now. o Revamp statistics gathering and export. - Remove hand made pcpu stats, and utilize counter(9). - Snapshot of statistics is available via 'netstat -rs'. - All sysctls are moved into net.flowtable namespace, since spreading them over net.inet isn't correct. o Properly separate at compile time INET and INET6 parts. o General cleanup. - Remove chain of multiple flowtables. We simply have one for IPv4 and one for IPv6. - Flowtables are allocated in flowtable.c, symbols are static. - With proper argument to SYSINIT() we no longer need flowtable_ready. - Hash salt doesn't need to be per-VNET. - Removed rudimentary debugging, which use quite useless in dtrace era. The runtime behavior of flowtable shouldn't be changed by this commit. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-02-07 15:18:23 +00:00
Gleb Smirnoff	7cf4986ba9	Spacing.	2014-02-07 10:05:12 +00:00
Maksim Yevmenkin	9ca42425c6	In the flowtable scanner, restart the scan at the last found position, not at position 0. Changes the scanner from O(N^2) to O(N). Submitted by: scottl Obtained from: Netflix, Inc MFC after: 3 weeks	2013-10-15 21:28:51 +00:00
Gleb Smirnoff	62208ca5d2	- Move jenkins.h to jenkins_hash.c - Provide missing function that can do hashing of arbitrary sized buffer. - Refetch lookup3.c and do only minimal edits to it, so that diff between our jenkins_hash.c and lookup3.c is minimal. - Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h. - Document these functions in hash(9) Obtained from: http://burtleburtle.net/bob/c/lookup3.c	2012-09-04 12:07:33 +00:00
Gleb Smirnoff	b1d86af706	The llentry_update() is used only by flowtable and the latter always passes NULL pointer to it. Thus, code can be simplified and function renamed to llentry_alloc() to match rtalloc().	2012-08-02 13:20:44 +00:00
Gleb Smirnoff	bf9840512a	When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)	2012-07-04 07:37:53 +00:00
Bjoern A. Zeeb	096f27864f	Fix FLOWTABLE IPv6 handling in route.c missed in r205066. While doing so, for consistency with the rtalloc_ign_fib(9) interface called, remove the "in_" prefix from rtalloc_ign_wrapper() no longer indicating that it would only handle the INET case. Sponsored by: Cisco Systems, Inc.	2012-02-03 10:17:34 +00:00
Kip Macy	0fe48d670f	A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks	2012-01-26 20:02:40 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
John Baldwin	a8f4344f08	- Restore dropping the priority of syncer down to PPAUSE when it is idle. This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks	2011-01-06 22:17:07 +00:00
Bjoern A. Zeeb	c9a2711a54	Print the vnet pointer under DDB when iterating over flowtables of each virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.	2010-12-31 21:20:32 +00:00
Bjoern A. Zeeb	f0a56b0678	Move the increment operation under the lock and split the condition variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.	2010-12-31 21:06:52 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 \| dim \| 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) \| 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 \| dim \| 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) \| 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 \| dim \| 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) \| 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Kip Macy	83e711ec14	allocate ipv6 flows from the ipv6 flow zone reported by: rrs@ MFC after: 3 days	2010-05-16 21:48:39 +00:00
Kip Macy	19d0491585	workaround bug with ipv6 where a flow can have a null rtentry	2010-05-12 04:51:20 +00:00
Kip Macy	3e8b572db4	need to initialize the lock before it is used MFC after: 3 days	2010-04-27 23:48:50 +00:00
Kip Macy	3059584e2a	- boot-time size the ipv4 flowtable and the maximum number of flows - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days	2010-03-22 23:04:12 +00:00
Kip Macy	8847ae28f5	flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB pointed out by jkim@	2010-03-12 19:58:51 +00:00
Kip Macy	a398ca9cea	re-update copyright to 2010 pointed out by danfe@	2010-03-12 19:26:45 +00:00
Qing Li	688ba6823b	The flow-table module retrieves the destination and source address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days	2010-03-12 10:24:58 +00:00
Kip Macy	112125d206	fix stats reporting sysctl	2010-03-12 06:31:19 +00:00
Kip Macy	d4121a02c0	- restructure flowtable to support ipv6 - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days	2010-03-12 05:03:26 +00:00
Qing Li	c7ea0aa648	One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days	2010-03-09 01:11:45 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00

1 2

73 Commits