- Remove macros that covertly create epoch_tracker on thread stack. Such
macros are quite unsafe, e.g. they will produce buggy code if the same
macro is used in nested scopes. Always declare the epoch_tracker explicitly.
- Unmask the interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF-specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as they are is very misleading. They are all named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and,
more importantly, they no longer guarantee protection against their
companion WLOCK macros.
Note: INP_HASH_RLOCK() has the same problems, but is not touched by this
commit.
This is a non-functional mechanical change. The only functions that change
functionally are ni6_addrs() and ni6_store_addrs(), where we no longer
enter the epoch recursively.
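A minimal sketch of the explicit pattern, assuming the NET_EPOCH_ENTER()/
NET_EPOCH_EXIT() form of the epoch(9) KPI that takes a tracker by name:

    struct epoch_tracker et;

    NET_EPOCH_ENTER(et);
    /* read-only traversal of ifnet/ifaddr lists goes here */
    NET_EPOCH_EXIT(et);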
Discussed with: jtl, gallatin
Currently it has several disadvantages:
- it uses a single mutex to protect internal structures. The mutex is used by
both the data and control paths, so there is no parallelism at all.
- it uses a single list to keep encap handlers for both the INET and INET6
families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw)
that isn't used by code in the source tree.
- matches are prioritized, and when many tunneling interfaces are
registered, the encapcheck handler of each interface is invoked for each
packet. The search takes O(n) for n interfaces, and all of this work is
done with the exclusive lock held.
What this patch includes:
- the datapath is converted to be lockless using the epoch(9) KPI.
- struct encaptab entries are now linked using CK_LIST.
- all unused fields are removed from struct encaptab. Several new fields
are added: min_length is the minimum packet length that the encapsulation
handler expects to see; exact_match is the maximum number of bits that an
encapsulation handler can return when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added a new "encap_lookup_t" method that will be used later. It is
intended to speed up the lookup of the needed interface when gif(4)/gre(4)
have many interfaces.
- the need to use the protosw structure is eliminated. Only the pr_input
method was used from this structure, so there is no need to keep using it.
- the encap_input_t method is changed to avoid using mbuf tags to store the
softc pointer. Now it is passed directly through the encap_input_t method.
The encap_getarg() function is removed.
- all sockaddr structures and the code that uses them are removed. We don't
have any code in the tree that uses them. All consumers use the
encap_attach_func() method, which relies on invoking encapcheck() to
determine the needed handler.
- introduced struct encap_config; it contains the parameters of the encap
handler that is going to be registered by the encap_attach() function (see
the sketch after this list).
- encap handlers are stored in lists ordered by exact_match value, so
handlers that need more bits to match are checked first, and if an
encapcheck method returns the exact_match value, the search stops.
- all current consumers are changed to use the new KPI.
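A hedged sketch of registering a handler with the new KPI; the field names
follow the description above, while the softc, the callbacks and the exact
attach function signature are assumptions:

    struct encap_config cfg = {
        .proto = IPPROTO_IPV4,            /* outer protocol to hook */
        .min_length = sizeof(struct ip),  /* shortest packet the handler accepts */
        .exact_match = 32,                /* bits reported on a perfect match */
        .check = foo_encapcheck,          /* classifier, encapcheck-style */
        .input = foo_input,               /* gets the mbuf and the softc directly */
    };

    sc->encap_cookie = encap_attach(&cfg, sc, M_WAITOK);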
Reviewed by: mmacy
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D15617
Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:
* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(
No functional change.
For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().
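For illustration, the kind of rewrite those expressions produce (the softc
and malloc type are hypothetical):

    /* before */
    MALLOC(sc, struct foo_softc *, sizeof(*sc), M_DEVBUF, M_WAITOK | M_ZERO);
    FREE(sc, M_DEVBUF);

    /* after */
    sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK | M_ZERO);
    free(sc, M_DEVBUF);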
Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open-source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
inet_ntoa() and inet_ntoa_r() take the address in network
byte-order. When I removed those calls, I should have
replaced them with ntohl() to make the hex addresses slightly
less unreadable. Here they are.
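The pattern in question, roughly (the header pointer is hypothetical):

    /* ip->ip_src is in network byte order; convert before printing hex */
    printf("0x%08x", ntohl(ip->ip_src.s_addr));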
See r315277 regarding classic blunders.
vangyzen: you're deep in "no good deed" territory, it seems
--badger
Reported by: ian
MFC after: 3 days
MFC when: I finally get it right
Sponsored by: Dell EMC
When I made the changes in r313821, I fell victim to one of the
classic blunders, the most famous of which is: never get involved
in a land war in Asia. But only slightly less well known is this:
Keep your brain turned on and engaged when making a tedious, sweeping,
mechanical change. KTR can correctly log the immediate integral values
passed to it, as well as constant strings, but not non-constant strings,
since they might change by the time ktrdump retrieves them.
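A hedged illustration of the distinction (trace class and variables are
hypothetical):

    /* OK: constant format string, immediate integral values are copied */
    CTR2(KTR_NET, "%s: flags %x", __func__, flags);

    /* Not OK: only the pointer is stored; the stack buffer may be gone or
     * overwritten before ktrdump(8) reads the entry */
    char name[IFNAMSIZ];
    snprintf(name, sizeof(name), "vlan%d", unit);
    CTR1(KTR_NET, "configuring %s", name);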
Reported by: glebius
MFC after: 3 days
Sponsored by: Dell EMC
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.
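The safe pattern, sketched; the two-argument kernel flavour of
inet_ntoa_r() and the sockaddr variable are assumed:

    char buf[INET_ADDRSTRLEN];

    printf("peer %s\n", inet_ntoa_r(sin->sin_addr, buf));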
Suggested by: glebius, emaste
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625
than removing the network interfaces first. This change is rather large
and convoluted, as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.
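For example, a firewall-like module would now hook itself in at the
dedicated subsystem level, roughly like this (the function and module
names are hypothetical):

    static void
    foo_fw_init(const void *unused __unused)
    {
        /* register pfil(9) hooks, allocate state, ... */
    }
    SYSINIT(foo_fw, SI_SUB_PROTO_FIREWALL, SI_ORDER_ANY, foo_fw_init, NULL);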
Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along with (or before) the shutdown of the higher layer protocol.
Kernel hhooks need to go last on teardown, as they may be used at various
higher layers and we cannot remove them before we have cleaned up those
higher layers.
For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where, depending on the
vmove flag or the vnet state, we make a decision on how much to shut down;
in case we are destroying a VNET, the individual protocol layers will
clean up their own parts, and thus we cannot do so again for each
interface or we end up with, e.g., double frees, locks being destroyed
twice, or acquiring already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).
Provide or enhance helper functions to do proper cleanup at a protocol
level rather than at an interface level.
Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
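The cleanup amounts to rewrites of the following kind (the softc field is
hypothetical):

    /* before: a flag constant passed where a plain boolean is expected */
    callout_init(&sc->tmo, CALLOUT_MPSAFE);

    /* after */
    callout_init(&sc->tmo, 1);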
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
where we want to create a new IP datagram.
o Add support for RFC6864, which allows the IP ID of atomic IP
datagrams to be set to any value, to improve performance. The behaviour
is controlled by the net.inet.ip.rfc6864 sysctl knob, which is enabled
by default.
o In case we generate the IP ID, use counter(9) to improve performance.
o Gather all code related to IP ID into ip_id.c.
Differential Revision: https://reviews.freebsd.org/D2177
Reviewed by: adrian, cy, rpaulo
Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Relnotes: yes
data in an mbuf, use M_WRITABLE() instead of a direct test of M_EXT;
the latter both unnecessarily exposes mbuf-allocator internals in the
protocol stack and is insufficient to catch all cases of
non-writability.
(NB: m_pullup() does not actually guarantee that a writable mbuf is
returned, so further refinement of all of these code paths continues to
be required.)
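A hedged sketch of the check; the helper and its recovery strategy are
hypothetical, and as noted above further refinement may still be needed:

    static int
    foo_make_writable(struct mbuf **mp)
    {
        struct mbuf *m;

        if (M_WRITABLE(*mp))
            return (0);
        m = m_dup(*mp, M_NOWAIT);    /* take a private, writable copy */
        m_freem(*mp);
        *mp = m;
        return (m == NULL ? ENOBUFS : 0);
    }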
Reviewed by: bz
MFC after: 3 days
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D900
pim_input() properly.
While here, remove an extra variable and an incorrect condition
before m_pullup().
Reported by: Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by: Nginx, Inc.
to this event, adding if_var.h to files that do need it. Also, explicitly
include all headers that previously came in only through implicit
pollution via if_var.h.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
matches the types used when computing hash indices and the type of the
maximum size of mfchashtbl[].
PR: kern/181821
Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4)
MFC after: 1 week
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some drivers may not announce CSUM_IP in their if_hwassist,
although they try to do checksums if CSUM_IP is set on the mbuf. An
example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.
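The decision now looks only at the mbuf, e.g. (sketch; hlen and the
surrounding context are hypothetical):

    /* the mbuf flag, not if_hwassist, drives the decision */
    if ((m->m_pkthdr.csum_flags & CSUM_IP) == 0)
        ip->ip_sum = in_cksum(m, hlen);    /* software checksum */
    /* else: the driver/hardware fills in ip_sum */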
Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.
After this change a packet processed by the stack isn't
modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to
scratch his head trying to figure out what the byte order
is at a given place in the stack.
[1] One exception still remains. Raw sockets convert to host
byte order before passing a packet to an application. Probably
this will remain for ages for compatibility.
[2] ip_input() still subtracts the header length from ip->ip_len,
but this is planned to be fixed soon.
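In practice the rule looks like this (sketch; surrounding context is
hypothetical):

    uint16_t ip_len;

    ip_len = ntohs(ip->ip_len);    /* host order copy lives in a local only */
    if (m->m_pkthdr.len < ip_len) {
        /* truncated packet */
    }
    /* ip->ip_len itself is never rewritten in host byte order */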
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
- ip_output() and ipfilter(4) are not changed, since they already call
in_delayed_cksum() with the header in net byte order.
- divert(4), ng_nat(4) and ipfw_nat(4) no longer need to swap byte order
there and back (see the sketch below).
- mrouting code and IPv6 ipsec now need to switch byte order there and
back, but I hope this is a temporary solution.
- In ipsec(4), the switch to net byte order is shifted to before
in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().
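E.g. a consumer that already holds the header in net byte order
(divert(4)-style) loses the swap dance, roughly:

    /* before: temporarily flip to host order for the helper */
    ip->ip_len = ntohs(ip->ip_len);
    in_delayed_cksum(m);
    ip->ip_len = htons(ip->ip_len);

    /* after: the header is left alone */
    in_delayed_cksum(m);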
before vnet_mroute_init(), since vnet_mroute_init() depends on the
mfchashsize tunable being set, and that is done in ip_mroute_modevent().
Apparently I broke that ordering with r208744 almost 2 years ago...
PR: kern/162201
Submitted by: Stevan Markovic (mcafee.com)
MFC after: 3 days
The SYSCTL_NODE macro defines a list that stores all child elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
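I.e., for a node that nothing else declares (hypothetical example):

    static SYSCTL_NODE(_net_inet_ip, OID_AUTO, example, CTLFLAG_RW, 0,
        "Example subtree with no SYSCTL_DECL() consumers elsewhere");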
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
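The accessor pattern described above looks roughly like this (the variable
and function names are hypothetical):

    VNET_DEFINE(int, foo_enable) = 1;        /* virtualized global, static initializer */
    #define V_foo_enable    VNET(foo_enable) /* resolves against the current vnet */

    /* later, in code running with a vnet context set: */
    if (V_foo_enable)
        foo_do_something();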
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:
ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr
Remove unused macro which didn't have required referencing:
IFP_TO_IA6
This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.
Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.
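Callers now follow a get/use/release pattern, roughly (the lookup argument
is hypothetical):

    struct ifaddr *ifa;

    ifa = ifa_ifwithaddr((struct sockaddr *)&sin);
    if (ifa != NULL) {
        /* ... use ifa ... */
        ifa_free(ifa);    /* drop the reference the lookup returned */
    }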
Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)