freebsd-nq

Author	SHA1	Message	Date
Will Andrews	c1887e9f09	pf: remove unused ioctls. Several ioctls are unused in pf, in the sense that no base utility references them. Additionally, a cursory review of pf-based ports indicates they're not used elsewhere either. Some of them have been unused since the original import. As far as I can tell, they're also unused in OpenBSD. Finally, removing this code removes the need for future pf work to take them into account. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D16076	2018-07-01 01:16:03 +00:00
Andrey V. Elsukov	6e081509db	Add NULL pointer check. encap_lookup_t method can be invoked by IP encap subsystem even if none of gif/gre/me interfaces exist. Hash tables are allocated on demand, when first interface is created. So, make NULL pointer check before doing access to hash table. PR: 229378	2018-06-28 11:39:27 +00:00
Andrey V. Elsukov	ca3cd72b17	Move BPFIF_* macro definitions into .c file, where struct bpf_if is declared. They are only used in this file and there is no need to export them via bpfdesc.h.	2018-06-19 10:34:45 +00:00
Eric Joyner	dfae03b5b5	iflib: Style fixes MFC after: 1 week	2018-06-18 17:27:43 +00:00
Marius Strobl	e4defe55a8	Assorted fixes to MSI-X/MSI/INTx setup in iflib(9): - In iflib_msix_init(), VMMs with broken MSI-X activation are trying to be worked around by manually enabling PCIM_MSIXCTRL_MSIX_ENABLE before calling pci_alloc_msix(9). Apart from constituting a layering violation, this has the problem of leaving PCIM_MSIXCTRL_MSIX_ENABLE enabled when falling back to MSI or INTx when e.g. MSI-X is blacklisted and initially also when disabled via hw.pci.enable_msix. The latter in turn was incorrectly worked around in r325166. Since r310806, pci(4) itself has code to deal with broken MSI-X handling of VMMs, so all of these workarounds in iflib(9) can go, fixing non-working interrupts when falling back to MSI/INTx. In any case, possibly further adjustments to broken MSI-X activation of VMMs like enabling r310806 by default in VM environments need to be placed into pci(4), not iflib(9). [1] - Also remove the pci_enable_busmaster(9) call from iflib_msix_init(), which is already more properly invoked from iflib_device_attach(). - When falling back to MSI/INTx, release the MSI-X BAR resource again. - When falling back to INTx, ensure scctx->isc_vectors is set to 1 and not to something higher from a device with more than one MSI message supported. - Make the nearby ring_state(s) stuff (static) const. Discussed with: jhb at BSDCan 2018 [1] Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D15729	2018-06-17 20:33:02 +00:00
Andrey V. Elsukov	ae11b3829c	Fix typo. Reported by: rpokala	2018-06-16 19:21:09 +00:00
Andrey V. Elsukov	20efcfc602	Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). Using rwlock with multiqueue NICs for IP forwarding at high pps produces high lock contention and is inefficient. Rmlock fits better for such workloads. Reviewed by: melifaro, olivier Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15789	2018-06-16 08:26:23 +00:00
Andrey V. Elsukov	9597ff83b8	Add missing BPF_MTAP2() for outbound packets.	2018-06-14 15:04:30 +00:00
Andrey V. Elsukov	2addcba7d5	Convert if_me(4) driver to use encap_lookup_t method and be lockless on data path.	2018-06-14 14:53:24 +00:00
Jonathan T. Looney	0766f278d8	Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interface to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691	2018-06-13 17:04:41 +00:00
Andrey V. Elsukov	a5185adeb6	Rework if_gre(4) to use encap_lookup_t method to speedup lookup of needed interface when many gre interfaces are present. Remove rmlock from gre_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc.	2018-06-13 11:11:33 +00:00
Jonathan T. Looney	16a227c7c9	Fix a memory leak for the BIOCSETWF ioctl on kernels with the BPF_JITTER option. The BPF code was creating a compiled filter in the common filter-creation path. However, BPF only uses compiled filters in the read direction. When creating a write filter, the common filter-creation code was creating an unneeded write filter and leaking the memory used for that. MFC after: 2 weeks Sponsored by: Netflix	2018-06-11 23:32:06 +00:00
Andrey V. Elsukov	44bcc06816	Explicitly change the link state when we assign an address. Since we are setting IFF_UP flag on SIOCSIFADDR, it is possible that after this the link state information is still not initialized properly. This leads to problems with routing, since now interface has IFCAP_LINKSTATE capability and a route is considered as working only when interface's link state is in LINK_STATE_UP (see RT_LINK_IS_UP() macro). Reported by: Marek Zarychta MFC after: 3 days	2018-06-09 09:57:14 +00:00
Stephen Hurd	3ab4a96085	Remove tx task spinning added in r333686 This caused issues with PASTE. Just remove the reschedule since the DELAY() should be enough for use cases such as pkt-gen which were failing before the change. Reported by: Michio Honda Sponsored by: Limelight Networks	2018-06-08 21:49:19 +00:00
Mateusz Guzik	b8af2820f6	uma: fix up r334824 Turns out there is code which ends up passing M_ZERO to counters. Since counters zero unconditionally on their own, just drop the flag in that place.	2018-06-08 05:40:36 +00:00
Matt Macy	58378a8971	rtentry_zinit: don't blindly pass through M_ZERO to counter alloc	2018-06-08 05:17:06 +00:00
Eric Joyner	a06424ddd3	iflib: Record TCP checksum info in iflib when TCP checksum is requested ixl(4) (when it switches over to using iflib) devices need the TCP header length in order to do TCP checksum offload. Reviewed by: gallatin@, shurd@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15558	2018-06-07 13:03:07 +00:00
Andrey V. Elsukov	b941bc1d6e	Rework if_gif(4) to use new encap_lookup_t method to speedup lookup of needed interface when many gif interfaces are present. Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead. Move more AF-related code into AF-related locations. Use hash table to speedup lookup of needed softc. Interfaces with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST. Sysctl net.link.gif.parallel_tunnels is removed. The removal was planned 16 years ago, and actually it could work only for outbound direction. Each protocol that can be handled by if_gif(4) interface is registered by separate encap handler, this helps avoid invoking the handler for unrelated protocols (GRE, PIM, etc.). This change dramatically improves performance when many gif(4) interfaces are used. Sponsored by: Yandex LLC	2018-06-05 21:24:59 +00:00
Andrey V. Elsukov	6d8fdfa9d5	Rework IP encapsulation handling code. Currently it has several disadvantages: - it uses single mutex to protect internal structures. It is used by data- and control- path, thus there is no parallelism at all. - it uses single list to keep encap handlers for both INET and INET6 families. - struct encaptab keeps unneeded information (src, dst, masks, protosw), that isn't used by code in the source tree. - matches are prioritized and when many tunneling interfaces are registered, encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with exclusive lock held. What this patch includes: - the datapath is converted to be lockless using epoch(9) KPI. - struct encaptab now linked using CK_LIST. - all unused fields removed from struct encaptab. Several new fields added: min_length is the minimum packet length that encapsulation handler expects to see; exact_match is maximum number of bits that an encapsulation handler can return when it wants to consume a packet. - IPv6 and IPv4 handlers are stored in separate lists; - added new "encap_lookup_t" method, that will be used later. It is targeted to speedup lookup of needed interface, when gif(4)/gre(4) have many interfaces. - the need to use protosw structure is eliminated. The only pr_input method was used from this structure, so I don't see the need to keep using it. - encap_input_t method changed to avoid using mbuf tags to store softc pointer. Now it is passed directly through encap_input_t method. encap_getarg() function is removed. - all sockaddr structures and code that uses them removed. We don't have any code in the tree that uses them. All consumers use encap_attach_func() method, that relies on invoking of encapcheck() to determine the needed handler. - introduced struct encap_config, it contains parameters of encap handler that is going to be registered by encap_attach() function. - encap handlers are stored in lists ordered by exact_match value, thus handlers that need more bits to match will be checked first, and if encapcheck method returns exact_match value, the search will be stopped. - all current consumers changed to use new KPI. Reviewed by: mmacy Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D15617	2018-06-05 20:51:01 +00:00
Matt Macy	a6bc59f203	Reduce overhead of entropy collection - move harvest mask check inline - move harvest mask to frequently_read out of actively modified cache line - disable ether_input collection and describe its limitations in NOTES Typically entropy collection in ether_input was stirring zero in to the entropy pool while at the same time greatly reducing max pps. This indicates that perhaps we should more closely scrutinize how much entropy we're getting from a given source as well as what our actual entropy collection needs are for seeding Yarrow. Reviewed by: cem, gallatin, delphij Approved by: secteam Differential Revision: https://reviews.freebsd.org/D15526	2018-05-31 21:53:07 +00:00
Hans Petter Selasky	5fd1ea0810	Re-apply r190640. - Restore local change to include <net/bpf.h> inside pcap.h. This fixes ports build problems. - Update local copy of dlt.h with new DLT types. - Revert no longer needed <net/bpf.h> includes which were added as part of r334277. Suggested by: antoine@, delphij@, np@ MFC after: 3 weeks Sponsored by: Mellanox Technologies	2018-05-31 09:11:21 +00:00
Matt Macy	91d6c9b93e	if_setlladdr: don't call ioctl in epoch context PR: 228612 Reported by: markj	2018-05-30 21:46:10 +00:00
Kristof Provost	d25c25dc52	pf: Add missing include statement rmlocks require <sys/lock.h> as well as <sys/rmlock.h>. Unbreak mips build.	2018-05-30 12:40:37 +00:00
Kristof Provost	455969d305	pf: Replace rwlock on PF_RULES_LOCK with rmlock Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock. This change improves packet processing rate in high pps environments. Benchmarking by olivier@ shows a 65% improvement in pps. While here, also eliminate all appearances of "sys/rwlock.h" includes since it is not used anymore. Submitted by: farrokhi@ Differential Revision: https://reviews.freebsd.org/D15502	2018-05-30 07:11:33 +00:00
Stephen Hurd	3e0e6330b5	iflib: mark irq allocation name parameter as constant The name parameter passed to iflib_irq_alloc_generic and iflib_softirq_alloc_generic is never modified. Many places in code pass string literals and thus should not be modified. Mark the name parameter as a const char * instead, so that we enforce that the name is not modified before passing to bus_describe_intr() Submitted by: Jacob Keller <jacob.e.keller@intel.com> Reviewed by: kmacy Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15343	2018-05-29 21:56:39 +00:00
Matt Macy	6c3c319414	iflib: hold context lock across detach for drivers that need it	2018-05-29 18:03:43 +00:00
Matt Macy	134804c89a	rt_getifa_fib: don't use ifa but info->rti_ifa Reported by: kp	2018-05-29 07:14:57 +00:00
Matt Macy	1ebec5faf4	route: fix missed ref adds - ensure that we bump the ifa ref whenever we add a reference - defer freeing epoch protected references until after the if_purgaddrs loop	2018-05-29 00:53:53 +00:00
Eric Joyner	1d7ef1867a	iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN ixl(4)'s nvmupdate utility expects the nvmupdate process to run while the interface is down; these nvm update commands use the admin queue, so the admin queue needs to be able to generate interrupts and be processed while the interface is down. So add a flag that ixl(4) sets that lets the entire admin task run even when the interface is marked down/IFF_DRV_RUNNING isn't set. With this change, nvmupdate should function like it did pre-iflib. Reviewed by: gallatin@, sbruno@ MFC after: 1 week Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D15575	2018-05-26 00:46:08 +00:00
Matt Macy	9379029a92	rtrequest1_fib: we need to always bump the ifaddr refcount when we take a reference from an rtentry. r334118 introduced a case when this was not done. While we're here make the intent more obvious by moving the refcount bump down to when we know we'll actually need it. Reported by: markj	2018-05-25 19:48:26 +00:00
Matt Macy	0f8d79d977	CK: update consumers to use CK macros across the board r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors	2018-05-24 23:21:23 +00:00
Matt Macy	5328b11c95	if_delgroups: add missed unlock introduced by r334118	2018-05-24 17:54:08 +00:00
Matt Macy	4f6c66cc9c	UDP: further performance improvements on tx Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps Single stream throughput increases from 910kpps to 1.18Mpps Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg - Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg - Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg - Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW. Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409	2018-05-23 21:02:14 +00:00
Luca Pizzamiglio	11d416666d	Improve MAC address uniqueness on if_epair(4). As reported in PR184149, it can happen that epair devices can have the same MAC address. This solution is based on a 32-bit hash, obtained combining the if_index of the a interface and the hostid. If the hostid is zero, a random number is used. PR: 184149 Reviewed by: wollman, eugen Approved by: cognet Differential Revision: https://reviews.freebsd.org/D15329	2018-05-23 13:10:57 +00:00
Mark Johnston	db5a36bddf	Simplify lagg_input(). No functional change intended. MFC after: 2 weeks	2018-05-22 15:35:38 +00:00
Matt Macy	fd04260d3f	ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL	2018-05-21 01:53:23 +00:00
Matt Macy	f6cb0dea4c	net: fix uninitialized variable warning	2018-05-19 19:00:04 +00:00
Matt Macy	e335651e1e	mp_ring: fix i386 Even though 64-bit atomics are supported on i386 there are panics indicating that the code does not work correctly there. Switch to mutex based variant (and fix that while we're here). Reported by: pho, kib	2018-05-19 16:44:12 +00:00
Matt Macy	46d0f824be	net: fix set but not used	2018-05-19 05:27:49 +00:00
Matt Macy	d7c5a620e2	ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366	2018-05-18 20:13:34 +00:00
Matt Macy	f2d19f98c1	epoch(9): allocate net epochs earlier in boot	2018-05-18 18:48:00 +00:00
Matt Macy	d71e30de40	epoch: move epoch variables to read mostly section	2018-05-18 17:58:15 +00:00
Ed Maste	891cf3ed44	Use NULL for SYSINIT's last arg, which is a pointer type Sponsored by: The FreeBSD Foundation	2018-05-18 17:58:09 +00:00
Matt Macy	70398c2f86	epoch(9): Make epochs non-preemptible by default There are risks associated with waiting on a preemptible epoch section. Change the name to make them not be the default and document the issue under CAVEATS. Reported by: markj	2018-05-18 17:29:43 +00:00
Matt Macy	5e68a3dfe3	epoch: add non-preemptible "critical" variant adds: - epoch_enter_critical() - can be called inside a different epoch, starts a section that will not acquire any MTX_DEF mutexes or do anything that might sleep. - epoch_exit_critical() - corresponding exit call - epoch_wait_critical() - wait variant that is guaranteed that any threads in a section are running. - epoch_global_critical - an epoch_wait_critical safe epoch instance Requested by: markj Approved by: sbruno	2018-05-18 01:52:51 +00:00
Matt Macy	2aa6f526de	Fix !netmap build post r333686 Approved by: sbruno	2018-05-16 22:25:47 +00:00
Stephen Hurd	5ee36c68e1	Work around lack of TX IRQs in iflib for netmap When poll() is called via netmap, txsync is initially called, and if there are no available buffers to reclaim, it waits for the driver to notify of new buffers. Since the TX IRQ is generally not used in iflib drivers, this ends up causing a timeout. Work around this by having the reclaim DELAY(1) if it's initially unable to reclaim anything, then schedule the tx task, which will spin by continuously rescheduling itself until some buffers are reclaimed. In general, the delay is enough to allow some buffers to be reclaimed, so spinning is minimized. Reported by: Johannes Lundberg <johalun0@gmail.com> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15455	2018-05-16 21:03:22 +00:00
Stephen Hurd	99031b8f7d	Replace rmlock with epoch in lagg Use the new epoch based reclamation API. Now the hot paths will not block at all, and the sx lock is used for the softc data. This fixes LORs reported where the rwlock was obtained when the sxlock was held. Submitted by: mmacy Reported by: Harry Schmalzbauer <freebsd@omnilan.de> Reviewed by: sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15355	2018-05-14 20:06:49 +00:00
Matt Macy	09f6ff4f1a	iflib(9): Add support for cloning pseudo interfaces Part 3 of many ... The VPC framework relies heavily on cloning pseudo interfaces (vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc). This pulls in that piece. Some ancillary changes get pulled in as a side effect. Reviewed by: shurd@ Approved by: sbruno@ Sponsored by: Joyent, Inc. Differential Revision: https://reviews.freebsd.org/D15347	2018-05-11 20:08:28 +00:00
Andrey V. Elsukov	e287c474be	Apply the change from r272770 to if_ipsec(4) interface. It is guaranteed that if_ipsec(4) interface is used only for tunnel mode IPsec, i.e. decrypted and decapsulated packet has its own IP header. Thus we can consider it as new packet and clear the protocols flags. This allows ICMP/ICMPv6 properly handle errors that may cause this packet. PR: 228108 MFC after: 1 week	2018-05-11 16:50:25 +00:00

3961 Commits