freebsd-nq

Author	SHA1	Message	Date
Marius Strobl	a6611c938b	Fix the build with ALTQ after r344060.	2019-02-12 22:33:17 +00:00
Marius Strobl	f855ec814d	Make taskqgroup_attach{,_cpu}(9) work across architectures So far, intr_{g,s}etaffinity(9) take a single int for identifying a device interrupt. This approach doesn't work on all architectures supported, as a single int isn't sufficient to globally specify a device interrupt. In particular, with multiple interrupt controllers in one system as found on e. g. arm and arm64 machines, an interrupt number as returned by rman_get_start(9) may be only unique relative to the bus and, thus, interrupt controller, a certain device hangs off from. In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}() not work across architectures. Yet in turn, iflib(4) as gtaskqueue consumer so far doesn't fit architectures where interrupt numbers aren't globally unique. However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as employed by the gtaskqueue implementation to bind an interrupt to a particular CPU, using bus_bind_intr(9) instead is equivalent from a functional point of view, with bus_bind_intr(9) taking the device and interrupt resource arguments required for uniquely specifying a device interrupt. Thus, change the gtaskqueue implementation to employ bus_bind_intr(9) instead and intr_{g,s}etaffinity(9) to take the device and interrupt resource arguments required respectively. This change also moves struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL as userland likes to include <sys/_task.h> or indirectly drags it in - for better or worse also with _KERNEL defined -, which with device_t and struct resource dependencies otherwise is no longer as easily possible now. The userland inclusion problem probably can be improved a bit by introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the existing _WANT_PRISON etc., which is orthogonal to this change, though, and likely needs an exp-run. While at it: - Change the gt_cpu member in the grouptask structure to be of type int as used elswhere for specifying CPUs (an int16_t may be too narrow sooner or later), - move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to the gtaskqueue implementation as it's only used and needed there, - change the GTASK_INIT macro to use "gtask" rather than "task" as argument given that it actually operates on a struct gtask rather than a struct task, and - let subr_gtaskqueue.c consistently use __func__ to print functions names. Reported by: mmel Reviewed by: mmel Differential Revision: https://reviews.freebsd.org/D19139	2019-02-12 21:23:59 +00:00
Marius Strobl	95dcf343b7	Further correct and optimize the bus_dma(9) usage of iflib(4): o Correct the obvious bugs in the netmap(4) parts: - No longer check for the existence of DMA maps as bus_dma(9) is used unconditionally in iflib(4) since r341095. - Supply the correct DMA tag and map pairs to bus_dma(9) functions (see also the commit message of r343753). - In iflib_netmap_timer_adjust(), add synchronization of the TX descriptors before calling the ift_txd_credits_update method as the latter evaluates the TX descriptors possibly updated by the MAC. - In _task_fn_tx(), wrap the netmap(4)-specific bits in #ifdef DEV_NETMAP just as done in _task_fn_admin() and _task_fn_rx() respectively. o In iflib_fast_intr_rxtx(), synchronize the TX rather than the RX descriptors before calling the ift_txd_credits_update method (see also above). o There's no need to synchronize an RX buffer that is going to be recycled in iflib_rxd_pkt_get(), yet; it's sufficient to do that as late as passing RX buffers to the MAC via the ift_rxd_refill method. Hence, combine that synchronization with the synchronization of new buffers into a common spot in _iflib_fl_refill(). o There's no need to synchronize the RX descriptors of a free list in preparation of the MAC updating their statuses with every invocation of rxd_frag_to_sd(); it's enough to do this once before handing control over to the MAC, i. e. before calling ift_rxd_flush method in _iflib_fl_refill(), which already performs the necessary synchronization. o Given that the ift_rxd_available method evaluates the RX descriptors which possibly have been altered by the MAC, synchronize as appropriate beforehand. Most notably this is now done in iflib_rxd_avail(), which in turn means that we don't need to issue the same synchronization yet again before calling the ift_rxd_pkt_get method in iflib_rxeof(). o In iflib_txd_db_check(), synchronize the TX descriptors before handing them over to the MAC for transmission via the ift_txd_flush method. o In iflib_encap(), move the TX buffer synchronization after the invocation of the ift_txd_encap() method. If the MAC driver fails to encapsulate the packet and we retry with a defragmented mbuf chain or finally fail, the cycles for TX buffer synchronization have been wasted. Synchronizing afterwards matches what non-iflib(4) drivers typically do and is sufficient as the MAC will not actually start with the transmission before - in this case - the ift_txd_flush method is called. Moreover, for the latter reason the synchronization of the TX descriptors in iflib_encap() can go as it's enough to synchronize them before passing control over to the MAC by issuing the ift_txd_flush() method (see above). o In iflib_txq_can_drain(), only synchronize TX descriptors if the ift_txd_credits_update method accessing these is actually called. Differential Revision: https://reviews.freebsd.org/D19081	2019-02-12 21:08:44 +00:00
Patrick Kelsey	8f2ac65690	Reduce the time it takes the kernel to install a new PF config containing a large number of queues In general, the time savings come from separating the active and inactive queues lists into separate interface and non-interface queue lists, and changing the rule and queue tag management from list-based to hash-bashed. In HFSC, a linear scan of the class table during each queue destroy was also eliminated. There are now two new tunables to control the hash size used for each tag set (default for each is 128): net.pf.queue_tag_hashsize net.pf.rule_tag_hashsize Reviewed by: kp MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D19131	2019-02-11 05:17:31 +00:00
Marius Strobl	bfce461ee9	o As illustrated by e. g. figure 7-14 of the Intel 82599 10 GbE controller datasheet revision 3.3, in the context of Ethernet MACs the control data describing the packet buffers typically are named "descriptors". Each of these descriptors references one buffer, multiple of which a packet can be composed of. By contrast, in comments, messages and the names of structure members, iflib(4) refers to DMA resources employed for RX and TX buffers (rather than control data) as "desc(riptors)". This odd naming convention of iflib(4) made reviewing r343085 and identifying wrong and missing bus_dmamap_sync(9) calls in particular way harder than it already is. This convention may also explain why the netmap(4) part of iflib(4) pairs the DMA tags for control data with DMA maps of buffers and vice versa in calls to bus_dma(9) functions. Therefore, change iflib(4) to refer to buf(fers) when buffers and not the usual understanding of descriptors is meant. This change does not include corrections to the DMA resources used in the netmap(4) parts. However, it revises error messages to state which kind of allocation/creation failed. Specifically, the "Unable to allocate tx_buffer (map) memory" copy & pasted inappropriately on several occasions was replaced with proper messages. o Enhance some other error messages to indicate which half - RX or TX - they apply to instead of using identical text in both cases and generally canonicalize them. o Correct the descriptions of iflib_{r,t}xsd_alloc() to reflect reality; current code doesn't use {r,t}x_buffer structures. o In iflib_queues_alloc(): - Remove redundant BUS_DMA_NOWAIT of iflib_dma_alloc() calls, - change the M_WAITOK from malloc(9) calls into M_NOWAIT. The return values are already checked, deferred DMA allocations not being an option at this point, BUS_DMA_NOWAIT has to be used anyway and prior malloc(9) calls in this function also specify M_NOWAIT. Reviewed by: shurd Differential Revision: https://reviews.freebsd.org/D19067	2019-02-04 20:46:57 +00:00
Gleb Smirnoff	3ca1c423aa	Teach pfil_ioctl() about VIMAGE. Submitted by: gallatin	2019-02-03 08:28:02 +00:00
Vincenzo Maffione	5faab77822	netmap: upgrade sync-kloop support Add SYNC_KLOOP_MODE option, and add support for direct mode, where application executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback. MFC after: 5 days	2019-02-02 22:39:29 +00:00
Gleb Smirnoff	b252313f0b	New pfil(9) KPI together with newborn pfil API and control utility. The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses. In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together. New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters. Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points. Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated. Differential Revision: https://reviews.freebsd.org/D18951	2019-01-31 23:01:03 +00:00
John Baldwin	829c56fc08	Don't set IFCAP_TXRTLMT during lagg_clone_create(). lagg_capabilities() will set the capability once interfaces supporting the feature are added to the lagg. Setting it on a lagg without any interfaces is pointless as the if_snd_tag_alloc call will always fail in that case. Reviewed by: hselasky, gallatin MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19040	2019-01-31 21:35:37 +00:00
Gleb Smirnoff	f712b16127	Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock. The pfil(9) system is about to be converted to epoch(9) synchronization, so we need [temporarily] go back with ipfw internal locking. Discussed with: ae	2019-01-31 21:04:50 +00:00
Marius Strobl	b97de13ae0	- Stop iflib(4) from leaking MSI messages on detachment by calling bus_teardown_intr(9) before pci_release_msi(9). - Ensure that iflib(4) and associated drivers pass correct RIDs to bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9) on the corresponding resources instead of using the RIDs initially passed to bus_alloc_resource_any(9) as the latter function may change those RIDs. Solely em(4) for the ioport resource (but not others) and bnxt(4) were using the correct RIDs by caching the ones returned by bus_alloc_resource_any(9). - Change the logic of iflib_msix_init() around to only map the MSI-X BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns > 0. Otherwise the "Unable to map MSIX table " message triggers for devices that simply don't support MSI-X and the user may think that something is wrong while in fact everything works as expected. - Put some (mostly redundant) debug messages emitted by iflib(4) and em(4) during attachment under bootverbose. The non-verbose output of em(4) seen during attachment now is close to the one prior to the conversion to iflib(4). - Replace various variants of spelling "MSI-X" (several in messages) with "MSI-X" as used in the PCI specifications. - Remove some trailing whitespace from messages emitted by iflib(4) and change them to consistently start with uppercase. - Remove some obsolete comments about releasing interrupts from drivers and correct a few others. Reviewed by: erj, Jacob Keller, shurd Differential Revision: https://reviews.freebsd.org/D18980	2019-01-30 13:21:26 +00:00
Marius Strobl	3db348b54a	- In _iflib_fl_refill(), don't mark an RX buffer as available in the corresponding bitmap before adding an mbuf has actually succeeded. Previously, m_gethdr(M_NOWAIT, ...) failing caused a "hole" in the RX ring but not in its bitmap. One implication of such a hole was that in a subsequent call to _iflib_fl_refill() with the RX buffer accounting still indicating another reclaimable buffer, bit_ffc(3) nevertheless returned -1 in frag_idx which in turn caused havoc when used as an index. Thus, additionally assert that frag_idx is 0 or greater. Another possible consequence of a hole in the RX ring was a NULL- dereference when trying to use the unallocated mbuf, for example in iflib_rxd_pkt_get(). While at it, make the variable declarations in _iflib_fl_refill() conform to style(9) and remove redundant checks already performed by bit_ffc{,_at}(3). - In iflib_queues_alloc(), don't pass redundant M_ZERO to bit_alloc(3). Reported and tested by: pho	2019-01-26 21:35:51 +00:00
Andrew Gallatin	77102fd6a2	Fix an iflib driver unload panic introduced in r343085 The new loop to sync and unload descriptors was indexed by "i", rather than "j". The panic was caused by "i" being advanced rather than "j", and eventually becoming out of bounds. Reviewed by: kib MFC after: 3 days Sponsored by: Netflix	2019-01-25 15:02:18 +00:00
Vincenzo Maffione	f79ba6d75b	netmap: improvements to the netmap kloop (CSB mode) Changelist: - Add the proper memory barriers in the kloop ring processing functions. - Fix memory barriers usage in the user helpers (nm_sync_kloop_appl_write, nm_sync_kloop_appl_read). - Fix nm_kr_txempty() helper to look at rhead rather than rcur. This is important since the kloop can read a value of rcur which is ahead of the value of rhead (see explanation in nm_sync_kloop_appl_write) - Remove obsolete ptnetmap_guest_write_kring_csb() and ptnet_guest_read_kring_csb(), and update if_ptnet(4) to use those. - Prepare in advance the arguments for netmap_sync_kloop_[tr]x_ring(), to make the kloop faster. - Provide kernel and user implementation for nm_ldld_barrier() and nm_ldst_barrier() MFC after: 2 weeks	2019-01-23 14:51:36 +00:00
Brooks Davis	bc6f170e30	Rework CASE_IOC_IFGROUPREQ() to require a case before the macro. This is more compatible with formatting tools and looks more normal. Reported by: jhb (on a different review) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18442	2019-01-22 17:39:26 +00:00
Patrick Kelsey	8f82136aec	onvert vmx(4) to being an iflib driver. Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add iflib_dma_alloc_align() to the iflib API. Performance is generally better with the tunable/sysctl dev.vmx.<index>.iflib.tx_abdicate=1. Reviewed by: shurd MFC after: 1 week Relnotes: yes Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18761	2019-01-22 01:11:17 +00:00
Patrick Kelsey	7f3eb9dab3	Fix various resource leaks that can occur in the error paths of iflib_device_register() and iflib_pseudo_register(). Reviewed by: shurd MFC after: 1 week Sponsored by: RG Nets Differential Revision: https://reviews.freebsd.org/D18760	2019-01-22 00:56:44 +00:00
Konstantin Belousov	8a04b53dce	Improve iflib busdma(9) KPI use. - Specify BUS_DMA_NOWAIT for bus_dmamap_load() on rx refill, since callbacks are not supposed to be used. - Match tso/non-tso tags to corresponding tx map operations. Create separate tso maps for tx descriptors. In particular, do not use non-tso tag to load, unload, or destroy a map created with tso tag. - Add missed bus_dmamap_sync() calls. Submitted by: marius. Reported and tested by: pho Reviewed by: marius Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-01-16 05:44:14 +00:00
Gleb Smirnoff	cc31f9821c	Remove recursive NET_EPOCH_ENTER() from sysctl_ifmalist(), missed in r342872.	2019-01-11 00:45:22 +00:00
Gleb Smirnoff	7b7f772fa0	Bring the comment up to date.	2019-01-10 00:37:14 +00:00
Mark Johnston	72755d285f	Stop setting if_linkmib in vlan(4) ifnets. There are several reasons: - The structure being exported via IFDATA_LINKSPECIFIC doesn't appear to be a standard MIB. - The structure being exported is private to the kernel and always has been. - No other drivers in common use set the if_linkmib field. - Because IFDATA_LINKSPECIFIC can be used to overwrite the linkmib structure, a privileged user could use it to corrupt internal vlan(4) state. [1] PR: 219472 Reported by: CTurt <ecturt@gmail.com> [1] Reviewed by: kp (previous version) MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18779	2019-01-09 16:47:16 +00:00
Gleb Smirnoff	a68cc38879	Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin	2019-01-09 01:11:19 +00:00
Gleb Smirnoff	21ee61a6cf	Remove part of comment that doesn't match reality.	2019-01-09 00:38:16 +00:00
Stephen Hurd	cd28ea929a	Use iflib_if_init_locked() during resume instead of iflib_init_locked(). iflib_init_locked() assumes that iflib_stop() has been called, however, it is not called for suspend. iflib_if_init_locked() calls stop then init, so fixes the problem. This was causing errors after a resume from suspend. PR: 224059 Reported by: zeising MFC after: 1 week Sponsored by: Limelight Networks	2019-01-07 23:46:54 +00:00
Matt Macy	5881181d8c	mp_ring: avoid items offset difference between iflib and mp_ring on architectures without 64-bit atomics Reported by: Augustin Cavalier <waddlesplash@gmail.com>	2019-01-03 23:06:05 +00:00
Konstantin Belousov	85f3b801e9	Fix typo, use boolean operator instead of bit-wise. Reviewed by: marius, shurd MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-01-03 01:01:03 +00:00
Mateusz Guzik	cc426dd319	Remove unused argument to priv_check_cred. Patch mostly generated with cocinnelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation	2018-12-11 19:32:16 +00:00
Stephen Hurd	7124b5ba04	Fix !tx_abdicate error from r336560 r336560 was supposed to restore pre-r323954 behaviour when tx_abdicate is not set (the default case). However, it appears that rather than the drainage check being made conditional on tx_abdicate being set, it was duplicated so it occured twice if tx_abdicate was set and once if it was not. Now when !tx_abdicate, drainage is only checked if the doorbell isn't pending. Reported by: lev MFC after: 1 week Sponsored by: Limelight Networks	2018-12-11 17:46:01 +00:00
Vincenzo Maffione	1d9ec3ee50	netmap.h: include stdatomic.h The stdatomic.h header exports atomic_thread_fence(), that can be used to implement the nm_stst_barrier() macro needed by netmap. MFC after: 3 days	2018-12-05 15:38:52 +00:00
Vincenzo Maffione	b6e66be22b	netmap: align codebase to the current upstream (760279cfb2730a585) Changelist: - Replace netmap passthrough host support with a more general mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop. No kernel threads are used to use this feature: the application is required to spawn a thread (or a process) and issue a SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The kernel loop is executed by the ioctl implementation, which returns to userspace only when a different thread calls SYNC_KLOOP_STOP or the netmap file descriptor is closed. - Update the if_ptnet driver to cope with the new data structures, and prune all the obsolete ptnetmap code. - Add support for "null" netmap ports, useful to allocate netmap_if, netmap_ring and netmap buffers to be used by specialized applications (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect. - Various fixes and code refactoring. Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D18015	2018-12-05 11:57:16 +00:00
Eric van Gyzen	9dae3b521e	altq: manual cleanup after r341507 Remove a file that became practically empty. Fix indentation. Like r341507, I do not plan to MFC, but anyone else can.	2018-12-04 23:53:42 +00:00
Eric van Gyzen	325fab802e	altq: remove ALTQ3_COMPAT code This code has apparently never compiled on FreeBSD since its introduction in 2004 (r130365). It has certainly not compiled since 2006, when r164033 added #elsif [sic] preprocessor directives. The code was left in the tree to reduce the diff from upstream (KAME). Since that upstream is no longer relevant, remove the long-dead code. This commit is the direct result of: unifdef -m -UALTQ3_COMPAT sys/net/altq/* A later commit will do some manual cleanup. I do not plan to MFC this. If that would help you, go for it.	2018-12-04 23:46:43 +00:00
Andrey V. Elsukov	851073551d	Adapt the fix in r341008 to correctly work with EBR. IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it doesn check that concurrent code is running in epoch section. But while we are in epoch section, we should be able to do access to ifnet's fields, even it was unlinked. Thus do not change if_addr and if_hw_addr fields in ifnet_detach_internal() to NULL, since rtsock code can do access to these fields and this is allowed while it is running in epoch section. This should fix the race, when ifnet_detach_internal() unlinks ifnet after we checked it for IFF_DYING in sysctl_dumpentry. Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the NULL check for ifp->if_description, since free(9) can correctly handle NULL pointer. MFC after: 1 week	2018-11-30 10:36:14 +00:00
Andrew Gallatin	fbec776de0	Use busdma unconditionally in iflib - Remove the complex mechanism to choose between using busdma and raw pmap_kextract at runtime. The reduced complexity makes the code easier to read and maintain. - Fix a bug in the small packet receive path where clusters were repeatedly mapped but never unmapped. We now store the cluster's bus address and avoid re-mapping the cluster each time a small packet is received. This patch fixes bugs I've seen where ixl(4) will not even respond to ping without seeing DMAR faults. I see a small improvement (14%) on packet forwarding tests using a Haswell based Xeon E5-2697 v3. Olivier sees a small regression (-3% to -6%) with lower end hardware. Reviewed by: mmacy Not objected to by: sbruno MFC after: 8 weeks Sponsored by: Netflix, Inc Differential Revision: https://reviews.freebsd.org/D17901	2018-11-27 20:01:05 +00:00
Andrey V. Elsukov	a716ad4a35	Fix possible panic during ifnet detach in rtsock. The panic can happen, when some application does dump of routing table using sysctl interface. To prevent this, set IFF_DYING flag in if_detach_internal() function, when ifnet under lock is removed from the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent ifnet detach during routes enumeration. In case, if some interface was detached in the time before we take the lock, add the check, that ifnet is not DYING. This prevents access to memory that could be freed after ifnet is unlinked. PR: 227720, 230498, 233306 Reviewed by: bz, eugen MFC after: 1 week Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D18338	2018-11-27 09:04:06 +00:00
Mark Johnston	d25f8522be	Plug routing sysctl leaks. Various structures exported by sysctl_rtsock() contain padding fields which were not being zeroed. Reported by: Thomas Barabosch, Fraunhofer FKIE Reviewed by: ae MFC after: 3 days Security: kernel memory disclosure Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D18333	2018-11-26 13:42:18 +00:00
Oleg Bulyzhin	cac302483e	Unbreak kernel build with VLAN_ARRAY defined. MFC after: 1 week	2018-11-21 13:34:21 +00:00
Andrey V. Elsukov	ad43bf348b	Allow configuration of several ipsec interfaces with the same tunnel endpoints. This can be used to configure several IPsec tunnels between two hosts with different security associations. Obtained from: Yandex LLC MFC after: 2 weeks Sponsored by: Yandex LLC	2018-11-16 14:21:57 +00:00
Mike Karels	456e896d6d	Fix flags collision causing inability to enable CBQ in ALTQ The CBQ BORROW flag conflicts with the RMCF_CODEL flag; the two sets of definitions actually define the same things. The symptom is that a kernel with CBQ support and not CODEL fails to load a QoS policy with the obscure error "pfctl: DIOCADDALTQ: Cannot allocate memory." If ALTQ_DEBUG is enabled, the error becomes a little clearer: "rmc_newclass: CODEL not configured for CBQ!" is printed by the kernel. There really shouldn't be two sets of macros that have to be defined consistently, but the include structure isn't right for exporting CBQ flags to altq_rmclass.h. Re-align the definitions, and add CTASSERTs in the kernel to ensure that the definitions are consistent. PR: 215716 Reviewed by: pkelsey MFC after: 2 weeks Sponsored by: Forcepoint LLC Differential Revision: https://reviews.freebsd.org/D17758	2018-11-16 03:42:29 +00:00
Stephen Hurd	0efb1a464f	Clear RX completion queue state veriables in iflib_stop() iflib_stop() was not resetting the rxq completion queue state variables. This meant that for any driver that has receive completion queues, after a reinit, iflib would start asking what's available on the rx side starting at whatever the completion queue index was prior to the stop, instead of at 0. Submitted by: pkelsey Reported by: pkelsey MFC after: 3 days Sponsored by: Limelight Networks	2018-11-14 20:36:18 +00:00
Stephen Hurd	8d4ceb9cc4	Prevent POLA violation with TSO/CSUM offload Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding CSUM_IP6?_TCP / CSUM_IP flags are also set. Rather than requireing drivers to bake-in an understanding that TSO implies checksum offloads, make it explicit. This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to ensure it's zeroed for TSO. Reported by: Jacob Keller <jacob.e.keller@intel.com> MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17801	2018-11-14 15:23:39 +00:00
Stephen Hurd	4d261ce278	Fix leaks caused by ifc_nhwtxqs never being initialized r333502 removed initialization of ifc_nhwtxqs, and it's not clear there's a need to copy it into the struct iflib_ctx at all. Use ctx->ifc_sctx->isc_ntxqs instead. Further, iflib_stop() did not clear the last ring in the case where isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl. Reported by: pkelsey Reviewed by: pkelsey MFC after: 3 days Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17979	2018-11-14 15:16:45 +00:00
Gleb Smirnoff	b79aa45e0e	For compatibility KPI functions like if_addr_rlock() that used to have mutexes but now are converted to epoch(9) use thread-private epoch_tracker. Embedding tracker into ifnet(9) or ifnet derived structures creates a non reentrable function, that will fail miserably if called simultaneously from two different contexts. A thread private tracker will provide a single tracker that would allow to call these functions safely. It doesn't allow nested call, but this is not expected from compatibility KPIs. Reviewed by: markj	2018-11-13 22:58:38 +00:00
Stephen Hurd	a42546df88	Fix rxcsum issue introduced in r338838 r338838 attempted to fix issues with rxcsum and rxcsum6. However, the rxcsum bits were set as though if_setcapenablebit() was being called, not if_togglecapenable() which is in use. As a result, it was not possible to disable rxcsum when rxcsum6 was supported. PR: 233004 Reported by: lev Reviewed by: lev MFC after: 3 days Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D17881	2018-11-07 19:31:48 +00:00
Kristof Provost	fbbf436d56	pfsync: Handle syncdev going away If the syncdev is removed we no longer need to clean up the multicast entry we've got set up for that device. Pass the ifnet detach event through pf to pfsync, and remove our multicast handle, and mark us as no longer having a syncdev. Note that this callback is always installed, even if the pfsync interface is disabled (and thus it's not a per-vnet callback pointer). MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17502	2018-11-02 16:57:23 +00:00
Kristof Provost	25c6ab1b78	Notify that the ifnet will go away, even on vnet shutdown pf subscribes to ifnet_departure_event events, so it can clean up the ifg_pf_kif and if_pf_kif pointers in the ifnet. During vnet shutdown interfaces could go away without sending the event, so pf ends up cleaning these up as part of its shutdown sequence, which happens after the ifnet has already been freed. Send the ifnet_departure_event during vnet shutdown, allowing pf to clean up correctly. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17500	2018-11-02 16:50:17 +00:00
Kristof Provost	5f6cf24e2d	pfsync: Make pfsync callbacks per-vnet The callbacks are installed and removed depending on the state of the pfsync device, which is per-vnet. The callbacks must also be per-vnet. MFC after: 2 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D17499	2018-11-02 16:47:07 +00:00
Kristof Provost	790194cd47	pf: Limit the fragment entry queue length to 64 per bucket. So we have a global limit of 1024 fragments, but it is fine grained to the region of the packet. Smaller packets may have less fragments. This costs another 16 bytes of memory per reassembly and devides the worst case for searching by 8. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17734	2018-11-02 15:32:04 +00:00
Kristof Provost	fd2ea405e6	pf: Split the fragment reassembly queue into smaller parts Remember 16 entry points based on the fragment offset. Instead of a worst case of 8196 list traversals we now check a maximum of 512 list entries or 16 array elements. Obtained from: OpenBSD Differential Revision: https://reviews.freebsd.org/D17733	2018-11-02 15:26:51 +00:00
Bjoern A. Zeeb	c955c6cd08	With more excessive use of modules, more kernel parts working with VIMAGE, and feature richness and global state increasing the 8k of vnet module space are no longer sufficient for people and loading multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on the second module. Increase the module space to 8 * PAGE_SIZE which should be enough to hold multiple firewalls, ipsec, multicast (as in the old days was a problem), epair, carp, and any kind of other vnet enabled modules. Sadly this is a global byte array part of the vnet_set, so we cannot dynamically change its size; otherwise a TUNABLE would have been a better solution. PR: 228854 Reported by: Ernie Luzar, Marek Zarychta Discussed with: rgrimes on current MFC after: 3 days	2018-10-30 20:45:15 +00:00

1 2 3 4 5 ...

4077 Commits