Commit Graph

4201 Commits

Author SHA1 Message Date
Gleb Smirnoff
6dcec895d9 vlan_config() isn't always called in epoch context.
Reported by:	kp
2019-10-13 15:15:09 +00:00
Gleb Smirnoff
45c1d51c39 Don't use if_maddr_rlock() in sppp(4), use epoch(9) directly instead. 2019-10-10 23:54:37 +00:00
Gleb Smirnoff
73c96bbeac Don't use if_maddr_rlock() in tuntap(4), use epoch(9) directly instead. 2019-10-10 23:51:14 +00:00
Gleb Smirnoff
4b24e5b1ef The interface output method must be executed in the network epoch, so if_addr_rlock()
isn't needed here.
2019-10-10 23:50:32 +00:00
Gleb Smirnoff
fb3fc771f6 Add two extra functions that return the count of addresses
on an interface.  These functions could have been implemented on top of
the if_foreach_llm?addr(), but several drivers need counting, so this
avoids copy-n-paste inside the drivers.
2019-10-10 23:44:56 +00:00
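
A minimal sketch of the kind of count described above, built on the
callback iterator this entry references; the callback signature is an
assumption here, so consult if_var.h for the real prototypes:

    static u_int
    count_llmaddr_cb(void *arg, struct sockaddr_dl *sdl, u_int idx)
    {
        u_int *cnt = arg;

        (*cnt)++;
        return (1);
    }

    static u_int
    example_llmaddr_count(struct ifnet *ifp)
    {
        u_int cnt = 0;

        if_foreach_llmaddr(ifp, count_llmaddr_cb, &cnt);
        return (cnt);
    }
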
Gleb Smirnoff
826857c833 Provide a new KPI for network drivers to access lists of interface
addresses.  The KPI reveals neither how addresses are stored, nor how
access to them is synchronized, nor struct ifaddr and struct ifmaddr.

Reviewed by:	gallatin, erj, hselasky, philip, stevek
Differential Revision:	https://reviews.freebsd.org/D21943
2019-10-10 23:42:55 +00:00
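
A rough illustration of the intent: a driver programs its multicast
filter through the opaque iterator instead of walking struct ifmaddr
itself.  The mydrv_* names and helpers are hypothetical; only
if_foreach_llmaddr() (one of the iterators referenced above) is part of
the KPI.

    static u_int
    mydrv_hash_maddr(void *arg, struct sockaddr_dl *sdl, u_int cnt)
    {
        struct mydrv_softc *sc = arg;

        mydrv_set_hash_bit(sc, LLADDR(sdl));    /* hypothetical helper */
        return (1);
    }

    static void
    mydrv_set_multi(struct mydrv_softc *sc)
    {
        mydrv_clear_hash(sc);                   /* hypothetical helper */
        if_foreach_llmaddr(sc->mydrv_ifp, mydrv_hash_maddr, sc);
    }
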
Gleb Smirnoff
caeeeaa7c5 ifnet_byindex_ref() requires network epoch. 2019-10-09 16:21:50 +00:00
Gleb Smirnoff
1e80e4f26c Remove the epoch assertion from if_setlladdr(). Originally this function was
protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous
if_setlladdr() calls couldn't execute. Later it was switched to IF_ADDR_RLOCK(),
likely by mistake. Later it was switched to NET_EPOCH_ENTER(). Then I
incorrectly added NET_EPOCH_ASSERT() here.

In reality ifp->if_addr never goes away and never changes its length. So,
doing bcopy() into it is always "safe", meaning it won't dereference a wrong
pointer or write into someone else's memory. Of course, two bcopy() calls in
parallel would result in a mess of two addresses, but the net epoch doesn't
protect against that, and neither did IF_ADDR_RLOCK().

So for now, just remove the assertion and leave a proper fix for later.

Reported by:	markj
2019-10-08 17:55:45 +00:00
Gleb Smirnoff
e9dc46cc30 In the DIAGNOSTIC block of if_delmulti_ifma_flags(), enter the network epoch.
This quickly plugs the regression from r353292. The locking of multicast
definitely needs a broader review today...

Reported by:	pho, dhw
2019-10-08 16:45:56 +00:00
Hans Petter Selasky
a362cf527e Fix a regression after r353274:
Make sure the vnet_shutdown field is not set until after all
VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been
executed. In particular, the vnet_if_return() function requires that
if_move() is still operational.

Reported by:	lwhsu@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-10-08 11:06:24 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to the network stack, it was basically
dropped in place of the existing locking, which was mutexes and
rwlocks. For the sake of performance, mutex-covered areas were kept
as small as possible, and so the epoch-covered areas became just as
small.

However, epoch doesn't introduce any contention; it just delays
memory reclaim. So there is no performance reason to minimise
epoch-covered areas. Meanwhile, entering and exiting an epoch also
has non-zero CPU cost, so doing it less often is a win.

Not least is code maintainability. In the new paradigm we can
assume that at any stage of processing a packet we are inside the
network epoch. This makes coding both the input and output paths
much easier.

On the output path we already enter the epoch quite early, in
ip_output() and ip6_output().

This patch does the same for the input path. All ISR processing,
network-related callouts, and other ways of injecting packets into
the network stack shall be performed in the net epoch. Any leaf
function that walks network configuration now asserts the epoch.

The tricky part is the configuration code paths - ioctls and
sysctls. They also call into leaf functions, so some need to be
changed.

This patch introduces more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note that unlike a lock recursion, an
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
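
A sketch of the resulting pattern (illustrative code, not part of the
diff): configuration paths enter the network epoch explicitly, and leaf
functions that walk network configuration assert it.

    static void
    example_walk_ifaddrs(struct ifnet *ifp)
    {

        NET_EPOCH_ASSERT();     /* leaf functions now assert the epoch */
        /* ... walk interface configuration ... */
    }

    static int
    example_ioctl_path(struct ifnet *ifp)
    {
        struct epoch_tracker et;

        NET_EPOCH_ENTER(et);
        example_walk_ifaddrs(ifp);
        NET_EPOCH_EXIT(et);
        return (0);
    }
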
Hans Petter Selasky
4715738b12 Compile time assert a valid subsystem for all VNET init and uninit functions.
Using VNET init and uninit functions outside the given range has undefined
behaviour.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-10-07 14:24:59 +00:00
Hans Petter Selasky
204e2f30d9 Factor out the VNET shutdown check into its own vnet structure field.
Remove the now obsolete vnet_state field. This greatly simplifies the
detection of VNET shutdown and avoids code duplication.

Discussed with:	bz@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-10-07 14:15:41 +00:00
Kyle Evans
291287667c tuntap(4): loosen up tunclose restrictions
Realistically, this cannot work. We don't allow the tun to be opened twice,
so handoff must be done via fd passing, fork, dup, or some similar mechanism.
Applications demonstrably do not enforce strict ordering when they're
handing off tun devices, so the parent closing before the child will easily
leave the tun/tap device in a bad state where it can't be destroyed, and
leave the user confused because they did nothing wrong.

Concede that we can't leave the tun/tap device in this kind of state just
because software isn't playing the TUNSIFPID game, but it is still good to
find and fix this kind of thing to keep ifconfig(8) up-to-date and help
ensure good discipline in tun handling.

MFC after:	 3 days
2019-10-04 13:43:07 +00:00
Kyle Evans
59997c3c46 if_tuntap: create /dev aliases when a tuntap device gets renamed
Currently, if you do:

$ ifconfig tun0 create
$ ifconfig tun0 name wg0
$ ls -l /dev | egrep 'wg|tun'

You will see tun0, but no wg0. In fact, it's slightly more annoying to make
the association between the new name and the old name in order to open the
device (if it hadn't been opened during the rename).

Register an eventhandler for ifnet_arrival_events and catch interface
renames. We can determine if the ifnet is a tun easily enough from the
if_dname, which matches the cevsw.d_name from the associated tuntap_driver.

Some locking dance is required because renames don't require the device to
be opened, so it could go away in the middle of handling the ioctl; but as
soon as we've verified this isn't the case, we can attempt to busy the tun
and either bail out if the tun device is dying or proceed with the rename.

We only create these aliases on a best-effort basis. Renaming a tun device
to "usbctl", which doesn't exist as an ifnet but does as a /dev, is clearly
not that disastrous, but we can't and won't create a /dev for that.
2019-10-03 17:54:00 +00:00
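
A rough sketch of the approach described above: register for the ifnet
arrival event and check if_dname against the tuntap driver names ("tun"
is shown as one example).  The tun-specific helpers are placeholders;
only the event hookup itself is what the entry describes.

    static eventhandler_tag tun_rename_tag;

    static void
    tun_arrival_handler(void *arg, struct ifnet *ifp)
    {

        /* Renames re-announce the ifnet; ignore interfaces that aren't ours. */
        if (strcmp(ifp->if_dname, "tun") != 0)
            return;
        /* busy the tun, bail out if it is dying, otherwise create the alias */
    }

    static void
    tun_arrival_init(void)
    {

        tun_rename_tag = EVENTHANDLER_REGISTER(ifnet_arrival_event,
            tun_arrival_handler, NULL, EVENTHANDLER_PRI_ANY);
    }
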
Kyle Evans
c4cad1549e if_tuntap: add a busy/unbusy mechanism, replace destroy OPEN check
A future commit will create device aliases when a tuntap device is renamed
so that it's still easily found in /dev after the rename.  Said mechanism
will want to keep the tun alive long enough to either realize that it's
about to go away or complete the alias creation, even if the alias is about
to get destroyed.

While we're introducing it, using it to prevent open devices from going away
makes plenty of sense and keeps the logic for waking up tun_destroy clean, so
we don't have multiple places trying to cv_broadcast unless the device is
still in use elsewhere.
2019-10-03 17:46:27 +00:00
Mark Johnston
4166913371 Add IFLIB_SINGLE_IRQ_RX_ONLY.
As of r347221 the iflib legacy interrupt mode setup assumes that drivers
perform both receive and transmit processing from the interrupt handler.
This assumption is invalid in the vmxnet3 driver, so introduce the
IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the
interrupt handler.

PR:		239118
Reported and tested by:	Juraj Lutter <otis@sk.freebsd.org>
Obtained from:	marius
Reviewed by:	gallatin
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D21831
2019-09-30 15:59:07 +00:00
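
For context, this is roughly how a driver opts in to the new flag (a
sketch; the surrounding fields are illustrative):

    static struct if_shared_ctx mydrv_sctx_init = {
        .isc_magic = IFLIB_MAGIC,
        /* ... */
        /* Do only RX processing from the single/legacy interrupt handler. */
        .isc_flags = IFLIB_SINGLE_IRQ_RX_ONLY,
    };
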
Andrew Gallatin
6554362c66 kTLS support for TLS 1.3
TLS 1.3 requires a few changes because 1.3 pretends to be 1.2
with a record type of application data. The "real" record type is
then included at the end of the user-supplied plaintext
data. This required adding a field to the mbuf_ext_pgs struct to
save the record type, and passing the real record type to the
sw_encrypt() ktls backend functions.

Reviewed by:	jhb, hselasky
Sponsored by:	Netflix
Differential Revision:	D21801
2019-09-27 19:17:40 +00:00
Gleb Smirnoff
bf7700e44f style(9): remove extraneous empty lines 2019-09-25 20:46:09 +00:00
Gleb Smirnoff
dd902d015a Add a debugging facility, EPOCH_TRACE, that checks that epochs entered are
properly nested and warns about recursive entries.  Unlike with locks,
there is nothing fundamentally wrong with such use; the intent of the tracer
is to help review complex epoch-protected code paths, meaning the
network stack here.

Reviewed by:	hselasky
Sponsored by:	Netflix
Pull Request:	https://reviews.freebsd.org/D21610
2019-09-25 18:26:31 +00:00
Eric Joyner
53b5b9b049 iflib: Remove redundant VLAN events deregistration
From Piotr:
r351152 introduced the iflib_deregister() function, which calls
EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes
the duplicate EVENTHANDLER_DEREGISTER() calls placed in
iflib_device_deregister(), as this function now calls
iflib_deregister(). This avoids deregistering the same event twice.

This patch also adds a check in iflib_vlan_register() to prevent
registering a VLAN while in detach.

Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>,
erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

Submitted by:	Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by:	gallatin@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21711
2019-09-24 17:03:31 +00:00
Konstantin Belousov
247cf5664e Add SIOCGIFDOWNREASON.
The ioctl(2) is intended to provide more details about the cause of
the link being down.

Eventually we might define a comprehensive list of codes for the
situations.  But the interface also allows the driver to provide a
free-form null-terminated ASCII string carrying arbitrary non-formalized
information.  A sample implementation exists for mlx5(4), where the
string is fetched from the firmware controlling the port.

Reviewed by:	hselasky, rrs
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21527
2019-09-17 18:49:13 +00:00
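
A userland sketch of querying the new ioctl; the struct and field names
below are assumptions about this change and should be checked against
net/if.h:

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void
    print_down_reason(const char *ifname)
    {
        struct ifdownreason ifdr;
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
            return;
        memset(&ifdr, 0, sizeof(ifdr));
        strlcpy(ifdr.ifdr_name, ifname, sizeof(ifdr.ifdr_name));
        if (ioctl(s, SIOCGIFDOWNREASON, &ifdr) == 0 &&
            ifdr.ifdr_reason == IFDR_REASON_MSG)
            printf("%s: %s\n", ifname, ifdr.ifdr_msg);
        close(s);
    }
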
Kyle Evans
40b1c921bd SIOCSIFNAME: Do nothing if we're not actually changing
Instead of throwing EEXIST, just succeed if the name isn't actually
changing. We don't need to trigger departure or any of that because there's
no change from consumers' perspective.

PR:		240539
Reviewed by:	brooks
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D21618
2019-09-12 15:36:48 +00:00
Hans Petter Selasky
180fecd5b6 Callout drain does not have to be followed by a callout stop call.
Fix bogus code.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-09-10 14:33:07 +00:00
Li-Wen Hsu
4835262b68 Fix build for the platforms where db_expr_t is not long
Sponsored by:	The FreeBSD Foundation
2019-09-10 08:51:11 +00:00
Conrad Meyer
0b12ab8111 Appease Clang false-positive Werrors in r352112
Reported by:	bcran
2019-09-10 01:56:47 +00:00
Conrad Meyer
8b6acd2b51 ddb(4): Add 'show route <dest>' and 'show routetable [<af>]'
These commands show the route resolved for a specified destination, or
print out the entire routing table for a given address family (or all
families, if none is explicitly provided).

Discussed with:	emaste
Differential Revision:	https://reviews.freebsd.org/D21510
2019-09-09 22:54:27 +00:00
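
For reference, usage from the in-kernel debugger prompt looks like this
(the destination address is illustrative):

db> show route 192.0.2.1
db> show routetable
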
Mark Johnston
fee2a2fa39 Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator.  In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well.  These references are protected by the page lock, which must
therefore be acquired for many per-page operations.  This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter.  A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held.  As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed.  The former
requires that either the object lock or the busy lock is held.  The
latter no longer has a return value and may free the page if it releases
the last reference to that page.  vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate.  vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold().  It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler.  vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state).  In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths.  In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock.  In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped.  The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by:	jeff (earlier version)
Tested by:	gallatin (earlier version), pho
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20486
2019-09-09 21:32:42 +00:00
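
A small sketch of the updated contract (illustrative, not from the
diff): wiring now requires the object or busy lock rather than the page
lock, and unwiring may free the page when the last reference is dropped.

    static void
    example_wire_unwire(vm_object_t obj, vm_page_t m)
    {

        VM_OBJECT_WLOCK(obj);
        vm_page_wire(m);            /* page lock no longer required */
        VM_OBJECT_WUNLOCK(obj);

        /* ... use the page ... */

        /* No return value; the page may be freed internally. */
        vm_page_unwire(m, PQ_INACTIVE);
    }
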
Vincenzo Maffione
253b2ec199 netmap: import changes from upstream (SHA 137f537eae513)
- Rework option processing.
- Use larger integers for memory size values in the
  memory management code.

MFC after:	2 weeks
2019-09-01 14:47:41 +00:00
Matt Joras
16cf6bdbb6 Wrap a vlan's parent's if_output in a separate function.
When a vlan interface is created, its if_output is set directly to the
parent interface's if_output. This is fine in the normal case but has an
unfortunate consequence if you end up with a certain combination of vlan
and lagg interfaces.

Consider you have a lagg interface with a single laggport member. When
an interface is added to a lagg its if_output is set to
lagg_port_output, which blackholes traffic from the normal networking
stack but not certain frames from BPF (pseudo_AF_HDRCMPLT). If you now
create a vlan with the laggport member (not the lagg interface) as its
parent, its if_output is set to lagg_port_output as well. While this is
confusing conceptually and likely represents a misconfigured system, it
is not itself a problem. The problem arises when you then remove the
lagg interface. Doing this resets the if_output of the laggport member
back to its original state, but the vlan's if_output is left pointing to
lagg_port_output. This gives rise to the possibility that the system
will panic when e.g. bpf is used to send any frames on the vlan
interface.

Fix this by creating a new function, vlan_output, which simply wraps the
parent's current if_output. That way when the parent's if_output is
restored there is no stale usage of lagg_port_output.

Reviewed by:	rstone
Differential Revision:	D21209
2019-08-30 20:19:43 +00:00
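
The shape of the fix, roughly (PARENT() and the softc layout are as used
in if_vlan.c; treat the details as illustrative): the wrapper looks up
the parent's current if_output at send time, so restoring the parent's
if_output is immediately visible to the vlan.

    static int
    vlan_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst,
        struct route *ro)
    {
        struct ifvlan *ifv = ifp->if_softc;

        /*
         * Call through the parent's *current* if_output; before the
         * change the vlan's if_output pointed at it directly.
         */
        return ((*PARENT(ifv)->if_output)(ifp, m, dst, ro));
    }
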
John Baldwin
b2e60773c6 Add kernel-side support for in-kernel TLS.
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets.  KTLS only supports
offload of TLS for transmitted data.  Key negotiation must still be
performed in userland.  Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option.  All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.

Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type.  Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.

At present, rekeying is not supported though the in-kernel framework
should support rekeying.

KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer.  Each TLS frame is described by a single
ext_pgs mbuf.  The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.

KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.

Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame().  ktls_enqueue() is then
called to schedule TLS frames for encryption.  In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed.  For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().

A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue().  Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.

(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)

KTLS supports pluggable software encryption backends.  Internally,
Netflix uses proprietary pure-software backends.  This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames.  As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.

Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready().  At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.

ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation.  In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session.  TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted.  The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface.  If so, the packet is tagged
with the TLS send tag and sent to the interface.  The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation.  If the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped.  In addition, a task is scheduled to refresh the TLS send
tag for the TLS session.  If a new TLS send tag cannot be allocated,
the connection is dropped.  If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag.  (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another.  As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)

ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8).  ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp without flowid enabled.

Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option.  They can also use this socket
option to toggle between software and ifnet TLS modes.

In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax.  However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.

Various sysctls and counters are available under the kern.ipc.tls
sysctl node.  The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default).  The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.

KTLS is enabled via the KERN_TLS kernel option.

This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.

Reviewed by:	gallatin, hselasky, rrs
Obtained from:	Netflix
Sponsored by:	Netflix, Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21277
2019-08-27 00:01:56 +00:00
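
For orientation, a userland sketch of the transmit-side flow described
above: hand the negotiated keys to the kernel via TCP_TXTLS_ENABLE, then
send a non-application-data record with a TLS_SET_RECORD_TYPE control
message.  The tls_enable layout and the exact cmsg level are assumptions
to verify against sys/ktls.h.

    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/ktls.h>
    #include <stdint.h>
    #include <string.h>

    static int
    enable_txtls(int s, const struct tls_enable *en)
    {
        /* Keys are negotiated in userland and handed to the kernel here. */
        return (setsockopt(s, IPPROTO_TCP, TCP_TXTLS_ENABLE, en, sizeof(*en)));
    }

    static ssize_t
    send_tls_record(int s, uint8_t record_type, const void *buf, size_t len)
    {
        char cbuf[CMSG_SPACE(sizeof(record_type))];
        struct msghdr msg;
        struct cmsghdr *cmsg;
        struct iovec iov;

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = (void *)(uintptr_t)buf;
        iov.iov_len = len;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = IPPROTO_TCP;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(sizeof(record_type));
        memcpy(CMSG_DATA(cmsg), &record_type, sizeof(record_type));
        return (sendmsg(s, &msg, 0));
    }
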
Kyle Evans
5c4eed8601 tuntap: belatedly add MODULE_VERSION for if_tun and if_tap
When tun/tap were merged, appropriate MODULE_VERSION entries should have been
added so that things like modfind(2) continue to do the right thing with the
old names.

Reported by:	jhb
2019-08-19 19:01:59 +00:00
Vincenzo Maffione
b5b83671ea if_tuntap: minor improvements
Rewrite a loop to avoid duplicating the exit condition.
Simplify mask processing in tunpoll().
Fix minor typos.

Reviewed by:	kevans, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D21302
2019-08-19 17:23:22 +00:00
Eric Joyner
f4aa9b67eb net: Update SFF-8024 definitions and strings with values from rev 4.6
This will let ifconfig -v's SFF eeprom read functionality recognize more
module types.

Signed-off-by: Eric Joyner <erj@freebsd.org>

Reviewed by:	gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21041
2019-08-17 00:10:56 +00:00
Eric Joyner
566144142e iflib: add iflib_deregister to help cleanup on exit
Commit message by Jake:
The iflib_register function exists to allocate and setup some common
structures used by both iflib_device_register and iflib_pseudo_register.

There is no associated cleanup function used to undo the steps taken in
this function.

Both iflib_device_deregister and iflib_pseudo_deregister have some of
the necessary steps scattered in their flow. However, most of the
necessary cleanup is not done during the error path of
iflib_device_register and iflib_pseudo_register.

Some examples of missed cleanup include:

- the ifp pointer is not free'd during error cleanup
- the STATE and CTX locks are not destroyed during error cleanup
- the vlan event handlers are not removed during error cleanup
- media added to the ifmedia structure is not removed
- the kobject reference is never deleted

Additionally, when initializing the kobject class, the reference counter is
increased even though kobj_init already increases it. This results in
the class never being free'd again because the reference count would
never hit zero even after all driver instances are unloaded.

To aid in proper cleanup, implement an iflib_deregister function that
goes through the reverse steps taken by iflib_register.

Call this function during the error cleanup for iflib_device_register
and iflib_pseudo_register. Additionally call the function in the
iflib_device_deregister and iflib_pseudo_deregister functions near the
end of their flow. This helps reduce code duplication and ensures that
proper steps are taken to cleanup allocations and references in both the
regular and error cleanup flows.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21005
2019-08-16 23:33:44 +00:00
George V. Neville-Neil
d8dc4e350f Properly validate arguments for route deletion
Reported by: Liang Zhuo brightiup.zhuo@gmail.com
MFC after:	1 week
2019-08-03 14:42:07 +00:00
Eric Joyner
197c679824 iflib: Prevent kernel panic caused by loading driver with a specific interrupt configuration
If a device has only 1 MSI-X interrupt available and does not support either
MSI or legacy interrupts, iflib_device_register() will fail, leak memory and
MSI resources, and the driver will not load. Worse, if another iflib-using
driver tries to unload afterwards, a kernel panic will occur because the
previously failed iflib driver load did not properly call "taskqgroup_detach()"
during its cleanup.

This patch is a band-aid for this situation -- don't try allocating MSI or
legacy interrupts if a single MSI-X interrupt was allocated, but fail to load
instead. As well, during the cleanup, properly call taskqgroup_detach() on the
admin task to prevent panics when other iflib drivers unload.

This whole interrupt allocation process actually needs re-doing to properly
support devices with only a single MSI-X interrupt, devices that only support
MSI-X, non-PCI devices, and multiple non-MSI-X interrupts.

Signed-off-by: Eric Joyner <erj@freebsd.org>

Reviewed by:	marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20747
2019-08-01 17:37:25 +00:00
Eric Joyner
6a3f243b04 iflib: remove kobject class reference increment
Commit message from Jake:
In iflib_register, the context is initialized as a kobject using the
device driver's "driver" kobject class. As part of this, the function
mistakenly increments the ref counter.

The ref counter is incremented twice, once in the code directly, and
once again by kobj_class_compile. However, there is no associated
decrement in the detach path. Because of this, the ref counter will
never go back down to zero, and thus the kobject method table will never
be released.

Remove this unnecessary reference count increment.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	jhb@, erj@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21125
2019-08-01 17:28:36 +00:00
Randall Stewart
20abea6663 This adds the third step in getting BBR into the tree. BBR and
an updated rack depend on having access to the new
ratelimit API added in this commit.

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D20953
2019-08-01 14:17:31 +00:00
Ed Maste
1082be6554 ppp: correct echo-req magic number on big endian archs
The magic number is a 32-bit quantity; use uint32_t to match htonl's
return type and avoid sending zeros (the upper 32 bits) on big-endian
architectures.

PR:		184141
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-08-01 13:42:58 +00:00
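
To illustrate the class of bug (a standalone example, not the ppp code
itself): if the 32-bit result of htonl() is stored in a 64-bit long on a
big-endian machine, copying the first four bytes of that long sends the
zero-filled upper half; a uint32_t avoids this.

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    static void
    put_magic(unsigned char *pkt, uint32_t magic)
    {
        uint32_t nmagic = htonl(magic);     /* width matches htonl()'s result */

        memcpy(pkt, &nmagic, sizeof(nmagic));   /* exactly 4 wire bytes */
    }
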
Kyle Evans
0dbac71f19 if_tuntap(4): Add TUNGIFNAME
This effectively just moves TAPGIFNAME into common ioctl territory.

MFC after:	3 days
2019-07-25 22:23:34 +00:00
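
A userland sketch, assuming the ioctl fills in a struct ifreq the same
way TAPGIFNAME does:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <net/if_tun.h>
    #include <stdio.h>

    static void
    print_tun_name(int tunfd)
    {
        struct ifreq ifr;

        if (ioctl(tunfd, TUNGIFNAME, &ifr) == 0)
            printf("opened %s\n", ifr.ifr_name);
    }
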
Eric Joyner
7f3f6aad3e iflib: fix dangling device softc pointer
Commit text by Jake:
If a driver's IFDI_ATTACH_PRE function fails, the iflib_device_register
function will free the ctx pointer. However, it does not reset the
device softc pointer to NULL.

This will result in memory corruption as a future access to the now
invalid pointer will corrupt memory that is later allocated on top of
the same memory location.

The iflib_device_deregister function correctly resets the softc pointer
by using device_set_softc().

This clears up the invalid dangling pointer and prevents memory
corruption that could lead to a panic or undefined behavior if the
device's driver failed to attach.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21003
2019-07-24 21:43:41 +00:00
Kirill Ponomarev
b7592822d5 Allow setting an MTU of more than 1500 bytes.
Submitted by:	Alexandr Fedorov <aleksandr.fedorov_itglobal_dot_com>
Approved by:	jhb, rgrimes
Sponsored by:	ITGlobal.com
Differential Revision:	https://reviews.freebsd.org/D19422
2019-07-24 16:10:20 +00:00
Chuck Tuffli
94c15665a5 Fix a typo in r349969
OUI_FRREBSD_NVME_HIGH should have been OUI_FREEBSD_NVME_HIGH

Caught by:	Gary Jennejohn
2019-07-14 03:49:48 +00:00
Chuck Tuffli
409a80e5a4 bhyve: Create EUI64 for NVMe namespaces
Accept an IEEE Extended Unique Identifier (EUI-64) from the command
line for each NVMe namespace. If one isn't provided, it will create one
based on the CRC16 of:
 - the FreeBSD IEEE OUI
 - PCI bus, device/slot, function values
 - Namespace ID

Reviewed by:	imp, araujo, jhb, rgrimes
Approved by:	imp (mentor), jhb (maintainer)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D19905
2019-07-13 12:48:28 +00:00
Mark Johnston
eeacb3b02f Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics.  The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead.  Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended.  __FreeBSD_version is bumped.

Reviewed by:	alc, kib
Discussed with:	jeff
Discussed with:	jhb, np (cxgbe)
Tested by:	pho (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19247
2019-07-08 19:46:20 +00:00
John Baldwin
66d0c056be Support IFCAP_NOMAP in vlan(4).
Enable IFCAP_NOMAP for a vlan interface if it is supported by the
underlying trunk device.

Reviewed by:	gallatin, hselasky, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:51:38 +00:00
John Baldwin
82334850ea Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
2019-06-29 00:48:33 +00:00
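
Two of the points above, sketched as driver code (the mydrv_* names are
illustrative; if_setcapabilitiesbit() and m_copydata() are existing
KPIs): advertise the capability, and touch packet bytes only through
accessors that cope with unmapped mbufs.

    static void
    mydrv_advertise_nomap(struct ifnet *ifp)
    {

        if_setcapabilitiesbit(ifp, IFCAP_NOMAP, 0);
        if_setcapenablebit(ifp, IFCAP_NOMAP, 0);
    }

    static uint16_t
    mydrv_peek_ethertype(struct mbuf *m)
    {
        uint16_t etype;

        /* Safe for both mapped and M_NOMAP mbufs. */
        m_copydata(m, offsetof(struct ether_header, ether_type),
            sizeof(etype), (caddr_t)&etype);
        return (ntohs(etype));
    }
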
Hans Petter Selasky
0dbdf04125 Need to wait for epoch callbacks to complete before detaching a
network interface.

This particularly manifests itself when an INP has multicast options
attached during a network interface detach. Then the IPv4 or IPv6
leave-group call that results from freeing the multicast address may
access a freed ifnet structure. These are the steps to reproduce:

service mdnsd onestart # installed from ports

ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy

Tested by:	pho @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-06-28 10:49:04 +00:00
Marius Strobl
c2c5d1e787 o In iflib_txq_drain():
- Remove desc_used, which is only ever written to.
  - Remove a dead store to reclaimed.
  - Don't recycle avail.
  - Sort variables according to style(9).
  These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
  ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
  asserts upfront that this method has been assigned and functions
  like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
  and _task_fn_tx() were already unconditionally relying on the
  method being callable.
2019-06-26 15:28:21 +00:00