freebsd-nq

Author	SHA1	Message	Date
Bjoern A. Zeeb	334fc5822b	vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointless at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks	2020-01-08 23:30:26 +00:00
John Baldwin	93dafad57a	Expand net epoch in the cxgbe TOE driver to satisfy assertions. Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22483	2019-12-13 23:33:54 +00:00
Michael Tuexen	fa49a96419	In order for the TCP Handshake to support ECN++, and further ECN-related improvements, the ECN bits need to be exposed to the TCP SYNcache. This change is a minimal modification to the function headers, without any functional change intended. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, rrs@, tuexen@ Differential Revision: https://reviews.freebsd.org/D22436	2019-12-01 18:05:02 +00:00
John Baldwin	bddf73433e	NIC KTLS for Chelsio T6 adapters. This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections. NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment the encrypted TLS frames output by the crypto engine. Instead, the TOE is placed into a special setup to permit "dummy" connections to be associated with regular sockets using KTLS. This permits using the TOE to segment the encrypted TLS records. However, this approach does have some limitations: 1) Regular TOE sockets cannot be used when the TOE is in this special mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but not both at the same time. 2) In NIC KTLS mode, the TOE is only able to accept a per-connection timestamp offset that varies in the upper 4 bits. Put another way, only connections whose timestamp offset has the 28 lower bits cleared can use NIC KTLS and generate correct timestamps. The driver will refuse to enable NIC KTLS on connections with a timestamp offset with any of the lower 28 bits set. To use NIC KTLS, users can either disable TCP timestamps by setting the net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the tcp_new_ts_offset() function to clear the lower 28 bits of the generated offset. 3) Because the TCP segmentation relies on fields mirrored in a TCB in the TOE, not all fields in a TCP packet can be sent in the TCP segments generated from a TLS record. Specifically, for packets containing TCP options other than timestamps, the driver will inject an "empty" TCP packet holding the requested options (e.g. a SACK scoreboard) along with the segments from the TLS record. These empty TCP packets are counted by the dev.cc.N.txq.M.kern_tls_options sysctls. 
Unlike TOE TLS which is able to buffer encrypted TLS records in on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS records for retransmit requests as well as non-retransmit requests that do not include the start of a TLS record but do include the trailer. The T6 NIC KTLS code tries to optimize some of the cases for requests to transmit partial TLS records. In particular it attempts to minimize sending "waste" bytes that have to be given as input to the crypto engine but are not needed on the wire to satisfy mbufs sent from the TCP stack down to the driver. TCP packets for TLS requests are broken down into the following classes (with associated counters): - Mbufs that send an entire TLS record in full do not have any waste bytes (dev.cc.N.txq.M.kern_tls_full). - Mbufs that send a short TLS record that ends before the end of the trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC, the encryption must always start at the beginning, so if the mbuf starts at an offset into the TLS record, the offset bytes will be "waste" bytes. For sockets using AES-GCM, the encryption can start at the 16 byte block before the starting offset capping the waste at 15 bytes. - Mbufs that send a partial TLS record that has a non-zero starting offset but ends at the end of the trailer (dev.cc.N.txq.M.kern_tls_partial). In order to compute the authentication hash stored in the trailer, the entire TLS record must be sent as input to the crypto engine, so the bytes before the offset are always "waste" bytes. In addition, other per-txq sysctls are provided: - dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq using AES-CBC. - dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq using AES-GCM. - dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to compensate for the TOE engine not being able to set FIN on the last segment of a TLS record if the TLS record mbuf had FIN set. 
- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this txq including full, short, and partial records. - dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header and payload) sent for TLS record requests. - dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS record requests. To enable NIC KTLS with T6, set the following tunables prior to loading the cxgbe(4) driver: hw.cxgbe.config_file=kern_tls hw.cxgbe.kern_tls=1 Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21962	2019-11-21 19:30:31 +00:00
John Baldwin	a1b2b6e184	Create a file to hold shared routines for dealing with T6 key contexts. ccr(4) and TLS support in cxgbe(4) construct key contexts used by the crypto engine in the T6. This consolidates some duplicated code for helper functions used to build key contexts. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22156	2019-11-13 00:53:45 +00:00
Gleb Smirnoff	1a49612526	Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). Remove a few outdated comments and extraneous assertions. No functional change here.	2019-11-07 00:08:34 +00:00
John Baldwin	866a7f286f	Always allocate the atid table during attach. Previously the table was allocated on first use by TOE and the ratelimit code. The forthcoming NIC KTLS code also uses this table. Allocate it unconditionally during attach to simplify consumers. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D22028	2019-10-22 20:01:47 +00:00
John Baldwin	aeb63511bd	Remove an unused parameter from get_new_keyid().	2019-10-14 18:02:56 +00:00
John Baldwin	4f13842f75	Add support for KTLS in the Chelsio TOE module. This adds a TOE hook to allocate a KTLS session. It also recognizes TLS mbufs in the socket buffer and sends those to the NIC using a TLS work request to encrypt the record before segmenting it. TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in addition to enabling KTLS. Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891	2019-10-08 21:40:42 +00:00
Mark Johnston	fee2a2fa39	Change synchronization rules for vm_page reference counting. There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures. Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks. The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). 
In particular, synchronization details are no longer leaked into the caller. The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files. __FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes. Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486	2019-09-09 21:32:42 +00:00
Navdeep Parhar	5fc7854e69	cxgbe/t4_tom: Use the correct value of sndbuf in AIO Tx. This should have been part of r351540. Sponsored by: Chelsio Communications	2019-08-28 23:31:58 +00:00
Navdeep Parhar	c537e887ac	cxgbe/t4_tom: Initialize all TOE connection parameters in one place. Remove now-redundant items from toepcb and synq_entry and the code to support them. Let the driver calculate tx_align, rx_coalesce, and sndbuf by default. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21387	2019-08-27 04:19:40 +00:00
Navdeep Parhar	241c83909c	cxgbe/t4_tom: Limit work requests with immediate payload to a single descriptor. The per-tid tx credits are in demand during active Tx and it's best not to use too many just for payload. Sponsored by: Chelsio Communications	2019-08-27 01:16:02 +00:00
Navdeep Parhar	c5560a884d	cxgbe/t4_tom: Any invalid scaling factor in the hardware's wsf field implies that window scaling is not in use. MFC after: 3 days Sponsored by: Chelsio Communications	2019-08-23 22:41:16 +00:00
Navdeep Parhar	4e4469cf3c	whitespace nit.	2019-08-23 22:34:14 +00:00
Li-Wen Hsu	57f0337a57	Fix gcc build for cxgbe(4) Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20879	2019-07-08 19:59:15 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
John Baldwin	7b17c92129	Use unmapped (M_NOMAP) mbufs for zero-copy AIO writes via TOE. Previously the TOE code used its own custom unmapped mbufs via EXT_FLAG_VENDOR1. The old version always wired the entire AIO request buffer first for the duration of the AIO operation and constructed multiple mbufs which used the wired buffer as an external buffer. The new version determines how much room is available in the socket buffer and only wires the pages needed for the available room building chains of M_NOMAP mbufs. This means that a large AIO write will now limit the amount of wired memory it uses to the size of the socket buffer. Reviewed by: gallatin, np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20839	2019-07-03 16:06:11 +00:00
John Baldwin	d76bbe175a	Add support for IFCAP_NOMAP to cxgbe(4). Since cxgbe(4) uses sglist instead of bus_dma, this required updates to the code that generates scatter/gather lists for packets. Also, unmapped mbufs are always sent via DMA and never as immediate data in the payload of a work request. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: np Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:52:21 +00:00
Navdeep Parhar	8674e626c6	cxgbe/t4_tom: Tweaks to some of the AIO related CTRs. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-28 19:57:42 +00:00
Navdeep Parhar	74a155edb0	cxgbe/t4_tom: the AIO tx job queue must be empty by the time the driver releases the offload resources associated with the tid. Reviewed by: jhb@ MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20798	2019-06-28 19:27:45 +00:00
Navdeep Parhar	d49be2a696	cxgbe/t4_tom: Mark the socket's receive as done before calling handle_ddp_close. This eliminates a bad race where an aio_ddp_requeue that happened to run after handle_ddp_close could bump up the active count. Discussed with: jhb@ MFC after: 3 days Sponsored by: Chelsio Communications	2019-06-28 04:02:56 +00:00
Navdeep Parhar	b7acf27c2e	cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom. t_maxseg was changed in r293284 to not have any adjustment for TCP timestamps. t4_tom inadvertently went back to pre-r293284 semantics in r332506. Sponsored by: Chelsio Communications	2019-06-28 02:41:17 +00:00
John Baldwin	7f63b888c7	Hold an explicit reference on the socket for the aiotx task. Previously, the aiotx task relied on the aio jobs in the queue to hold a reference on the socket. However, when the last job is completed, there is nothing left to hold a reference to the socket buffer lock used to check if the queue is empty. In addition, if the last job on the queue is cancelled, the task can run with no queued jobs holding a reference to the socket buffer lock the task uses to notice the queue is empty. Fix these races by holding an explicit reference on the socket when the task is queued and dropping that reference when the task completes. Reviewed by: np MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20539	2019-06-27 19:36:30 +00:00
Navdeep Parhar	17795d8234	cxgbe/t4_tom: DDP_DEAD is a ddp flag and not a toepcb flag. The driver was in effect setting TPF_ABORT_SHUTDOWN on the toepcb instead of what was intended. MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-20 20:06:19 +00:00
John Baldwin	5f37b74d5d	Fix debug trace after removal of pdu_overhead. MFC after: 1 week Sponsored by: Chelsio Communications	2019-06-07 21:30:11 +00:00
Navdeep Parhar	ebb8639822	cxgbe/t4_tom: adjust the hardware receive window to match changes to the receive sockbuf's high water mark. Calculate rx credits on the spot instead of tracking sbused/sb_cc and rx_credits in the toepcb. The previous method worked when the high water mark changed due to SB_AUTOSIZE but not when it was adjusted directly (for example, by the soreserve in nfsrvd_addsock). This fixes a connection hang while running iozone over an NFS mounted share where nfsd's TCP sockets are being handled by t4_tom. MFC after: 3 days Sponsored by: Chelsio Communications	2019-06-01 03:03:48 +00:00
Navdeep Parhar	35c0026f42	cxgbe/t4_tom: Do not attempt to look up entries in the TCB history if it hasn't been initialized. This fixes a bug in r346570 that could cause a panic when servicing TCP_INFO for offloaded connections. MFC after: 3 days Sponsored by: Chelsio Communications	2019-05-30 17:27:40 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Navdeep Parhar	61e02298ce	cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state for a tid and maintains a running history of some interesting events. Service TCP_INFO queries from the history when the tid is being tracked there.	2019-04-22 17:48:10 +00:00
Navdeep Parhar	be09e82abb	cxgbe/t4_tom: Catch up with r344433, which removed tcb_autorcvbuf_inc. The declaration in tcp_var.h is still around so t4_tom continued to compile but wouldn't load. A separate commit will fix tcp_var.h Reported By: Dustin Marquess (dmarquess at gmail) Sponsored by: Chelsio Communications	2019-03-29 16:43:24 +00:00
Navdeep Parhar	edb518f44d	cxgbe(4): Treat the viid as an opaque identifier. Recent firmwares prefer to use a different format for viid internally and this change allows them to do so. MFC after: 1 week Sponsored by: Chelsio Communications	2019-03-20 17:27:11 +00:00
Navdeep Parhar	6c5c0137a9	Remove unused macros from t4_tom.h.	2018-12-21 20:46:45 +00:00
Navdeep Parhar	b156a400a6	cxgbe/t4_tom: fixes for issues on the passive open side. - Fix PR 227760 by getting the TOE to respond to the SYN after the call to toe_syncache_add, not during it. The kernel syncache code calls syncache_respond just before syncache_insert. If the ACK to the syncache_respond is processed in another thread it may run before the syncache_insert and won't find the entry. Note that this affects only t4_tom because it's the only driver trying to insert and expand syncache entries from different threads. - Do not leak resources if an embryonic connection terminates at SYN_RCVD because of L2 lookup failures. - Retire lctx->synq and associated code because there is never a need to walk the list of embryonic connections associated with a listener. The per-tid state is still called a synq entry in the driver even though the synq itself is now gone. PR: 227760 MFC after: 2 weeks Sponsored by: Chelsio Communications	2018-12-19 01:37:00 +00:00
John Baldwin	78afed1396	Move CLIP table handling out of TOM and into the base driver. - Store the clip table in 'struct adapter' instead of in the TOM softc. - Init the clip table during attach and teardown during detach. - While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the CLIP table. This does mean that we update the clip table even if TOE is not enabled, but non-TOE things need the CLIP table anyway. Reviewed by: np, Krishnamraju Eraparaju @ Chelsio Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D18010	2018-11-29 01:15:53 +00:00
John Baldwin	d09389fd05	Consolidate on a single set of constants for SCMD fields. Both ccr(4) and the TOE TLS code had separate sets of constants for fields in SCMD messages. Sponsored by: Chelsio Communications	2018-11-16 19:08:52 +00:00
John Baldwin	f0aefccb70	Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM. vmem's are not just used for TLS memory in TOM and the #include actually predates the TLS code so should not have been removed when the TLS vmem moved in r340466. Pointy hat to: jhb Sponsored by: Chelsio Communications	2018-11-16 01:27:24 +00:00
John Baldwin	47c64f9e3e	Remove bogus roundup2() of the key programming work request header. The key context is always placed immediately after the work request header. The total work request length has to be rounded up by 16 however. MFC after: 1 month Sponsored by: Chelsio Communications	2018-11-15 23:31:04 +00:00
John Baldwin	bc13c69bef	Move the TLS key map into the adapter softc so non-TOE code can use it. Sponsored by: Chelsio Communications	2018-11-15 23:00:30 +00:00
John Baldwin	c15600b71a	Use sbsndptr_adv() instead of sbsndptr() for TOE TLS. For TOE TLS, we just want to advance the send pointer to skip over the record just sent to the TOE. The recently added sbsndptr_adv() is sufficient for that and is cheaper. MFC after: 1 month Sponsored by: Chelsio Communications	2018-11-15 22:47:47 +00:00
John Baldwin	fe03ca08a6	Use tcp_state_change() in the cxgbe(4) TOE module. r254889 added tcp_state_change() as a centralized place to log state changes in TCP connections for DTrace. r294869 and r296881 took advantage of this central location to manage per-state counters. However, TOE sockets were still performing some (but not all) state change updates via direct assignments to t_state. This resulted in state counters underflowing when TOE was in use. Fix by using tcp_state_change() when changing a TOE connection's state. Reviewed by: np, markj MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D17915	2018-11-09 21:16:45 +00:00
Navdeep Parhar	ea710848dc	cxgbe(4): Link related changes. - Switch to using 32b port/link capabilities in the driver. The 32b format is used internally by firmwares > 1.16.45.0 and the driver will now interact with the firmware in its native format, whether it's 16b or 32b. Note that the 16b format doesn't have room for 50G, 200G, or 400G speeds. - Add a bit in the pause_settings knobs to allow negotiated PAUSE settings to override manual settings. - Ensure that manual link settings persist across an administrative down/up as well as transceiver unplug/replug. - Remove unused is_*G_port() functions. Approved by: re@ (gjb@) MFC after: 1 month Sponsored by: Chelsio Communications	2018-09-25 05:52:42 +00:00
Navdeep Parhar	d6ddb0848c	cxgbe/tom: Unregister shared CPL handlers on module unload. This fixes a panic with INVARIANTS that occurs when t4_tom is unloaded and reloaded. Approved by: re@ (kib@)	2018-08-28 18:16:02 +00:00
Navdeep Parhar	24bc8671f9	cxgbe/tom: Make sure 'matched' is always initialized before use. Reported by: Coverity (CID 1390894) MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-21 22:19:34 +00:00
Navdeep Parhar	7576fe761e	cxgbe/tom: Provide the hardware tid in tcp_info. Submitted by: marius@	2018-08-20 21:40:14 +00:00
Navdeep Parhar	72049e7395	cxgbe/tom: Put the ifnet or VLAN's PCP value in the 802.1Q tag of frames generated by the TOE. Works with vid 0 (no VLAN, just priority) too. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-17 19:22:46 +00:00
Navdeep Parhar	9f78434942	cxgbe(4): Use VLAN_TRUNKDEV instead of private cookie to figure out the parent of a VLAN ifnet. MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-15 21:24:05 +00:00
Navdeep Parhar	408954013a	Whitespace nit in t4_tom.h	2018-08-13 19:21:28 +00:00
Navdeep Parhar	5fc0f72f3b	cxgbe(4): Add support for high priority filters on T6+. They have their own region in the TCAM starting with T6, unlike previous chips where they were in the same region as normal filters. These filters "hit" before anything else in the LE's lookup. The exact order is: a) High priority filters b) TOE's active region (TCAM and/or hash) c) Servers (TOE hw listeners) d) Normal filters MFC after: 1 week Sponsored by: Chelsio Communications	2018-08-09 14:19:47 +00:00
Navdeep Parhar	1979b51141	cxgbe(4): Allow user-configured and driver-configured traffic classes to be used simultaneously. Move sysctl_tc and sysctl_tc_params to t4_sched.c while here. MFC after: 3 weeks Sponsored by: Chelsio Communications	2018-08-06 23:21:13 +00:00
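
Note on bddf73433e (NIC KTLS for Chelsio T6): the timestamp-offset restriction and the "waste" byte rules in that commit message can be illustrated with a small sketch. This is not driver code. The function names are hypothetical, the offset is assumed to be a 32-bit value, and the starting offset is assumed to be measured in bytes from the point where encryption of the record begins.

```python
# Illustrative sketch only; names are hypothetical and not taken from cxgbe(4).

TS_LOW_MASK = (1 << 28) - 1  # lower 28 bits of a 32-bit timestamp offset


def ts_offset_usable(ts_offset: int) -> bool:
    """True if the offset varies only in the upper 4 bits (lower 28 bits clear)."""
    return (ts_offset & 0xFFFFFFFF & TS_LOW_MASK) == 0


def clear_low_ts_bits(ts_offset: int) -> int:
    """Clear the lower 28 bits, as the suggested local patch would."""
    return ts_offset & 0xFFFFFFFF & ~TS_LOW_MASK


def short_record_waste(cipher: str, start_offset: int) -> int:
    """Waste bytes for a short record that begins start_offset bytes in.

    AES-CBC must encrypt from the beginning, so every byte before the
    offset is waste. AES-GCM can start at the preceding 16-byte block,
    so the waste is at most 15 bytes.
    """
    if cipher == "cbc":
        return start_offset
    if cipher == "gcm":
        return start_offset % 16
    raise ValueError("unknown cipher: " + cipher)
```

For example, an offset of 0xA0000000 passes the check while 0xA0000001 does not, and a short AES-GCM request starting 100 bytes in wastes 4 bytes where AES-CBC would waste all 100.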


202 Commits