freebsd-skq

Author	SHA1	Message	Date
Ryan Libby	b541ba195c	cxgbe: delete now-redundant vnet decls r324539 gathered some vnet decls into netinet/tcp_var.h, so that they are now redundant in dev/cxgbe/tom/{t4_cpl_io.c,t4_ddp.c}. This triggers gcc -Wredundant-decls. Reviewed by: np Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12674	2017-10-17 20:37:31 +00:00
Gleb Smirnoff	e8fd18f306	Shorten list of arguments to mbuf external storage freeing function. All of these arguments are stored in m_ext, so there is no reason to pass them in the argument list. Not all functions need the second argument, some don't even need the first one. The second argument lives in next cache line, so not dereferencing it is a performance gain. This was discovered in sendfile(2), which will be covered by next commits. The second goal of this commit is to bring even more flexibility to m_ext mbufs, allowing to create more fields in m_ext, opaque to the generic mbuf code, and potentially set and dereferenced by subsystems. Reviewed by: gallatin, kbowling Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 20:35:31 +00:00
John Baldwin	91a65e2f6b	Avoid reusing the wrong buffer for a DDP AIO request. To optimize the case of ping-ponging between two buffers, the DDP code caches the last two buffers used keeping the pages wired and page pods stored in the NIC's RAM. If a new aio_read() request uses one of the same buffers, then the work of holding pages, etc. can be avoided. However, the starting virtual address of an aio buffer was not saved, only the page count, length, and initial page offset. Thus, an aio_read() request could match a different buffer in the address space. (Earlier during development vm_fault_hold_quick_pages() was always called and the vm_page_t values were compared, but that was eventually removed without being adequately replaced.) Fix by storing the starting virtual address and comparing that (along with other fields) to determine if a buffer can be reused. MFC after: 3 days Sponsored by: Chelsio Communications	2017-09-15 22:40:57 +00:00
Navdeep Parhar	0a3bf7fb7e	cxgbe/t4_tom: There may not be a tid to update if the connection isn't established. MFC after: 2 weeks Sponsored by: Chelsio Communications	2017-08-31 23:34:08 +00:00
Navdeep Parhar	3ef7429927	cxgbe/t4_tom: Add a knob to select the congestion control algorithm used by the TOE hardware for fully offloaded connections. The knob affects new connections only. MFC after: 2 weeks Sponsored by: Chelsio Communications	2017-08-31 20:33:22 +00:00
Navdeep Parhar	b58fd65490	cxgbe/t4_tom: Use correct name for the ISS-valid bit in options2. MFC after: 3 days Sponsored by: Chelsio Communications	2017-08-15 19:21:27 +00:00
Navdeep Parhar	089400e72d	cxgbe/t4_tom: Log more details about the newly ESTABLISHED tid to the trace buffer. MFC after: 3 days	2017-07-19 01:49:01 +00:00
Navdeep Parhar	3a5426dd3c	cxgbe/t4_tom: Do not include space taken by the TCP timestamp option in the "effective MSS" for the connection. The chip expects it this way. Submitted by: Krishnamraju Eraparaju @ Chelsio MFC after: 3 days Sponsored by: Chelsio Communications	2017-06-27 22:05:06 +00:00
Navdeep Parhar	98d391a0d3	cxgbe/t4_tom: sbspace on listening sockets is no longer supported (as of r319722), use sol_sbrcv_hiwat instead. Sponsored by: Chelsio Communications	2017-06-27 17:43:28 +00:00
Navdeep Parhar	6790499792	cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by the TOE. For now this capability is always enabled in kernels with options RATELIMIT. t4_tom will check if_capenable once the base driver gets code to support rate limiting for any socket (TOE or not). This was tested with iperf3 and netperf ToT as they already support SO_MAX_PACING_RATE sockopt. There is a bug in firmwares prior to 1.16.45.0 that affects the BSD driver only and results in rate-limiting at an incorrect rate. This will resolve by itself as soon as 1.16.45.0 or later firmware shows up in the driver. Relnotes: Yes Sponsored by: Chelsio Communications	2017-05-05 20:06:49 +00:00
Navdeep Parhar	eaf5669483	cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every IPv6 connection being handled by the TOE should have a reference on its CLIP entry. Sponsored by: Chelsio Communications	2017-02-06 17:48:25 +00:00
John Baldwin	fbb779cca9	Unregister CPL handlers for TOE-related messages when unloading TOM. MFC after: 1 week Sponsored by: Chelsio Communications	2017-01-27 23:08:30 +00:00
John Baldwin	b67915014d	Don't drop a reference to the TOE PCB in undo_offload_socket(). undo_offload_socket() is only called by t4_connect() during a connection setup failure, but t4_connect() still owns the TOE PCB and frees it after undo_offload_socket() returns. Releasing a reference in undo_offload_socket() resulted in a double-free which panicked when t4_connect() performed the second free. The reference release was added to undo_offload_socket() incorrectly in r299210. MFC after: 1 week Sponsored by: Chelsio Communications	2017-01-27 23:03:28 +00:00
Navdeep Parhar	ff2c4d3f79	cxgbe/tom: Fix a case where do_pass_accept_req wasn't properly restoring the VNET. This should have been in r311949. MFC after: 2 days	2017-01-18 03:35:42 +00:00
Navdeep Parhar	a342904bb5	cxgbe/tom: Add VIMAGE support to the TOE driver. Active Open: - Save the socket's vnet at the time of the active open (t4_connect) and switch to it when processing the reply (do_act_open_rpl or do_act_establish). Passive Open: - Save the listening socket's vnet in the driver's listen_ctx and switch to it when processing incoming SYNs for the socket. - Reject SYNs that arrive on an ifnet that's not in the same vnet as the listening socket. CLIP (Compressed Local IPv6) table: - Add only those IPv6 addresses to the CLIP that are in a vnet associated with one of the card's ifnets. Misc: - Set vnet from the toepcb when processing TCP state transitions. - The kernel sets the vnet when calling the driver's output routine so t4_push_frames runs in proper vnet context already. One exception is when incoming credits trigger tx within the driver's ithread. Set the vnet explicitly in do_fw4_ack for that case. MFC after: 3 days Sponsored by: Chelsio Communications	2017-01-11 23:48:17 +00:00
Navdeep Parhar	d663d5ca23	cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2 tids, not 1, in the hardware. MFC after: 3 days Sponsored by: Chelsio Communications	2017-01-07 20:26:19 +00:00
Navdeep Parhar	e4612db2cd	Fix comment in t4_tom. No functional change. MFC after: 3 days	2017-01-07 00:08:55 +00:00
Eitan Adler	eb1c1a7f24	Remove a duplicated word.	2016-09-29 13:59:14 +00:00
Navdeep Parhar	efe00fd923	cxgbe/t4_tom: Update the active/passive open code to support T6. Data path works as-is. Sponsored by: Chelsio Communications	2016-09-17 23:08:49 +00:00
Navdeep Parhar	b0c554c3a5	cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6. Sponsored by: Chelsio Communications	2016-09-17 22:13:03 +00:00
Navdeep Parhar	5aaa3bc3b9	cxgbe/t4_tom: toepcb should be all-zero on allocation because the code that cleans up on failure assumes that non-NULL values indicate initialized items. Sponsored by: Chelsio Communications	2016-09-05 19:37:47 +00:00
Navdeep Parhar	a9feb2cdbb	cxgbe/t4_tom: Two new routines to allocate and write page pods for a buffer in the kernel's address space.	2016-09-01 00:51:59 +00:00
Navdeep Parhar	968267fdb8	cxgbe/t4_tom: Add general purpose routines to deal with page pod regions and allocations within them. Switch to these routines to manage the TOE DDP region. Sponsored by: Chelsio Communications	2016-08-31 23:23:46 +00:00
Navdeep Parhar	9217931fb4	cxgbe/t4_tom: The page pod arena allocates from pod address space and not index space. The minimum valid allocation out of this arena is the size of a single page pod. Sponsored by: Chelsio Communications	2016-08-04 17:29:42 +00:00
Navdeep Parhar	515b36c5b5	cxgbe/t4_tom: Read the chip's DDP page sizes and save them in a per-adapter data structure. This replaces a global array with hardcoded page sizes. Sponsored by: Chelsio Communications	2016-08-02 23:54:21 +00:00
John Baldwin	07159830be	Add support for zero-copy aio_write() on TOE sockets. AIO write requests for a TOE socket on a Chelsio T4+ adapter can now DMA directly from the user-supplied buffer. This is implemented by wiring the pages backing the user-supplied buffer and queueing special mbufs backed by raw VM pages to the socket buffer. The TOE code recognizes these special mbufs and builds a sglist from the VM page array associated with the mbuf when queueing a work request to the TOE. Because these mbufs do not have an associated virtual address, m_data is not valid. Thus, the AIO handler does not invoke sosend() directly for these mbufs but instead inlines portions of sosend_generic() and tcp_usr_send(). An aiotx_buffer structure is used to describe the user buffer (e.g. it holds the array of VM pages and a reference to the AIO job). The special mbufs reference this structure via m_ext. Note that a single job might be split across multiple mbufs (e.g. if it is larger than the socket buffer size). The 'ext_arg2' member of each mbuf gives an offset relative to the backing aiotx_buffer. The AIO job associated with an aiotx_buffer structure is completed when the last reference to the structure is released. Zero-copy aio_write()'s for connections associated with a given adapter can be enabled/disabled at runtime via the 'dev.t[45]nex.N.toe.tx_zcopy' sysctl. MFC after: 1 month Relnotes: yes Sponsored by: Chelsio Communications	2016-07-27 18:29:35 +00:00
Enji Cooper	092af585e1	Remove redundant declaration for tcp_dooptions, similar to r302576 netinet/tcp_var.h already defines this function Differential Revision: https://reviews.freebsd.org/D7189 MFC after: 1 week PR: 209920 Reported by: Mark Millard <markmi@dsl-only.net> Reviewed by: np Tested with: clang 3.8.0, gcc 4.2.1, gcc 5.3.0 Sponsored by: EMC / Isilon Storage Division	2016-07-11 17:11:18 +00:00
Navdeep Parhar	671bf2b8b2	cxgbe(4): Changes to the CPL-handler registration mechanism and code related to "shared" CPLs. a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single function. Allow callers to direct the response to any iq. Tidy up set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of magic constants. b) Remove all CPL handler tables from struct adapter. This reduces its size by around 2KB. All handlers are now registered at MOD_LOAD instead of attach or some kind of initialization/activation. The registration functions do not need an adapter parameter any more. c) Add per-iq handlers to deal with CPLs whose destination cannot be determined solely from the opcode. There are 2 such CPLs in use right now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq. t4_tom (including the DDP code) now uses the port's ctrlq to send L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq. fwq and ofld_rxq have different handlers that know what kind of tid to expect in the reply. Update t4_write_l2e and callers to support any wrq/iq combination. Approved by: re@ (kib@) Sponsored by: Chelsio Communications	2016-07-05 01:29:24 +00:00
Navdeep Parhar	5e03372b18	cxgbe(4): Do not bring up an interface when IFCAP_TOE is enabled on it. The interface's queues are functional after VI_INIT_DONE (which is short of interface-up) and that's all that's needed for t4_tom to communicate with the chip. Approved by: re@ (gjb@) Sponsored by: Chelsio Communications	2016-06-29 06:55:30 +00:00
John Baldwin	b1012d8036	Account for AIO socket operations in thread/process resource usage. File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thread that completes an AIO request via aio_return() or aio_waitcomplete(). This change extends AIO jobs to store counts of received/sent messages and updates socket backends to set these counts accordingly. Note that the socket backends are careful to only charge a single message for each AIO request even though a single request on a blocking socket might invoke sosend or soreceive multiple times. This is to mimic the resource accounting of synchronous read/write. Adjust the UNIX socketpair AIO test to verify that the message resource usage counts update accordingly for aio_read and aio_write. Approved by: re (hrs) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6911	2016-06-21 22:19:06 +00:00
John Baldwin	ae0b1ccbab	Use sbused() instead of sbspace() to avoid signed issues. Inserting a full mbuf with an external cluster into the socket buffer resulted in sbspace() returning -MLEN. However, since sb_hiwat is unsigned, the -MLEN value was converted to unsigned in comparisons. As a result, the socket buffer was never autosized. Note that sb_lowat is signed to permit direct comparisons with sbspace(), but sb_hiwat is unsigned. Follow suit with what tcp_output() does and compare the value of sbused() with sb_hiwat instead. Approved by: re (gjb) Sponsored by: Chelsio Communications	2016-06-15 21:08:51 +00:00
John Baldwin	fe0bdd1d2c	Move backend-specific fields of kaiocb into a union. This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields. Change the socket and Chelsio DDP backends to use 'backend3' instead of abusing _aiocb_private.status directly. This confines the use of _aiocb_private to the AIO internals in vfs_aio.c. Reviewed by: kib (earlier version) Approved by: re (gjb) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6547	2016-06-15 20:56:45 +00:00
Navdeep Parhar	c4765d2743	cxgbe/t4_tom: Fix inverted assertion in r300895. It is RDMA connections and not others that are allowed to fail the receive window check. Approved by: re (gjb@)	2016-06-14 21:09:00 +00:00
Navdeep Parhar	addd6a52c4	cxgbe/t4_tom: Exempt RDMA connections from a TCP sanity test for now, to avoid panicking debug kernels. t4_tom does not keep track of a connection once it switches to ULP mode iWARP. If the connection falls out of ULP mode the driver/hardware seq# etc. are out of sync. A better fix would be to figure out what the current seq# are, update the driver's state, and perform all sanity checks as usual.	2016-05-28 00:38:17 +00:00
John Baldwin	1081d2766c	Move the KTR for the update of ddp_active_id on each completion under VERBOSE_TRACES. Sponsored by: Chelsio Communications	2016-05-20 23:08:22 +00:00
John Baldwin	dc9643853d	Use DDP to implement zerocopy TCP receive with aio_read(). Chelsio's TCP offload engine supports direct DMA of received TCP payload into wired user buffers. This feature is known as Direct-Data Placement. However, to scale well the adapter needs to prepare buffers for DDP before data arrives. aio_read() is more amenable to this requirement than read() as applications often call read() only after data is available in the socket buffer. When DDP is enabled, TOE sockets use the recently added pru_aio_queue protocol hook to claim aio_read(2) requests instead of letting them use the default AIO socket logic. The DDP feature supports scheduling DMA to two buffers at a time so that the second buffer is ready for use after the first buffer is filled. The aio/DDP code optimizes the case of an application ping-ponging between two buffers (similar to the zero-copy bpf(4) code) by keeping the two most recently used AIO buffers wired. If a buffer is reused, the aio/DDP code is able to reuse the vm_page_t array as well as page pod mappings (a kind of MMU mapping the Chelsio NIC uses to describe user buffers). The generation of the vmspace of the calling process is used in conjunction with the user buffer's address and length to determine if a user buffer matches a previously used buffer. If an application queues a buffer for AIO that does not match a previously used buffer then the least recently used buffer is unwired before the new buffer is wired. This ensures that no more than two user buffers per socket are ever wired. Note that this feature is best suited to applications sending a steady stream of data vs short bursts of traffic. Discussed with: np Relnotes: yes Sponsored by: Chelsio Communications	2016-05-07 00:33:35 +00:00
John Baldwin	826c2372c5	Set the correct vnet in TOE event handlers. Differential Revision: https://reviews.freebsd.org/D6152	2016-05-06 23:49:10 +00:00
Pedro F. Giffuni	b66bb393f2	Cleanup redundant parentheses from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
John Baldwin	113f2316c6	Add a 'show t4 tcb <nexus> <tid>' command to dump a TCB from DDB. This allows the contents of a TCB to be extracted from a T4/T5 card in DDB after a panic.	2016-04-10 05:06:58 +00:00
Navdeep Parhar	40bf7442fa	cxgbe: catch up with the latest hardware-related definitions. Obtained from: Chelsio Communications Sponsored by: Chelsio Communications	2016-02-19 00:29:16 +00:00
Gleb Smirnoff	f353ae1c62	More fixes to the build.	2016-01-27 05:15:53 +00:00
Gleb Smirnoff	57a78e3bae	Augment struct tcpstat with tcps_states[], which is used for book-keeping the amount of TCP connections by state. Provides a cheap way to get connection count without traversing the whole pcb list. Sponsored by: Netflix	2016-01-27 00:45:46 +00:00
Alexander V. Chernikov	8a9f7532b0	Convert cxgb/cxgbe to the new routing API. Discussed with: np	2016-01-07 08:07:17 +00:00
Gleb Smirnoff	0c39d38d21	Historically we have two fields in tcpcb to describe sender MSS: t_maxopd, and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places where precision was not important, like cwnd or ssthresh calculations. With this change: - t_maxopd goes away. - t_maxseg now stores MSS not adjusted by options. - new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The function gives a better estimate, since it takes into account SACK state as well. Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D3593	2016-01-07 00:14:42 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with its length) to prepend to mbuf. These two changes permit routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address changes are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which either returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of them have their header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Navdeep Parhar	9eb533d3b4	cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI offload driver. These changes come from projects/cxl_iscsi.	2015-12-26 00:26:02 +00:00
John Baldwin	fe2ebb7644	Add support for configuring additional virtual interfaces (VIs) on a port. Each virtual interface has its own MAC address, queues, and statistics. The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented as additional VIs on each port. This change allows additional non-netmap interfaces to be configured on each port. Additional virtual interfaces use the naming scheme vcxgbeX or vcxlX. Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a value greater than 1 before loading the cxgbe(4) or cxl(4) driver. NB: The first VI on each port is the "main" interface (cxgbeX or cxlX). T4/T5 NICs provide a limited number of MAC addresses for each physical port. As a result, a maximum of six VIs can be configured on each port (including the "main" interface and the netmap interface when netmap is enabled). One user-visible result is that when netmap is enabled, packets received or transmitted via the netmap interface are no longer counted in the stats for the "main" interface, but are now accounted to the netmap interface. The netmap interfaces now also have a new-bus device and export various information sysctl nodes via dev.n(cxgbe|cxl).X. The cxgbetool 'clearstats' command clears the stats for all VIs on the specified port along with the port's stats. There is currently no way to clear the stats of an individual VI. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio	2015-12-03 00:02:01 +00:00
Navdeep Parhar	baa7d0bf9d	cxgbe/tom: decide whether to shove segments or not only if there is payload to transmit. MFC after: 1 week	2015-10-30 01:18:07 +00:00
John Baldwin	8fb15ddb00	Add a comment to clarify how to determine the amount of received DDP data. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D3619	2015-09-10 21:41:11 +00:00
Julien Charbon	ff9b006d61	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00


127 Commits
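
The signedness problem described in commit ae0b1ccbab (sbused() instead of sbspace()) can be illustrated with a small standalone sketch. This is not the kernel code: the two helper functions, their names, the plain int/unsigned int types, and the one-eighth threshold are all hypothetical simplifications chosen for illustration. It only shows why a negative "space left" value stops behaving like a small number once it is compared as unsigned, and why comparing the bytes in use against the high-water mark avoids the issue.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the problematic check: "space" plays the role of a
 * signed free-space count and "hiwat" the role of an unsigned high-water
 * mark. The cast spells out the conversion that a mixed signed/unsigned
 * comparison applies implicitly: a negative space value becomes a very
 * large unsigned number, so the buffer never looks nearly full.
 */
static bool
nearly_full_by_space(int space, unsigned int hiwat)
{
	return ((unsigned int)space < hiwat / 8);
}

/*
 * Hypothetical model of the fixed check: both operands are unsigned, so an
 * overfull buffer (used > hiwat) is still recognized as nearly full.
 */
static bool
nearly_full_by_used(unsigned int used, unsigned int hiwat)
{
	return (used >= hiwat - hiwat / 8);
}
```

With a high-water mark of 65536 and a buffer holding 256 bytes more than that, the first helper is called with a space of -256 and reports the buffer as not nearly full, while the second helper, called with 65792 bytes used, reports it as nearly full.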