freebsd-dev

Author	SHA1	Message	Date
Navdeep Parhar	addd6a52c4	cxgbe/t4_tom: Exempt RDMA connections from a TCP sanity test for now, to avoid panicking debug kernels. t4_tom does not keep track of a connection once it switches to ULP mode iWARP. If the connection falls out of ULP mode the driver/hardware seq# etc. are out of sync. A better fix would be to figure out what the current seq# are, update the driver's state, and perform all sanity checks as usual.	2016-05-28 00:38:17 +00:00
John Baldwin	1081d2766c	Move the KTR for the update of ddp_active_id on each completion under VERBOSE_TRACES. Sponsored by: Chelsio Communications	2016-05-20 23:08:22 +00:00
John Baldwin	dc9643853d	Use DDP to implement zerocopy TCP receive with aio_read(). Chelsio's TCP offload engine supports direct DMA of received TCP payload into wired user buffers. This feature is known as Direct-Data Placement. However, to scale well the adapter needs to prepare buffers for DDP before data arrives. aio_read() is more amenable to this requirement than read() as applications often call read() only after data is available in the socket buffer. When DDP is enabled, TOE sockets use the recently added pru_aio_queue protocol hook to claim aio_read(2) requests instead of letting them use the default AIO socket logic. The DDP feature supports scheduling DMA to two buffers at a time so that the second buffer is ready for use after the first buffer is filled. The aio/DDP code optimizes the case of an application ping-ponging between two buffers (similar to the zero-copy bpf(4) code) by keeping the two most recently used AIO buffers wired. If a buffer is reused, the aio/DDP code is able to reuse the vm_page_t array as well as page pod mappings (a kind of MMU mapping the Chelsio NIC uses to describe user buffers). The generation of the vmspace of the calling process is used in conjunction with the user buffer's address and length to determine if a user buffer matches a previously used buffer. If an application queues a buffer for AIO that does not match a previously used buffer then the least recently used buffer is unwired before the new buffer is wired. This ensures that no more than two user buffers per socket are ever wired. Note that this feature is best suited to applications sending a steady stream of data vs short bursts of traffic. Discussed with: np Relnotes: yes Sponsored by: Chelsio Communications	2016-05-07 00:33:35 +00:00
John Baldwin	826c2372c5	Set the correct vnet in TOE event handlers. Differential Revision: https://reviews.freebsd.org/D6152	2016-05-06 23:49:10 +00:00
Pedro F. Giffuni	b66bb393f2	Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
John Baldwin	113f2316c6	Add a 'show t4 tcb <nexus> <tid>' command to dump a TCB from DDB. This allows the contents of a TCB to be extracted from a T4/T5 card in DDB after a panic.	2016-04-10 05:06:58 +00:00
Navdeep Parhar	40bf7442fa	cxgbe: catch up with the latest hardware-related definitions. Obtained from: Chelsio Communications Sponsored by: Chelsio Communications	2016-02-19 00:29:16 +00:00
Gleb Smirnoff	f353ae1c62	More fixes to the build.	2016-01-27 05:15:53 +00:00
Gleb Smirnoff	57a78e3bae	Augment struct tcpstat with tcps_states[], which is used for book-keeping the amount of TCP connections by state. Provides a cheap way to get connection count without traversing the whole pcb list. Sponsored by: Netflix	2016-01-27 00:45:46 +00:00
Alexander V. Chernikov	8a9f7532b0	Convert cxgb/cxgbe to the new routing API. Discussed with: np	2016-01-07 08:07:17 +00:00
Gleb Smirnoff	0c39d38d21	Historically we have two fields in tcpcb to describe sender MSS: t_maxopd, and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places, where preciseness was not important, like cwnd or ssthresh calculations. With this change: - t_maxopd goes away. - t_maxseg now stores MSS not adjusted by options. - new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The functions gives a better estimate, since it takes into account SACK state as well. Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D3593	2016-01-07 00:14:42 +00:00
Alexander V. Chernikov	4fb3a8208c	Implement interface link header precomputation API. Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..). Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf. These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers. ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp\|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data. BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65). Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle. Differential Revision: https://reviews.freebsd.org/D4102	2015-12-31 05:03:27 +00:00
Navdeep Parhar	9eb533d3b4	cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI offload driver. These changes come from projects/cxl_iscsi.	2015-12-26 00:26:02 +00:00
John Baldwin	fe2ebb7644	Add support for configuring additional virtual interfaces (VIs) on a port. Each virtual interface has its own MAC address, queues, and statistics. The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented as additional VIs on each port. This change allows additional non-netmap interfaces to be configured on each port. Additional virtual interfaces use the naming scheme vcxgbeX or vcxlX. Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a value greater than 1 before loading the cxgbe(4) or cxl(4) driver. NB: The first VI on each port is the "main" interface (cxgbeX or cxlX). T4/T5 NICs provide a limited number of MAC addresses for each physical port. As a result, a maximum of six VIs can be configured on each port (including the "main" interface and the netmap interface when netmap is enabled). One user-visible result is that when netmap is enabled, packets received or transmitted via the netmap interface are no longer counted in the stats for the "main" interface, but are not accounted to the netmap interface. The netmap interfaces now also have a new-bus device and export various information sysctl nodes via dev.n(cxgbe\|cxl).X. The cxgbetool 'clearstats' command clears the stats for all VIs on the specified port along with the port's stats. There is currently no way to clear the stats of an individual VI. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio	2015-12-03 00:02:01 +00:00
Navdeep Parhar	baa7d0bf9d	cxgbe/tom: decide whether to shove segments or not only if there is payload to transmit. MFC after: 1 week	2015-10-30 01:18:07 +00:00
John Baldwin	8fb15ddb00	Add a comment that to clarify how to determine the amount of received DDP data. Reviewed by: np Differential Revision: https://reviews.freebsd.org/D3619	2015-09-10 21:41:11 +00:00
Julien Charbon	ff9b006d61	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.	2015-08-03 12:13:54 +00:00
Andrey V. Elsukov	cc0a3c8ca4	Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock. Both are used to protect access to IP addresses lists and they can be acquired for reading several times per packet. To reduce lock contention it is better to use rmlock here. Reviewed by: gnn (previous version) Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3149	2015-07-29 08:12:05 +00:00
Navdeep Parhar	c7dbd80213	cxgbe(4): Update T4 and T5 firmwares to 1.14.2.0. Obtained from: Chelsio Communications MFC after: 3 days	2015-07-14 08:02:05 +00:00
Gleb Smirnoff	28ebe80cab	Provide functions to determine presence of a given address configured on a given interface. Discussed with: np Sponsored by: Nginx, Inc.	2015-04-17 11:57:06 +00:00
Navdeep Parhar	7ef00d7884	cxgbe/tom: return rx credits promptly if the socket buffer's low water mark cannot be reached because the window advertised to the peer isn't wide enough. While here, tweak the normal credit return too. MFC after: 1 month	2015-03-31 01:22:20 +00:00
John Baldwin	b12c0a9eeb	Move special DDP handling for closing a connection into a new handle_ddp_close() function in t4_ddp.c as the logic is similar to handle_ddp_data(). This allows all knowledge of the special DDP mbufs to be private to t4_ddp.c as well.	2015-03-16 15:56:06 +00:00
John Baldwin	69a088631e	Resize receive socket buffers that support autosizing when receiving TCP data via direct data placement. Sponsored by: Chelsio MFC after: 1 week	2015-03-11 17:35:07 +00:00
Navdeep Parhar	b3d44a6800	cxgbe(4): tidy up some of the interaction between the Upper Layer Drivers (ULDs) and the base if_cxgbe driver. Track the per-adapter activation of ULDs in a new "active_ulds" field. This was done pretty arbitrarily before this change -- via TOM_INIT_DONE in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in adapter->offload_map for iWARP. iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM ULD). The rules are: a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then iWARP and/or iSCSI are enabled too. b) When the iWARP and iSCSI modules are loaded they go looking for adapters with TOE enabled and enable themselves on that adapter. c) You cannot deactivate or unload the TOM module from underneath iWARP or iSCSI. Any such attempt will fail with EBUSY. MFC after: 2 weeks	2015-02-08 09:28:55 +00:00
John Baldwin	86f05ea6cf	Lock the socket buffer before jumping to the 'out' label if sblock() fails in t4_soreceive_ddp().	2015-01-26 16:32:41 +00:00
John Baldwin	de5a10ecbc	- Update a disabled KASSERT() to use sbused() instead of accessing the no-longer existant sb_cc sockbuf member. - Use sbavail() instead of sbused() in t4_soreceive_ddp() to match the usage in soreceive_stream() on which it is based. Discussed with: glebius (2)	2015-01-26 16:29:14 +00:00
Navdeep Parhar	db8bcd1b21	cxgbe/tom: allocate page pod addresses instead of ppod#. MFC after: 2 weeks	2015-01-07 06:20:33 +00:00
Navdeep Parhar	f8c479085f	cxgbe/tom: use vmem(9) as the DDP page pod allocator. MFC after: 1 month	2015-01-06 01:30:32 +00:00
Navdeep Parhar	79b93bf6a3	cxgbe/tom: do not engage the TOE's payload chopper for payload < 2 MSS or for 10Gbps ports. MFC after: 2 weeks	2015-01-03 00:09:21 +00:00
Navdeep Parhar	402873f32a	cxgbe/tom: fix the MSS calculation for IPv6 connections handled by the TOE. MFC after: 1 week	2015-01-02 21:13:24 +00:00
Navdeep Parhar	dd1be4d418	cxgbe/tom: log some more details in send_flowc_wr. MFC after: 1 week	2015-01-02 20:52:51 +00:00
John Baldwin	5ad25ceb41	Check for SS_NBIO in so->so_state instead of sb->sb_flags in soreceive_stream(). Differential Revision: https://reviews.freebsd.org/D1299 Reviewed by: bz, gnn MFC after: 1 week	2014-12-15 17:52:08 +00:00
Navdeep Parhar	a7570ee305	Move KTR_CXGBE from t4_tom.h to adapter.h so that the base if_cxgbe code can use it too. MFC after: 1 week	2014-12-12 21:54:59 +00:00
Gleb Smirnoff	651e4e6a30	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
Gleb Smirnoff	0f9d0a73a4	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This forces following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - An mbuf that is behind a "not ready" in the queue neither. - If sockbet buffer is flushed, then "not ready" mbufs shouln't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for ""claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API shouldn't already access them directly, but use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
Gleb Smirnoff	cfa6009e36	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
Navdeep Parhar	527e4e62ac	Always request a completion for every work request for iWARP. The initial MPA exchange must be tracked this way so that t4_tom's state for the tid is all clean at the time the tid transitions to RDMA mode. Once it does, t4_tom is out of the way and iw_cxgbe uses the qp endpoints directly. Sponsored by: Chelsio Communications	2014-10-28 18:10:57 +00:00
Navdeep Parhar	19abdd0654	cxgbe/tom: don't leak resources tied to an active open request that cannot be sent to the chip because a prerequisite L2 resolution failed. Submitted by: Hariprasad at chelsio dot com (original version) MFC after: 2 weeks.	2014-10-07 21:26:22 +00:00
Navdeep Parhar	3a260c2a19	Update comment (missed this bit in r272079).	2014-09-24 20:08:43 +00:00
Navdeep Parhar	13251b21e1	cxgbe/tom: Catch up with r271119, syncache_add doesn't need tcbinfo lock.	2014-09-24 20:04:11 +00:00
Navdeep Parhar	0fe982772d	Some hooks in cxgbe(4) for the offloaded iSCSI driver. (I'm committing this on behalf of my colleagues in the Storage team at Chelsio). Submitted by: Sreenivasa Honnur <shonnur at chelsio dot com> Sponsored by: Chelsio Communications.	2014-07-24 18:39:08 +00:00
Attilio Rao	3ae10f7477	- Modify vm_page_unwire() and vm_page_enqueue() to directly accept the queue where to enqueue pages that are going to be unwired. - Add stronger checks to the enqueue/dequeue for the pagequeues when adding and removing pages to them. Of course, for unmanaged pages the queue parameter of vm_page_unwire() will be ignored, just as the active parameter today. This makes adding new pagequeues quicker. This change effectively modifies the KPI. __FreeBSD_version will be, however, bumped just when the full cache of free pages will be evicted. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2014-06-16 18:15:27 +00:00
Bjoern A. Zeeb	255cd9fd58	Move the tcp_fields_to_host() and tcp_fields_to_net() (inline) functions to the tcp_var.h header file in order to avoid further duplication with upcoming commits. Reviewed by: np MFC after: 2 weeks	2014-05-23 20:15:01 +00:00
Navdeep Parhar	88bb82e511	Do not create a hardware IPv6 server if the listen address is not in6addr_any and is not in the CLIP table either. This fixes a reported TOE+IPv6 NULL-dereference panic in do_pass_open_rpl(). While here, stop creating hardware servers for any loopback address. It's just a waste of server tids. MFC after: 1 week	2013-12-17 21:41:23 +00:00
Navdeep Parhar	93e9cae3fa	Read card capabilities after firmware initialization, instead of setting them up as part of firmware initialization (which the driver gets to do only if it's the master driver). Read the range of tids available for the ETHOFLD functionality if it's enabled. New is_ftid() and is_etid() functions to test whether a tid falls within the range of filter tids or ETHOFLD tids respectively. MFC after: 2 weeks	2013-12-14 03:08:03 +00:00
Gleb Smirnoff	66e01d73cd	- Provide necessary includes. - Remove unnecessary includes. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-29 11:17:49 +00:00
Gleb Smirnoff	c3322cb91c	Include necessary headers that now are available due to pollution via if_var.h. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-28 07:29:16 +00:00
Navdeep Parhar	48d05478bf	cxgbe(4): Update T4 and T5 firmwares to 1.9.12.0	2013-10-14 21:25:07 +00:00
Navdeep Parhar	eb22728291	Rework the tx credit mechanism between the cxgbe/tom driver and the card. This helps smooth out some burstiness in the exchange. Approved by: re (glebius)	2013-09-09 04:38:57 +00:00
Navdeep Parhar	c81d56a0aa	Fix a miscalculation that caused cxgbe/tom to auto-increment a TOE socket's tx buffer size too aggressively. Approved by: re (delphij)	2013-09-09 00:16:59 +00:00

1 2

94 Commits