freebsd-nq

Author	SHA1	Message	Date
Adrian Chadd	ce44e5f776	Add the UDP hash -> RSS mbuf hash type for the ixgbe(4) driver.	2014-07-20 08:43:53 +00:00
Adrian Chadd	e3af537e75	Teach ixgbe(4) about rss_gethashconfig(). If RSS is enabled, ixgbe(4) will query the RSS API for the types of hashes which should be used. It'll then only enable hashes that are exposed via the RSS layer. This way it won't try to do things like enable UDP hashing if RSS explicitly states that it isn't supported in lookups. Tested: * 82599EB ixgbe(4) NIC	2014-07-20 07:45:48 +00:00
Adrian Chadd	e965b0dcd1	Disable the ixgbe(4) UDP 4-tuple hashing for the time being. A mix of fragmented and non-fragmented UDP in a single stream will end up being hashed differently, resulting in out-of-order behaviour in the receive path. This was done in the linux e1000 driver in 2011. Discussed with: jfv	2014-07-20 07:43:41 +00:00
Adrian Chadd	c64a6bc62d	Correctly program the RSS redirection table entries. Without this, the RSS bucket assignments aren't correct - they're DCBA instead of ABCD in each DWORD. Tested: 82599EB ixgbe(4), TCP and UDP RSS	2014-07-20 04:11:18 +00:00
Hiren Panchasara	2814ea66e3	Fix a typo. PR: 191898 Submitted by: vsjcfm@gmail.com	2014-07-17 06:21:58 +00:00
Adrian Chadd	8c0d2adf3f	Initialise these variables so gcc doesn't complain. Submitted by: luigi	2014-06-30 23:34:36 +00:00
Adrian Chadd	7063e348ab	Add initial RSS awareness to the ixgbe(4) driver. The ixgbe(4) hardware is capable of RSS hashing RX packets and doing RSS queue selection for up to 8 queues. However, even if multi-queue is enabled for ixgbe(4), the RX path doesn't use the RSS flowid from the received descriptor. It just uses the MSIX queue id. This patch does a handful of things if RSS is enabled: * Instead of using a random key at boot, fetch the RSS key from the RSS code and program that in to the RSS redirection table. That whole chunk of code should be double checked for endian correctness. * Use the RSS queue mapping to CPU ID to figure out where to thread pin the RX swi thread and the taskqueue threads for each queue. * The software queue is now really an "RSS bucket". * When programming the RSS indirection table, use the RSS code to figure out which RSS bucket each slot in the indirection table maps to. * When transmitting, use the flowid RSS mapping if the mbuf has an RSS aware hash. The existing method wasn't guaranteed to align correctly with the destination RSS bucket (and thus CPU ID.) This code warns if the number of RSS buckets isn't the same as the automatically configured number of hardware queues. The administrator will have to tweak one of them for better performance. There's currently no way to re-balance the RSS indirection table after startup. I'll worry about that later. Additionally, it may be worthwhile to always use the full 32 bit flowid if multi-queue is enabled. It'll make things like lagg(4) behave better with respect to traffic distribution.	2014-06-30 04:38:29 +00:00
Hans Petter Selasky	af3b2549c4	Pull in r267961 and r267973 again. Fix for issues reported will follow.	2014-06-28 03:56:17 +00:00
Glen Barber	37a107a407	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory	2014-06-27 22:05:21 +00:00
Hans Petter Selasky	3da1cf1e88	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies	2014-06-27 16:33:43 +00:00
John Baldwin	46e89834dc	- Don't compare bus_dma map pointers for static DMA allocations against NULL to determine if bus_dmamap_unload() or bus_dmamem_free() should be called. Instead, check the associated bus and virtual addresses. - Don't clear static DMA maps to NULL. Reviewed by: jfv	2014-06-12 11:15:19 +00:00
Luigi Rizzo	c7156fe92f	make sure if_transmit returns 0 if the mbuf is enqueued. ixgbe/ixv.c still needs a similar fix but it takes a little more restructuring of the code. MFC after: 3 days	2014-06-06 20:49:56 +00:00
Gleb Smirnoff	b245f96c44	Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed. o Remove the if_baudrate_pf crutch. o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second. This also removes quite a lot of COMPAT_FREEBSD32 code. o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes. o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters. __FreeBSD_version bumped. Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-03-13 03:42:24 +00:00
Luigi Rizzo	17885a7bfd	It is 2014 and we have a new version of netmap. Most relevant features: - netmap emulation on any NIC, even those without native netmap support. On the ixgbe we have measured about 4Mpps/core/queue in this mode, which is still a lot more than with sockets/bpf. - seamless interconnection of VALE switch, NICs and host stack. If you disable accelerations on your NIC (say em0) ifconfig em0 -txcsum -txcsum you can use the VALE switch to connect the NIC and the host stack: vale-ctl -h valeXX:em0 allowing sharing the NIC with other netmap clients. - THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers instead of pointers/count as before). This was unavoidable to support, in the future, multiple threads operating on the same rings. Netmap clients require very small source code changes to compile again. On the plus side, the new API should be easier to understand and the internals are a lot simpler. The manual page has been updated extensively to reflect the current features and give some examples. This is the result of work of several people including Giuseppe Lettieri, Vincenzo Maffione, Michio Honda and myself, and has been financially supported by EU projects CHANGE and OPENLAB, from NetApp University Research Fund, NEC, and of course the Universita` di Pisa.	2014-01-06 12:53:15 +00:00
Gleb Smirnoff	f56831a217	Fix build broken in r259644. Submitted by: tuexen Pointy hat to: glebius	2013-12-20 13:18:50 +00:00
Gleb Smirnoff	46bf53de69	ixgbe(4) takes packet counters from hardware in ixgbe_update_stats_counters(), so we don't need to do a per packet increment, which trashes cache line. Submitted by: oleg	2013-12-20 10:57:47 +00:00
Oleg Bulyzhin	ba36d317f6	- Fix link loss on vlan reconfiguration. - Fix issues with 'vlanhwfilter'. MFC after: 1 week Silence from: jfv 5 weeks	2013-11-05 09:46:01 +00:00
Luigi Rizzo	ce3ee1e7c4	update to the latest netmap snapshot. This includes the following: - use separate memory regions for VALE ports - locking fixes - some simplifications in the NIC-specific routines - performance improvements for the VALE switch - some new features in the pkt-gen test program - documentation updates There are small API changes that require programs to be recompiled (NETMAP_API has been bumped so you will detect old binaries at runtime). In particular: - struct netmap_slot now is 16 bytes to support an extra pointer, which may save one data copy when using VALE ports or VMs; - the struct netmap_if has two extra fields; MFC after: 3 days	2013-11-01 21:21:14 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
Gleb Smirnoff	4cdc1f5421	There are some high performance NICs that count statistics in hardware, and there are ifnets, that do that via counter(9). Provide a flag that would skip cache line trashing '+=' operation in ether_input(). Sponsored by: Netflix Sponsored by: Nginx, Inc. Reviewed by: melifaro, adrian Approved by: re (marius)	2013-10-09 19:04:40 +00:00
Hiren Panchasara	5b9d734b08	Expose system level ixgbe sysctls. Device level sysctls are already exposed as dev.ix.<device> Fixing the case where number of queues for igb is auto-tuned and hw.igb.num_queues does not return current/updated value. Reviewed by: jfv Approved by: re (delphij) MFC after: 2 weeks	2013-10-05 19:17:56 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Scott Long	c68534f1d5	Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI command register. The lazy BAR allocation code in FreeBSD sometimes disables this bit when it detects a range conflict, and will re-enable it on demand when a driver allocates the BAR. Thus, the bit is no longer a reliable indication of capability, and should not be checked. This results in the elimination of a lot of code from drivers, and also gives the opportunity to simplify a lot of drivers to use a helper API to set the busmaster enable bit. This changes fixes some recent reports of disk controllers and their associated drives/enclosures disappearing during boot. Submitted by: jhb Reviewed by: jfv, marius, achadd, achim MFC after: 1 day	2013-08-12 23:30:01 +00:00
Jack F Vogel	4dc63104ae	Improve the MSIX setup code in the drivers, thanks to Marius for the changes. Make sure that pci_alloc_msix() does give us the vectors we need and fall back to MSI when it doesn't, also release any that were allocated when insufficient. MFC after: 3 days	2013-08-12 22:54:38 +00:00
Jack F Vogel	d0913b7f25	Make the various driver MSIX setup routines fallback to MSI more gracefully. This change was suggested by Marius Strobl, thank you. PR: kern/181016 MFC after: ASAP	2013-08-06 21:01:38 +00:00
Jack F Vogel	7301d64aba	Correct a fat-finger in the last delta. MFC after: ASAP	2013-08-05 16:16:50 +00:00
Jack F Vogel	cbe75ae8f5	A number of important fixes: - mbuf reused after an RX_COPY optimized operation can sometimes have a bogus cached address, resulting in TCP hangs. Add critical save points to the cached address. Thanks to Michael and the team at Verisign for finding this problem. - A couple more spots where the rxbuf->flags member should be cleared just to be sure no incorrect RX_COPY state is left around. Thanks to Adrian for tracking these down. - Remove the rearm_queues function from the driver, this was found to be responsible for some out-of-order packets by Verisign, and was always a bandaid, with the other fixes in this delta the bandaid can finally be removed. - In the other/link interrupt handler the entire state of the EICS register was being written back into EICR (which clears causes and thus re-enables those interrupts), this was wrong, so now mask off the queue portion of the register value, so we only clear the other/link interrupt we intend. Marc from Verisign found this. - Make the SFP+ unsupported option tuneable now, by customer request. - Finally, just a couple of minor DEBUG string fixes. I want to call out and thank all the participants in the 10G community/Intel calls for helping track down these problems and make the driver better for everyone! MFC after: 3 days, these are critical fixes for 9.2!	2013-08-01 20:10:16 +00:00
Jack F Vogel	bae4b87e8a	Oops, need to change the VF code as well. MFC after: ASAP	2013-07-12 21:21:15 +00:00
Jack F Vogel	ee738eea01	Remove the conditional define around the option headers, when building the driver as a module the result of the present system results in INET and INET6 being undefined, and will cause the panic in ixgbe_tso_setup(). The Makefile in the module directory now renders the conditional in the source unnecessary and wrong. MFC after: ASAP - the panic as a module must not get into 9.2	2013-07-12 21:14:42 +00:00
Jack F Vogel	3f80cc03fd	Fix my last commit, flags rather than flag... duh. MFC after: 2 days	2013-07-11 03:44:06 +00:00
Jack F Vogel	804d70535a	Fix to a panic found internally, bad pointer during rxeof processing. Thanks to John Baldwin for catching this. Not clearing the flag member of the rxbuf could result in a NULL mbuf pointer being used. MFC after: 2 days (this needs to get into 9.2!)	2013-07-10 23:14:24 +00:00
Jack F Vogel	fd75b91d13	Add quad port probe support, this gives the admin proper information about the slot (which should be a PCIE Gen 3 slot for this adapter) by looking back thru the PCI parent devices to the slot device. The fix above also corrects the bandwidth display to GT/s rather than the incorrect Gb/s Next, allow the use of ALTQ if you select the compile option IXGBE_LEGACY_TX. Allow the use of 'unsupported' optic modules by a compile option as well. Add a phy reset capability into the stop code, this is so a static configured driver will still behave properly when taken down (not being able to unload it). This revision synchronizes the shared code with Intel internal current code, and note that it now includes DCB supporting code, this was necessitated by some internal changes with the code, but it also will provide the opportunity to develop this feature in the core driver down the road. I have edited the README to get rid of some of the worse anachronisms in it as well, its by no means as robust as I might wish at this point however. Oh, I also have included some conditional stuff in the code so it will be compatible in both the 9.X and 10 environments. Performance has been a focus in recent changes and I believe this revision driver will perform very well in most workloads. MFC after: 2 weeks	2013-06-18 21:28:19 +00:00
Luigi Rizzo	d61ba75247	use netmap_rx_irq() / netmap_tx_irq() to handle interrupts in netmap mode, removing the logic from individual drivers. (note: if_lem.c not updated yet due to some other pending modifications)	2013-04-30 16:18:29 +00:00
Jack F Vogel	7bdac10465	Two small fixes: Set promiscuous code was unconditionally turning off multicast when turning off promiscuous mode, this should only be done when there are less than MAX groups. Thanks to Mike Karels for this correction. Second, the overtmp interrupt setup/detection was wrong, correcting it. MFC after: one week	2013-03-29 18:03:00 +00:00
Jack F Vogel	facc592d88	Fix a small, but important bug, a task drain was mistakenly being compiled only when setting LEGACY_TX, this means you would not get the drain when needed on detach!! Thanks to Bryan Venteicher (bryanv@freebsd.org) for catching this little gremlin!! :)	2013-03-04 23:15:07 +00:00
Jack F Vogel	0ecc2ff0e8	First, sync to internal shared code, and then Fixes: - flow control - don't override user value on re-init - fix to make 1G optics work correctly - change to interrupt enabling - some bits were incorrect for certain hardware. - certain stats fixes, remove a duplicate increment of ierror, thanks to Scott Long for pointing these out. - shared code link interface changed, requiring some core code changes to accommodate this. - add an m_adj() to ETHER_ALIGN on the receive side, this was requested by Mike Karels, thanks Mike. - Multicast code corrections also thanks to Mike Karels.	2013-03-04 23:07:40 +00:00
Dag-Erling Smørgrav	cdc1296734	revert 247035	2013-02-20 21:16:50 +00:00
Dag-Erling Smørgrav	c9263bd288	Reduce excessive nesting.	2013-02-20 12:59:21 +00:00
Randall Stewart	ded5ea6a25	This fixes an out-of-order problem with several of the newer drivers. The basic problem was that the driver was pulling the mbuf off the drbr ring and then when sending with xmit(), encountering a full transmit ring. Thus the lower layer xmit() function would return an error, and the drivers would then append the data back on to the ring. For TCP this is a horrible scenario sure to bring on a fast-retransmit. The fix is to use drbr_peek() to pull the data pointer but not remove it from the ring. If it fails then we either call the new drbr_putback or drbr_advance method. Advance moves it forward (we do this sometimes when the xmit() function frees the mbuf). When we succeed we always call advance. The putback will always copy the mbuf back to the top of the ring. Note that the putback cannot be used with a drbr_dequeue() only with drbr_peek(). We most of the time, in putback, would not need to copy it back since most likely the mbuf is still the same, but sometimes xmit() functions will change the mbuf via a pullup or other call. So the optimal case for the single consumer is to always copy it back. If we ever do a multiple_consumer (for lagg?) we will need a test and atomic in the put back possibly a separate putback_mc() in the ring buf. Reviewed by: jhb@freebsd.org, jlv@freebsd.org	2013-02-07 15:20:54 +00:00
Sofian Brabez	61bfd86762	Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays Reviewed by: cognet Approved by: cognet	2013-01-30 18:01:20 +00:00
Pedro F. Giffuni	646a7fea0c	Clean some 'svn:executable' properties in the tree. Submitted by: Christoph Mallon MFC after: 3 days	2013-01-26 22:08:21 +00:00
Luigi Rizzo	60372f6f58	rename the 'tag' and 'map' fields used the rx ring to their previous names, 'ptag' and 'pmap' -- p stands for packet. This change reduces the difference between the code in stable/9 and head, and also helps using the same ixgbe_netmap.h on both branches. Approved by: Jack Vogel	2012-12-20 22:26:03 +00:00
Gleb Smirnoff	c6499eccad	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags in sys/dev.	2012-12-04 09:32:43 +00:00
Jack F Vogel	4153fe7216	Remove the sysctl process_limit interface, after some thought I've decided it's overkill, a simple tuneable for each RX and TX limit, and then init sets the ring values based on that, should be sufficient. More importantly, fix a bug causing a panic, when changing the define style to IXGBE_LEGACY_TX a taskqueue init was inadvertently set #ifdef when it should be #ifndef.	2012-12-03 21:38:02 +00:00
Jack F Vogel	39aa926bb3	Patch #12 OK, I said there was only 11 patches, but unfortunately the revamped sysctl code did not work, and needed a change. This makes the limit get set at the time that all sysctl stats are created and is actually more elegant imho anyway.	2012-12-01 01:24:40 +00:00
Jack F Vogel	5a5d90a268	Patch #11 - The final patch: this one greatly improves the TX hot path by getting rid of index calculations and simply managing pointers. Much of the creative code is due to my coworker here at Intel, Alex Duyck, thanks Alex! Also, this whole series of patches was given the critical eye of Gleb Smirnoff and is all the better for it, thanks Gleb!	2012-12-01 00:11:24 +00:00
Jack F Vogel	d777904f05	Patch #10 Performance - this changes the protocol offload interface and code in the TX path, making it tighter and hopefully more efficient.	2012-12-01 00:03:58 +00:00
Jack F Vogel	df51baf38f	Patch #9 Performance - improve the tx dma failure path, similar to a change done in igb long ago.	2012-11-30 23:54:57 +00:00
Jack F Vogel	47dd71a877	Patch #8 Performance changes - this one improves locality, moving some counters and data to the ring struct from the adapter struct, also compressing some data in the move.	2012-11-30 23:45:55 +00:00
Jack F Vogel	27329b1a91	Patch #7 This is primarily about processing limit control. - add a limit for both RX and TX, change the default to 256 - change the sysctl usage to be common, and now to be called during init for each ring. - the TX limit is not yet used, but the changes in the last patch in this series uses the value. - the motivation behind these changes is to improve data locality in the final code. - rxeof interface changes since it now gets limit from the ring struct	2012-11-30 23:28:01 +00:00


153 Commits