30248 Commits

Author SHA1 Message Date
Sepherosa Ziehau
4f8f2d4274 hyperv/hn: Removed unused netvsc_init()
Submitted by:		Dexuan Cui <decui microsoft com>
Reviewed by:		me, adrian, royger,
			Hongjiang Zhang <honzhan microsoft com>
Approved by:		adrian (mentor)
Sponsored by:		Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D4594
2016-01-12 01:55:57 +00:00
Sepherosa Ziehau
c48d20d7c7 hyperv/hn: Avoid mbuf cluster allocation, if the packet is small.
This one mainly avoids mbuf cluster allocation for TCP ACKs during
TCP sending tests.  And it gives me ~200Mbps improvement (4.7Gbps
-> 4.9Gbps), when running iperf3 TCP sending test w/ 16 connections.

While I'm here, nuke the unnecessary zeroing out pkthdr.csum_flags.

Reviewed by:		adrian
Approved by:		adrian (mentor)
Sponsored by:		Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D4853
2016-01-12 01:50:56 +00:00
Sepherosa Ziehau
da949700f2 hyperv/hn: Implement SIOC[SG]IFMEDIA support
Many applications and kernel modules (e.g. bridge) rely on the ifmedia
status report; give them what they want.

Submitted by:		Dexuan Cui <decui microsoft com>
Reviewed by:		Jun Su <junsu microsoft com>, me, adrian
Modified by:		me (minor)
Original differential:	https://reviews.freebsd.org/D4611
Differential Revision:	https://reviews.freebsd.org/D4852
Approved by:		adrian (mentor)
Sponsored by:		Microsoft OSTC
2016-01-12 01:41:34 +00:00
Sepherosa Ziehau
39863fbd98 hyperv/hn: Implement LRO
- Implement LRO using the tcp_lro APIs; LRO is enabled by default.
- Add several stats sysctl nodes.
- Check IP/TCP length before sending the packet to tcp_lro_rx(), if host
  does not provide RX csum information (*); and add an option through
  sysctl to always trust host TCP segment csum checks (default is off).
- Add sysctl to control the LRO entry depth; it is disabled by default.
  It is used to avoid holding too many TCP segments in the driver.  Limiting
  the LRO entry depth helps a lot in a one/two streams RX test.

This one triples the RX performance in my local test (3Gbps -> 10Gbps), and
roughly doubles it over a directly connected 40Ge network (5Gbps ->
9Gbps).

(*) It seems the host stops supplying csum information, once the network
load is high.  This still needs investigation...

Reviewed by:		Hongjiang Zhang <honzhan microsoft com>,
			Dexuan Cui <decui microsoft com>,
			Jun Su <junsu microsoft com>,
			delphij
Tested by:		me (local),
			Hongjiang Zhang <honzhan microsoft com>
			(directly connected 40Ge)
Approved by:		delphij (mentor), adrian (mentor, no objection)
With feedback from:	delphij, Hongjiang Zhang <honzhan microsoft com>
Sponsored by:		Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D4824
2016-01-12 01:30:51 +00:00
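
The IP/TCP length check described in the LRO commit above can be pictured with a small userland sketch. This is an illustration under stated assumptions, not the driver's code: the function name ipv4_tcp_len_ok() is hypothetical, the buffer is assumed to start at the IPv4 header, and fragment handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the hn driver): return 1 if the
 * buffer, which is assumed to start at an IPv4 header, carries a TCP
 * segment whose header lengths are consistent with the packet length.
 */
static int
ipv4_tcp_len_ok(const uint8_t *buf, size_t len)
{
	size_t iphlen, iplen, thoff;

	if (len < 20)				/* shorter than a minimal IPv4 header */
		return (0);
	if ((buf[0] >> 4) != 4)			/* not IPv4 */
		return (0);
	iphlen = (size_t)(buf[0] & 0x0f) * 4;	/* IHL is in 32-bit words */
	if (iphlen < 20 || iphlen > len)
		return (0);
	iplen = ((size_t)buf[2] << 8) | buf[3];	/* total length, network order */
	if (iplen < iphlen || iplen > len)
		return (0);
	if (buf[9] != 6)			/* protocol field: 6 is TCP */
		return (0);
	if (iplen < iphlen + 20)		/* no room for a minimal TCP header */
		return (0);
	thoff = (size_t)(buf[iphlen + 12] >> 4) * 4;	/* TCP data offset */
	if (thoff < 20 || iphlen + thoff > iplen)
		return (0);
	return (1);
}
```

A caller would hand the packet to the LRO path only when this kind of check passes, and fall back to the normal input path otherwise.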
Andriy Voskoboinyk
1c6b43df0a wpi, iwn: implement ic_getradiocaps method
This allows the channel list to be restored after switching the
interface to a more restrictive regdomain.

Tested with Intel 3945BG (wpi) only.

Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4863
2016-01-12 00:24:40 +00:00
Andriy Voskoboinyk
6420fb29aa rtwn: import r290022 (do not filter out control frames in the RX path)
Tested by:	kevlo
Reviewed by:	kevlo
Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4838
2016-01-12 00:12:18 +00:00
Colin Percival
cbb261aec7 Add two more assertions to catch busdma problems. Each segment provided
by busdma to the blkfront driver must be an integer number of sectors,
and must be aligned in memory on a "sector" boundary.

Having these assertions yesterday would have made finding the bug fixed
in r293698 somewhat easier.
2016-01-11 21:02:30 +00:00
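
The two conditions asserted in the blkfront commit above (each segment is a whole number of sectors and starts on a sector boundary) amount to a pair of modulo checks. A minimal sketch, assuming a hypothetical helper name and a caller-supplied sector size; it does not reproduce the driver's actual assertions:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical predicate (not the blkfront code): true when a DMA
 * segment is a whole number of sectors long and is aligned in memory
 * on a sector boundary.
 */
static int
segment_is_sector_sized(uintptr_t addr, size_t len, size_t sector_size)
{
	if (sector_size == 0)
		return (0);
	return (len % sector_size == 0 && addr % sector_size == 0);
}
```

In a kernel driver, a predicate like this would typically sit inside a KASSERT() so that a misbehaving busdma configuration fails loudly at the point of the bad segment.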
Navdeep Parhar
5725f0e490 cxgbe: bind the ithreads that handle NIC rx to the correct CPU if the kernel
is built with option RSS.
2016-01-11 17:52:42 +00:00
Steven Hartland
481b36c66a Close iSCSI sessions on shutdown
Ensure that all iSCSI sessions are correctly terminated during shutdown.

* Enhances the changes done by r286226 (D3052).
* Add shutdown post sync event to run after filesystem shutdown
  (SHUTDOWN_PRI_FIRST) but before CAM shutdown (SHUTDOWN_PRI_DEFAULT).
* Change iscsi_maintenance_thread to process terminate in preference to
  reconnect.

Reviewed by:	trasz
MFC after:	2 weeks
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D4429
2016-01-11 10:24:30 +00:00
Andrew Rybchenko
b53f4a640f sfxge: add Medford build option disabled by default
Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
2016-01-11 09:15:25 +00:00
Marius Strobl
3deebd539b - Add support for Advantech PCI-1602 Rev. B1 and PCI-1603 cards. [1]
- Add a description of Advantech PCI-1602 Rev. A boards. [1]
- Properly set up REG_ACR also for PCI-1602 Rev. A based on what the
  Advantech-supplied Linux driver does.
- Additionally use the macros of <dev/ic/ns16550.h> to replace existing
  magic values and get rid of trivial comments.
- Fix the style of some comments.

PR:		205359 [1]
Submitted by:	Jan Mikkelsen (original patch) [1]
2016-01-10 18:11:23 +00:00
Andriy Voskoboinyk
950678b488 rtwn: fix sequence number assignment (part of r290630)
Reviewed by:	kevlo
Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4819
2016-01-09 21:45:21 +00:00
Nathan Whitehorn
bb0455d7dd Make graphical consoles work under PowerKVM. Without using hypercalls, it is
not possible to write the framebuffer before pmap is up. Solve this by
deferring initialization until that happens, like on PS3.

MFC after:	1 week
2016-01-09 21:28:56 +00:00
Gleb Smirnoff
2bab0c5535 New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O: I/O operations are launched, but their completion is not awaited. An
explanation of why such behavior is beneficial compared to the old one is
going to be too long for a commit message, so we will skip it here.

Additional features of the new syscall are extra flags, which give an
application more control over the data sent. The SF_NOCACHE flag tells the
kernel that data shouldn't be cached after it has been sent. The
SF_READAHEAD() macro allows specifying the readahead size in pages.

The new syscall is a drop-in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates:	Netflix global launch!
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
Relnotes:	yes
2016-01-08 20:34:57 +00:00
Conrad Meyer
1502e36346 ioat(4): Add ioat_acquire_reserve() KPI
ioat_acquire_reserve() is an extended version of ioat_acquire().  It
allows users to reserve space in the channel for some number of
descriptors.  If this succeeds, it guarantees that submission of at
least N valid descriptors will succeed.

Sponsored by:	EMC / Isilon Storage Division
2016-01-07 23:02:15 +00:00
Jim Harris
042231951b ismt: fix ISMT_DESC_ADDR_RW macro
Submitted by:	Masanobu SAITOH <msaitoh@netbsd.org>
MFC after:	3 days
2016-01-07 21:16:44 +00:00
Jim Harris
9c6b5d40eb nvme: replace NVME_CEILING macro with howmany()
Suggested by:	rpokala
MFC after:	3 days
2016-01-07 20:35:26 +00:00
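
For readers unfamiliar with howmany(), mentioned in the nvme commit above: it is the ceiling-division macro from <sys/param.h>. The sketch below redefines it locally so the example builds without that header; the definition is written from memory of the BSD header and should be treated as an assumption, and pages_needed() is a hypothetical caller.

```c
#include <stddef.h>

/* Local copy for illustration; the system version lives in <sys/param.h>. */
#ifndef howmany
#define	howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif

/*
 * Hypothetical example: number of fixed-size pages needed to hold
 * a payload of the given length.
 */
static size_t
pages_needed(size_t payload_len, size_t page_size)
{
	return (howmany(payload_len, page_size));
}
```

With a 4096-byte page, a 4097-byte payload needs two pages and a 4096-byte payload needs one, which is the rounding-up behavior a hand-rolled ceiling macro would otherwise duplicate.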
Jim Harris
50dea2da12 nvme: add hw.nvme.min_cpus_per_ioq tunable
Due to FreeBSD system-wide limits on number of MSI-X vectors
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321),
it may be desirable to allocate fewer than the maximum number
of vectors for an NVMe device, in order to save vectors for
other devices (usually Ethernet) that can take better
advantage of them and may be probed after NVMe.

This tunable is expressed in terms of minimum number of CPUs
per I/O queue instead of max number of queues per controller,
to allow for a more even distribution of CPUs per queue.  This
avoids cases where some number of CPUs have a dedicated queue,
but other CPUs need to share queues.  Ideally the PR referenced
above will eventually be fixed and the mechanism implemented
here becomes obsolete anyways.

While here, fix a bug in the CPUs per I/O queue calculation to
properly account for the admin queue's MSI-X vector.

Reviewed by:	gallatin
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 20:32:04 +00:00
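
The reasoning in the hw.nvme.min_cpus_per_ioq commit above can be sketched as a small calculation. This is a sketch under assumptions, not the driver's implementation: the function names are hypothetical, one MSI-X vector is assumed to be reserved for the admin queue, and the final rounding step is one possible way to get an even distribution of CPUs per queue.

```c
#include <stddef.h>

/* Ceiling division, the same rounding that howmany() performs. */
static unsigned int
ceil_div(unsigned int x, unsigned int y)
{
	return ((x + y - 1) / y);
}

/*
 * Hypothetical helper (not the nvme(4) code): pick the number of I/O
 * queues from the CPU count, the MSI-X vectors available, and the
 * minimum-CPUs-per-queue tunable.
 */
static unsigned int
pick_num_io_queues(unsigned int ncpus, unsigned int msix_vectors,
    unsigned int min_cpus_per_ioq)
{
	unsigned int by_vectors, by_tunable, nq, cpus_per_q;

	if (ncpus == 0)
		return (1);
	if (min_cpus_per_ioq == 0)
		min_cpus_per_ioq = 1;

	/* Assumption: one vector is set aside for the admin queue. */
	by_vectors = msix_vectors > 1 ? msix_vectors - 1 : 1;
	by_tunable = ncpus / min_cpus_per_ioq;
	if (by_tunable == 0)
		by_tunable = 1;

	nq = ncpus;
	if (by_vectors < nq)
		nq = by_vectors;
	if (by_tunable < nq)
		nq = by_tunable;

	/*
	 * Even out the distribution: settle on a CPUs-per-queue value,
	 * then use only as many queues as that value requires.
	 */
	cpus_per_q = ceil_div(ncpus, nq);
	return (ceil_div(ncpus, cpus_per_q));
}
```

For example, with 16 CPUs and 6 vectors, five queues would leave some queues serving more CPUs than others; the sketch settles on four queues with four CPUs each.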
Andriy Voskoboinyk
0046e1868f net80211 drivers: fix ieee80211_init_channels() usage
Fix out-of-bounds read (all) / write (11n capable) for drivers
that are using ieee80211_init_channels() to initialize channel list.

Tested with:
 * RTL8188EU, STA mode.
 * RTL8188CUS, STA mode.
 * WUSB54GC, HOSTAP mode.

Approved by:	adrian (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4818
2016-01-07 18:41:03 +00:00
Sean Bruno
9030be4bad Fix VF handling of VLANs.
This helps immensely with our ability to operate in the Amazon Cloud.

Discussed on Intel Networking Community call this morning.

Submitted by:	Jarrod Petz(petz@nisshoko.net)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4788
2016-01-07 18:34:56 +00:00
Sean Bruno
97f9586e97 Fixup SFP module insertion on the 82599 when insertion happens after
the system is booted and running.

Add PHY detection logic to ixgbe_handle_mod() and add locking to
ixgbe_handle_msf() as well.

PR:		150251
Submitted by:	aboyer@averesystems.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3188
2016-01-07 17:02:34 +00:00
Sean Bruno
676822acb5 Disable the reuse of checksum offload context descriptors in the case
of multiple queues in em(4).  Document errata in the code.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3995
2016-01-07 16:48:47 +00:00
Sean Bruno
b834dcea9a Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in Linux.

Split rxbuffer and txbuffer apart to support the new RX descriptor format
structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the
new behavior changes.

Add an RSSKEYLEN macro to help generate the RSSKEY data structures
in the card.

Change em_receive_checksum() to process the new rxdescriptor format
status bit.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3447
2016-01-07 16:42:48 +00:00
Sean Bruno
8061e8bb1e Wow, um ... sorry about that. The commit log for this code should have
read that it was for EM_MULTIQUEUE.  Revert this and try again.
2016-01-07 16:24:18 +00:00
Sean Bruno
712b97a630 Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in Linux.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3447
2016-01-07 16:20:55 +00:00
Jim Harris
2b647da7a0 nvme: do not revert to single I/O queue when per-CPU queues not possible
Previously nvme(4) would revert to a single I/O queue if it could not
allocate enough interrupt vectors or NVMe submission/completion queues
to have one I/O queue per core.  This patch determines how to utilize a
smaller number of available interrupt vectors, and assigns (as closely
as possible) an equal number of cores to each associated I/O queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:18:32 +00:00
Jim Harris
d400f790b1 nvme: break out interrupt setup code into a separate function
MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:12:42 +00:00
Jim Harris
e5af5854ff nvme: do not pre-allocate MSI-X IRQ resources
The issue referenced here was resolved by other changes
in recent commits, so this code is no longer needed.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:11:31 +00:00
Jim Harris
c75ad8ce5a nvme: remove per_cpu_io_queues from struct nvme_controller
Instead just use num_io_queues to make this determination.

This prepares for some future changes enabling use of multiple
queues when we do not have enough queues or MSI-X vectors
for one queue per CPU.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:09:56 +00:00
Jim Harris
d85f84abb8 nvme: simplify some of the nested ifs in interrupt setup code
This prepares for some follow-up commits which do more work in
this area.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:08:04 +00:00
Jim Harris
58d0b8f3c3 nvd: submit bios directly when BIO_ORDERED not set or in flight
This significantly improves parallelism in the most common case.
The taskqueue is still used whenever BIO_ORDERED bios are in flight.

This patch is based heavily on a patch from gallatin@.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 16:06:23 +00:00
Jim Harris
47ef4244f5 nvd: break out submission logic into separate function
This enables a future patch using this same logic to submit
I/O directly bypassing the taskqueue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:59:51 +00:00
Jim Harris
26ca317aef nvd: skip BIO_ORDERED logic when bio fails submission
This ensures the bio flags are not read after biodone().
The ordering will still be enforced, after the bio is
submitted successfully.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:58:44 +00:00
Jim Harris
8fe5c0d286 nvd: do not wait for previous bios before submitting ordered bio
Still wait until all in-flight bios (including the ordered bio)
complete before processing more bios from the queue.

MFC after:	3 days
Sponsored by:	Intel
2016-01-07 15:57:17 +00:00
Jim Harris
454f163b9f nvd: set DISKFLAG_DIRECT_COMPLETION
Submitted by:	gallatin
MFC after:	3 days
2016-01-07 15:55:41 +00:00
Alexander V. Chernikov
8a9f7532b0 Convert cxgb/cxgbe to the new routing API.
Discussed with:		np
2016-01-07 08:07:17 +00:00
Gleb Smirnoff
0c39d38d21 Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places where precision was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
  options length. The function gives a better estimate, since it takes
  into account SACK state as well.

Reviewed by:	jtl
Differential Revision:	https://reviews.freebsd.org/D3593
2016-01-07 00:14:42 +00:00
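
The idea behind tcp_maxseg() in the commit above (start from the unadjusted MSS and subtract the option bytes a segment would currently carry) can be shown with a simplified userland sketch. It is not the committed function: the name sketch_maxseg(), its parameters, and the SKETCH_* constants are local to this example, and only timestamps and SACK blocks are modeled.

```c
#include <stdint.h>

#define	SKETCH_TSTAMP_APPA	12	/* timestamp option padded with two NOPs */
#define	SKETCH_SACK_HDR		2	/* SACK option kind and length bytes */
#define	SKETCH_SACK_BLOCK	8	/* one SACK block: two 32-bit sequence numbers */
#define	SKETCH_MAX_OPTLEN	40	/* a TCP header holds at most 40 option bytes */

/*
 * Simplified illustration: payload bytes available per segment once
 * the currently active options are accounted for.
 */
static uint32_t
sketch_maxseg(uint32_t maxseg, int timestamps, unsigned int sack_blocks)
{
	uint32_t optlen = 0;

	if (timestamps)
		optlen += SKETCH_TSTAMP_APPA;
	if (sack_blocks > 0) {
		uint32_t sack;

		sack = SKETCH_SACK_HDR + SKETCH_SACK_BLOCK * sack_blocks;
		optlen += (sack + 3) & ~(uint32_t)3;	/* pad to 4 bytes */
	}
	if (optlen > SKETCH_MAX_OPTLEN)
		optlen = SKETCH_MAX_OPTLEN;
	return (maxseg > optlen ? maxseg - optlen : 0);
}
```

With an MSS of 1460 and timestamps only, the sketch yields 1448, which matches the old t_maxseg estimate; taking SACK state into account lowers the figure further while SACK blocks are being sent.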
Conrad Meyer
bd81fe68ee ioat(4): Add ioat_get_max_io_size() KPI
Consumers need to know the permitted IO size to send maximally sized
chunks to the hardware.

Sponsored by:	EMC / Isilon Storage Division
2016-01-05 20:42:19 +00:00
Andriy Voskoboinyk
430384523d iwm: revert r293178
This optimization is not proper (and causes kernel panic),
since driver checks fw_status to optimize away parsing stage
if it was already done.

Reported by:	dchagin
2016-01-05 20:09:26 +00:00
Ulrich Spörlein
623534d683 Fix undefined behavior when using asmc_fan_getstring()
It was returning a pointer to stack-allocated memory, so make the
allocation at the caller instead.

Found by:	clang static analyzer
Coverity:	CID 1245774
Reviewed by:	ed, rpaulo
Review URL:	https://reviews.freebsd.org/D4740
2016-01-05 10:25:22 +00:00
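
The class of bug fixed in the asmc_fan_getstring() commit above, and the caller-allocates remedy, can be illustrated in userland C. The function below is hypothetical and only shares the general shape of the fix: the caller owns the buffer, so nothing returned outlives the callee's stack frame.

```c
#include <stdio.h>

/*
 * The broken pattern looks like this (shown only as a comment):
 *
 *	char buf[16];
 *	snprintf(buf, sizeof(buf), "fan%d", fan);
 *	return (buf);		// pointer to dead stack memory
 *
 * Hypothetical fixed shape: the caller passes in the storage.
 * Returns 0 on success, -1 if the string did not fit.
 */
static int
fan_getstring(int fan, char *buf, size_t buflen)
{
	int n;

	n = snprintf(buf, buflen, "fan%d", fan);
	return ((n >= 0 && (size_t)n < buflen) ? 0 : -1);
}
```

A caller declares its own array, passes it with sizeof(), and checks the return value before using the string.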
Hans Petter Selasky
076daeda4f Fix for directly connected FULL or LOW speed USB devices.
Found by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
2016-01-05 09:18:43 +00:00
Navdeep Parhar
9f6b62e791 iw_cxgbe: Shut down the socket but do not close the fd in case of error.
The fd is closed later in this case.  This fixes a "SS_NOFDREF on enter"
panic.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
Reviewed by:	Steve Wise @ Open Grid Computing
2016-01-05 01:32:40 +00:00
Andriy Voskoboinyk
60e9dd4e56 urtwn: add bits for R92C_HWSEQ_CTRL and R92C_TXPAUSE registers
Reviewed by:	kevlo
Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4770
2016-01-04 21:16:49 +00:00
Andriy Voskoboinyk
55d352400f iwn: reduce code duplication in iwn_read_firmware()
- Separate 'firmware_put(sc->fw_fp, FIRMWARE_UNLOAD); sc->fw_fp = NULL;'
into iwn_unload_firmware().
- Move error handling to the end of iwn_read_firmware().

No functional changes.

Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4768
2016-01-04 21:11:27 +00:00
Andriy Voskoboinyk
e8d24c2011 iwm: free firmware related resources after uploading it to the hardware
iwn(4) / wpi(4) work in the same way
(read_firmware() -> hw_init() -> firmware_put())

Approved by:	adrian (mentor)
Differential Revision:	https://reviews.freebsd.org/D4766
2016-01-04 21:07:08 +00:00
Andriy Voskoboinyk
1b3ae3ba63 iwm: store pointer for 'struct firmware' instead of
'size_t' and 'void *' pair.

Approved by:	adrian (mentor)
Obtained from:	DragonFlyBSD
Differential Revision:	https://reviews.freebsd.org/D4765
2016-01-04 21:03:01 +00:00
Andriy Voskoboinyk
dd693ac6da iwm: use m_collapse() to defragment a mbuf chain
- Simplify defragmentation code.
- Use proper number of dma segments for data.

Approved by:	adrian (mentor)
Obtained from:	DragonFlyBSD (mostly)
Differential Revision:	https://reviews.freebsd.org/D4754
2016-01-03 21:32:47 +00:00
Enji Cooper
1c66ead7d6 Fix ixl(4) compilation with PCI_IOV pre-r266974
stable/10 doesn't have the if_getdrvflags(9) KPI. Reference the field in the
structure directly if the __FreeBSD_version is < 1100022, so the driver can
be built with PCI_IOV support on stable/10, without backporting all of
r266974 (which requires additional changes due to projects/ifnet, etc)

Differential Revision: https://reviews.freebsd.org/D4759
Reviewed by: erj, sbruno
Sponsored by: EMC / Isilon Storage Division
2016-01-03 18:09:46 +00:00
Adrian Chadd
17f42e0d6b [ath] remove the inline version of the register access macros.
These are going to be much more efficient on low end embedded systems
but unfortunately they make it .. less convenient to implement correct
bus barriers and debugging.  They also didn't implement the register
serialisation workaround required for Owl (AR5416.)

So, just remove them for now.  Later on I'll just inline the routines
from ah_osdep.c.
2016-01-03 17:58:11 +00:00
Ian Lepore
3e0aa2f449 Eliminate code for walking through the early static env data. This code
is called from a device attach routine, and thus cannot be called before
the cutover from static to dynamic kernel env.
2016-01-03 14:46:19 +00:00