- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
If the NVSP protocol version is not greater than NVSP_PROTOCOL_VERSION_2,
then the recv buffer size is 15MB, otherwise the buffer size is 16MB.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4814
Submitted by: Howard Su <howard0su gmail com>
Reviewed by: royger, Dexuan Cui <decui microsoft com>, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4693
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: delphij, royger, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4676
We don't need them at all.
Submitted by: Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Reviewed by: royger, adrian, delphij
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D4595
This is first step to move the generic part of HV code into kernel instead
of module, so that it is possible to use hypercall to implement some other
paravirtualization code in the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: royger, delphij, adrian
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3072
This one mainly avoids mbuf cluster allocation for TCP ACKs during
TCP sending tests. And it gives me ~200Mbps improvement (4.7Gbps
-> 4.9Gbps), when running iperf3 TCP sending test w/ 16 connections.
While I'm here, nuke the unnecessary zeroing out pkthdr.csum_flags.
Reviewed by: adrain
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4853
Many applications and kernel modules (e.g. bridge) rely on the ifmedia
status report; give them what they want.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Jun Su <junsu microsoftc com>, me, adrian
Modified by: me (minor)
Original differential: https://reviews.freebsd.org/D4611
Differential Revision: https://reviews.freebsd.org/D4852
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
- Implement the LRO using tcp_lro APIs, and LRO is enabled by default.
- Add several stats sysctl nodes.
- Check IP/TCP length before sending the packet to tcp_lro_rx(), if host
does not provide RX csum information (*); and add an option through
sysctl to always trust host TCP segment csum checks (default is off).
- Add sysctl to control the LRO entry depth; it is disabled by default.
It is used to avoid holding too much TCP segments in driver. Limiting
the LRO entry depth helps a lot in a one/two streams RX test.
This one 3x the RX performance on my local test (3Gbps -> 10Gbps), and
~2x the RX performance over a directly connected 40Ge network (5Gbps ->
9Gbps).
(*) It seems the host stops supplying csum information, once the network
load is high. This still needs investigation...
Reviewed by: Hongjiang Zhang <honzhan microsoft com>,
Dexuan Cui <decui microsoft com>,
Jun Su <junsu microsoft com>,
delphij
Tested by: me (local),
Hongjiang Zhang <honzhan microsoft com>
(directly connected 40Ge)
Approved by: delphij (mentor), adrian (mentor, no objection)
With feedback from: delphij, Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4824
We'll remove the per-channel control_work_queue because it can't properly
do serialization of message handling, e.g., when there are 2 NIC devices,
vmbus_channel_on_offer() -> hv_queue_work_item() has a race condition:
for an SMP VM, vmbus_channel_process_offer() can run concurrently on
different CPUs and if the second NIC's
vmbus_channel_process_offer() -> hv_vmbus_child_device_register() runs
first, the second NIC's name will be hn0 and the first NIC's name will
be hn1!
We can fix the race condition by removing the per-channel control_work_queue
and run all the message handlers in the global
hv_vmbus_g_connection.work_queue -- we'll do this in the next patch.
With the coming next patch, we have to run the non-blocking handlers
directly in the kernel thread vmbus_msg_swintr(), because the special
handling of sub-channel: when a sub-channel (e.g., of the storvsc driver)
is received and being handled in vmbus_channel_on_offer() running on the
global hv_vmbus_g_connection.work_queue, vmbus_channel_process_offer()
invokes channel->sc_creation_callback, i.e., storvsc_handle_sc_creation,
and the callback will invoke hv_vmbus_channel_open() -> hv_vmbus_post_message
and expect a further reply from the host, but the handling of the further
messag can't be done because the current message's handling hasn't finished
yet; as result, hv_vmbus_channel_open() -> sema_timedwait() will time out
and th device can't work.
Also renamed the handler type from hv_pfn_channel_msg_handler to
vmbus_msg_handler: the 'pfn' and 'channel' in the old name make no sense.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: royger
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4596
Now vmbus_channel_on_offer() -> vmbus_channel_process_offer() can
safely run on the global hv_vmbus_g_connection.work_queue now.
We remove the per-channel control_work_queue to achieve the proper
serialization of the message handling.
I removed the bogus TODO in vmbus_channel_on_offer(): a vmbus offer
can only come from the parent partition, i.e., the host.
PR: kern/205156
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Howard Su <howard0su gmail com>, delphij
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4597
Without the patch, there is a race condition: when poll() is invoked(),
if kvp_globals.daemon_busy is false, the daemon won't be timely
woke up, because hv_kvp_send_msg_to_daemon() can't wake up the daemon
in this case.
Submitted by: Dexuan Cui <decui@microsoft.com>
Sponsored by: Microsoft OSTC
Reviewed by: delphij, royger
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D4258
Windows Server 2012 and earlier hosts.
Submitted by: whu
Reviewed by: royger
Approved by: royger
MFC after: 3 days
Relnotes: No
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D3086
offload support was introduced in r284746.
While here also fix the ioctl() handler for IPv4 added in r279819,
which was never compiled in given opt_inet.h was not included.
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
- Vmbus multi channel support.
- Vector interrupt support.
- Signal optimization.
- Storvsc driver performance improvement.
- Scatter and gather support for storvsc driver.
- Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from FreeBSD community for the reviews
and comments. Also thanks Hovy Xu from NetApp for the contributions to
the storvsc driver.
PR: 195238
Submitted by: whu
Reviewed by: royger, jhb, delphij
Approved by: royger
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Microsoft OSTC
- Bump link state when stopping or starting the interface;
- Don't handle SIOCGIFADDR specially, similar to r277103.
This change is based on a previous revision from Andy Zhang
(Microsoft) who did the diagnostic work and many thanks to
them for their help in supporting the HyperV work.
PR: kern/187203
MFC after: 2 weeks
Previously, any timeout value for which (timeout * hz) will overflow the
signed integer, will give weird results, since callout(9) routines will
convert negative values of ticks to '1'. For unsigned integer overflow we
will get sufficiently smaller timeout values than expected.
Switch from callout_reset, which requires conversion to int based ticks
to callout_reset_sbt to avoid this.
Also correct isci to correctly resolve ccb timeout.
This was based on the original work done by Eygene Ryabinkin
<rea@freebsd.org> back in 5 Aug 2011 which used a macro to help avoid
the overlow.
Differential Revision: https://reviews.freebsd.org/D1157
Reviewed by: mav, davide
MFC after: 1 month
Sponsored by: Multiplay
it, except Ethernet, where it carried ng_ether(4) pointer.
For now carry the pointer in if_l2com directly.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
the Microsoft Azure service does not recognize the second
attached disk on the system.
Submitted by: kyliel@Microsoft
Patched by: weh@Microsoft
PR: 194376
MFC after: 3 days
X-MFC-10.1: yes, ASAP
Sponsored by: The FreeBSD Foundation
many thanks for their continued support of FreeBSD.
While I'm there, also implement a new build knob, WITHOUT_HYPERV to
disable building and installing of the HyperV utilities when necessary.
The HyperV utilities are only built for i386 and amd64 targets.
This is a stable/10 candidate for inclusion with 10.1-RELEASE.
Submitted by: Wei Hu <weh microsoft com>
MFC after: 1 week
This fixes kernel panic during boot, caused by incompatibility of recent
CAM locking changes and this bus scanner code.
Submitted by: Microsoft
MFC after: 1 week
resist easy conversion since they implement a great deal of their attach
logic inside probe(). Some of this could be fixed by moving it to attach(),
but some requires something more subtle than BUS_PROBE_NOWILDCARD.
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs, grehan
Approved by: re (gjb)
sys/sys/systm.h:
* Add a new VM_GUEST type, VM_GUEST_HV (HyperV guest).
sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c:
sys/dev/hyperv/vmbus/hv_hv.c:
sys/dev/hyperv/stordisengage/hv_ata_pci_disengage.c:
* Set vm_guest to VM_GUEST_HV and use that on other HyperV related
devices instead of cloning the cpuid hypervisor check.
* Cleanup the vmbus_identify function.
and holding a reference prior to calling further into the hyperv
stack.
Added missing FreeBSD idents.
Submitted by: Microsoft hyperv dev team
Approved by: re@ (gjb)
union members in strict C99, by giving them names. While here, add some
FreeBSD keywords where they were missing.
Approved by: re (gjb)
Reviewed by: grehan
aware drivers on Xen hypervisors that advertise support for some
HyperV features.
x86/xen/hvm.c:
When running in HVM mode on a Xen hypervisor, set vm_guest
to VM_GUEST_XEN so other virtualization aware components in
the FreeBSD kernel can detect this mode is active.
dev/hyperv/vmbus/hv_hv.c:
Use vm_guest to ignore Xen's HyperV emulation when Xen is
detected and Xen PV drivers are active.
Reported by: Shanker Balan
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (Xen blanket)
- change the SI_SUB_RUN_SCHEDULER sysinits in hv_utilc and
hv_netvsc_drv_freebsd.c to SI_SUB_KTHREAD_IDLE, since the
former is no longer in FreeBSD.
The use of these SYSINITs can probably be removed.
be pulled into FreeBSD. From now, FreeBSD will be considered the
upstream repo.
First step: move the drivers away from the contrib area and into
the base system.
A follow-on commit will include the drivers in the amd64 GENERIC kernel.