d99b0dd2c5
VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer are free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month |
||
---|---|---|
.. | ||
amd64 | ||
arm | ||
boot | ||
bsm | ||
cam | ||
coda | ||
compat | ||
conf | ||
contrib | ||
crypto | ||
ddb | ||
dev | ||
fs | ||
gdb | ||
geom | ||
gnu | ||
i4b | ||
i386 | ||
ia64 | ||
isa | ||
isofs/cd9660 | ||
kern | ||
libkern | ||
modules | ||
net | ||
net80211 | ||
netatalk | ||
netatm | ||
netgraph | ||
netinet | ||
netinet6 | ||
netipsec | ||
netipx | ||
netkey | ||
netnatm | ||
netncp | ||
netsmb | ||
nfs | ||
nfs4client | ||
nfsclient | ||
nfsserver | ||
opencrypto | ||
pc98 | ||
pccard | ||
pci | ||
posix4 | ||
powerpc | ||
rpc | ||
security | ||
sparc64 | ||
sun4v | ||
sys | ||
tools | ||
ufs | ||
vm | ||
Makefile |