freebsd-dev

Author	SHA1	Message	Date
Ruslan Ermilov	ea26d58729	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
Colin Percival	491869163b	After finishing sending file data in sendfile(2), don't forget to send the provided trailers. This has been broken since revision 1.240. Submitted by: Dan Nelson PR: kern/120948 "sounds ok to me" from: phk MFC after: 3 days	2008-02-24 00:07:00 +00:00
Dag-Erling Smørgrav	60e15db992	This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding funtions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks	2008-02-23 01:01:49 +00:00
Simon L. B. Nielsen	1b7089994c	Fix sendfile(2) write-only file permission bypass. Security: FreeBSD-SA-08:03.sendfile Submitted by: kib	2008-02-14 11:44:31 +00:00
Poul-Henning Kamp	b75a1171d8	Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs referencing the files VM pages are returned from the network stack, making changes to the file safe. This flag does not guarantee that the data has been transmitted to the other end.	2008-02-03 15:54:41 +00:00
Poul-Henning Kamp	cf827063a9	Give MEXTADD() another argument to make both void pointers to the free function controlable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-convetional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.	2008-02-01 19:36:27 +00:00
Robert Watson	265de5bb62	Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushin it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>	2008-01-31 08:22:24 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Jeff Roberson	397c19d175	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho	2007-12-30 01:42:15 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Randall Stewart	2afb3e849f	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Randall Stewart	b8709d23c5	- Add some needed error checking on bad fd passing in the sctp syscalls. Approved by: re@freebsd.org (Ken Smith) Obtained from: Weongyo Jeong (weongyo.jeong@gmail.com)	2007-07-02 12:50:53 +00:00
Andre Oppermann	1c9dbd1567	In kern_sendfile() adjust byte accounting of the file sending loop to ignore the size of any headers that were passed with the sendfile(2) system call. Otherwise the file sent will be truncated by the header size if the nbytes parameter was provided. The bug doesn't show up when either nbytes is zero, meaning send the whole file, or no header iovec is provided. Resolve a potential error aliasing of errors from the VM and sf_buf parts and the protocol send parts where an error of the latter over- writes one of the former. Update comments. The byte accounting bug wasn't seen in earlier because none of the popular sendfile(2) consumers, Apache, lighttpd and our ftpd(8) use it in modes that trigger it. The varnish HTTP proxy makes full use of it and exposed the problem. Bug found by: phk Tested by: phk	2007-05-19 20:50:59 +00:00
Robert Watson	d19e16a72c	Generally migrate to ANSI function headers, and remove 'register' use.	2007-05-16 20:41:08 +00:00
Robert Watson	7abab91135	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovere by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
Pawel Jakub Dawidek	eed20b37f5	Don't reinvent vm_page_grab(). Reviewed by: ups	2007-04-20 19:49:20 +00:00
Pawel Jakub Dawidek	fb1daf8164	Fix a bug in sendfile(2) when files larger than page size and nbytes=0. When nbytes=0, sendfile(2) should use file size. Because of the bug, it was sending half of a file. The bug is that 'off' variable can't be used for size calculation, because it changes inside the loop, so we should use uap->offset instead.	2007-04-19 05:54:45 +00:00
Robert Watson	7b20aa9ca6	Remove XXX comment that changes to file fields should be protected with the file lock rather than the filedesc lock: I fixed this in the last revision. Spotted by: kris	2007-04-06 23:31:30 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
John Baldwin	1ce2bc9187	Fix a fd leak in socketpair(): - Close the new file objects created during socketpair() if the copyout of the new file descriptors fails. - Add a test to the socketpair regression test for this edge case.	2007-04-02 19:15:47 +00:00
Robert Watson	873fbcd776	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.	2007-03-05 13:10:58 +00:00
Robert Watson	0c14ff0eb5	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.	2007-03-04 22:36:48 +00:00
Randall Stewart	6dbde03086	Fixes the MSG_PEEK for sctp_generic_recvmsg() the msg_flags were not being copied in properly so PEEK and any other msg_flags input operation were not being performed right. Approved by: gnn	2007-01-24 12:59:56 +00:00
Andre Oppermann	3e932ca715	In kern_sendfile() fix the calculation of sbytes (the total number of bytes written to the socket). The rewrite in revision 1.240 got confused by the FreeBSD 4.x bug compatibility code. For some reason lighttpd, that was used for testing the new sendfile code, was not affected by the problem but apache and others using headers/trailers in the sendfile call received incorrect sbytes values after return from non- blocking sockets. This then lead to restarts with wrong offsets and thus mixed up file contents when the socket was writeable again. All programs not using headers/trailers, like ftpd, were not affected by the bug. Reported by: Pawel Worach <pawel.worach-at-gmail.com> Tested by: Pawel Worach <pawel.worach-at-gmail.com>	2006-11-12 20:57:00 +00:00
Andre Oppermann	62b36a7fc2	Style cleanups to the sctp_* syscall functions.	2006-11-07 21:28:12 +00:00
Andre Oppermann	bda8b1f3b8	Handle early errors in kern_sendfile() by introducing a new goto 'out' label after the sbunlock() part. This correctly handles calls to sendfile(2) without valid parameters that was broken in rev. 1.240. Coverity error: 272162	2006-11-06 21:53:19 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about comitting some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
Andre Oppermann	5e20f43d31	Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf chain flags. Provide compatibility macro for m_getm() calling m_getm2() with M_PKTHDR set. Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the uiomove() in a tight loop over the mbuf chain. Add a flags parameter to accept mbuf flags to be passed to m_getm2(). Adjust all callers for the extra parameter. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:37:22 +00:00
Andre Oppermann	d99b0dd2c5	Rewrite kern_sendfile() to work in two loops, the inner which turns as many VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer are free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 16:53:26 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
Alan Cox	5786be7cc7	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
Alan Cox	7c4b7ecc4c	Reduce the scope of the page queues lock in kern_sendfile() now that vm_page_sleep_if_busy() no longer requires the caller to hold the page queues lock.	2006-08-06 01:00:09 +00:00
Alan Cox	10c09f3f61	The page queues lock is no longer required by vm_page_io_start(). Reduce the scope of the page queues lock in kern_sendfile() accordingly.	2006-08-04 05:53:20 +00:00
John Baldwin	f30e89ced3	Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.	2006-07-27 19:54:41 +00:00
Robert Watson	b0668f7151	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
John Baldwin	b33887ea31	Don't free the sockaddr in kern_bind() and kern_connect() as not all callers pass a sockaddr allocated via malloc() from M_SONAME anymore. Instead, free it in the callers when necessary.	2006-07-19 18:28:52 +00:00
John Baldwin	c870740e09	- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.	2006-07-10 21:38:17 +00:00
George V. Neville-Neil	fb11be62a2	Properly cast the values of valsize (the size of the value passed in) in setsockopt so that they can be compared correctly against negative values. Passing in a negative value had a rather negative effect on our socket code, making it impossible to open new sockets. PR: 98858 Submitted by: James.Juran@baesystems.com MFC after: 1 week	2006-06-20 12:36:40 +00:00
Robert Watson	b37ffd3189	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month	2006-06-10 14:34:07 +00:00
Robert Watson	20bdac8a4f	Use getsock() and fput() instead of fgetsock() and fputsock() in sendfile(). This causes sendfile() to use the file descriptor reference to the socket instead of bumping the socket reference count, which avoids an additional refcount operation, as well as a potential expensive socket refcount drop, which can lead to contention on the accept mutex. This change also has the side effect of further reducing the number of cases where an in-progress I/O operation can occur on a socket after close, as using the file descriptor refcount prevents the socket from closing while in use. MFC after: 3 months	2006-05-25 15:10:13 +00:00
Robert Watson	102ea03373	Extend getsock() to return the struct file flags read while holding the file lock, in the style of fgetsock(). Modify accept1() to use getsock() instead of fgetsock(), relying on the file descriptor reference rather than an acquired socket reference to prevent the listen socket from being destroyed during accept(). This avoids additional reference count operations, which should improve performance, and also avoids accept1() operating on a socket whose file descriptor has been torn down, which may have resulted in protocol shutdown starting. MFC after: 3 months	2006-04-25 11:48:16 +00:00
Robert Watson	fa4c5373ce	Add comment to accept1() that it should use getsock() instead of fgetsock() to avoid additional mutex operations, and also to avoid use of soref/sorele which are now not preferred. MFC after: 3 months	2006-04-01 11:14:56 +00:00
Alan Cox	7c8dcf2def	Use NET_LOCK_GIANT() and VFS_LOCK_GIANT() instead of unconditionally acquiring Giant in kern_sendfile(). Guard against the forced reclamation of a vnode in kern_sendfile(). Discussed with: jeff Reviewed by: tegge MFC after: 3 weeks	2006-03-27 04:23:16 +00:00
Paul Saab	fa545f434c	Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile. Reviewed by: jhb MFC after: 3 weeks	2006-02-28 19:39:18 +00:00
Paul Saab	ecc44de7a2	Reformat socket control messages on input/output for 32bit compatibility on 64bit systems. Submitted by: ps, ups Reviewed by: jhb	2005-10-31 21:09:56 +00:00
Paul Saab	a372f8224c	Implement the 32bit versions of recvmsg, recvfrom, sendmsg Partially obtained from: jhb	2005-10-15 05:57:06 +00:00
Robert Watson	6758f88ea4	Add MAC Framework and MAC policy entry point mac_check_socket_create(), which is invoked from socket() and socketpair(), permitting MAC policy modules to control the creation of sockets by domain, type, and protocol. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl) Requested by: SCC	2005-07-05 22:49:10 +00:00
Maksim Yevmenkin	75ae257016	Change m_uiotombuf so it will accept offset at which data should be copied to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to fix Ethernet header alignment problem on alpha and sparc64. Also change all users of m_uiotombuf to pass proper offset. Reviewed by: jmg, sam Tested by: Sten Spans "sten AT blinkenlights DOT nl" MFC after: 1 week	2005-05-04 18:55:03 +00:00
Robert Watson	7f53207b92	Introduce three additional MAC Framework and MAC Policy entry points to control socket poll() (select()), fstat(), and accept() operations, required for some policies: poll() mac_check_socket_poll() fstat() mac_check_socket_stat() accept() mac_check_socket_accept() Update mac_stub and mac_test policies to be aware of these entry points. While here, add missing entry point implementations for: mac_stub.c stub_check_socket_receive() mac_stub.c stub_check_socket_send() mac_test.c mac_test_check_socket_send() mac_test.c mac_test_check_socket_visible() Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA	2005-04-16 18:46:29 +00:00
Jeff Roberson	f247a5240d	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
Maxim Sobolev	8d6e40c3f1	Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress SIGPIPE signal for the duration of the sento-family syscalls. Use it to replace previously added hack in Linux layer based on temporarily setting SO_NOSIGPIPE flag. Suggested by: alfred	2005-03-08 16:11:41 +00:00
Robert Watson	29bdd01910	Remove now unused 'int s' from spl(). MFC after: 3 days	2005-02-18 21:39:55 +00:00
Robert Watson	d8d716bef5	De-spl kern_connect(). MFC after: 3 days	2005-02-18 19:37:36 +00:00
Robert Watson	1e8f89541e	In accept1(), extend coverage of the socket lock from just covering soref() to also covering the update of so_state. While no other user threads can update the socket state here as it's not yet hooked up to the file descriptor array yet, the protocol could also frob the socket state here, leading to a lost update to the so_state field. No reported instances of this bug (as yet). MFC after: 3 days	2005-02-17 13:00:23 +00:00
Maxim Sobolev	a6886ef173	Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks	2005-01-30 07:20:36 +00:00
Poul-Henning Kamp	8516dd18e1	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
Poul-Henning Kamp	f6dc414a5c	Save a line by unlocking before we test.	2005-01-24 14:13:24 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Poul-Henning Kamp	124e4c3be8	Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.	2004-11-13 11:53:02 +00:00
Alan Cox	d3cb0d99e0	Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc(). Change the spelling of the "catch" option to be consistent with the new options. Implement the "no wait" option. An implementation of the "CPU private" for i386 will be committed at a later date.	2004-11-08 00:43:46 +00:00
Poul-Henning Kamp	ef11fbd7c4	Introduce fdclose() which will clean an entry in a filedesc. Replace homerolled versions with call to fdclose(). Make fdunused() static to kern_descrip.c	2004-11-07 22:16:07 +00:00
Poul-Henning Kamp	b1fa752732	Use fget_locked() instead of homerolled	2004-11-07 16:09:56 +00:00
Alan Cox	d19ef81437	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
Robert Watson	ae1e5c5d8d	Move from using the socket reference count to the file reference count to prevent sockets from being garbage collected during socket-specific system calls. This is the same approach used in most VFS-specific system calls, as well as generic file descriptor system calls such as read() and write(). To do this, add a utility function getsock(), which is logically identical to getvnode() used for the same purpose in VFS. Unlike fgetsock(), it returns with the file reference count elevated, but no bump of the socket reference count. Replace matching calls to fputsock() with fdrop(). This change is made to all socket system calls other than sendfile() and accept(), but the approach should be applicable to those system calls also. This shaves about four mutex operations off of each of these system calls, including send() and recv() variants, adding about 1% to pps on minimal UDP packets for UP using netblast, and 4% on SMP. Reviewed by: pjd	2004-10-24 23:45:01 +00:00
Alan Cox	01ad40dac5	Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().	2004-10-24 20:09:59 +00:00
Alan Cox	0f777d7d9b	Modify the vm object locking in do_sendfile() so that the containing object is locked when vm_page_io_finish() is called on a page. This is to satisfy a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the process of transitioning the responsibility for synchronizing access to various fields/flags on the page from the global page queues lock to the per-object lock.) Tripped over by: obrien@	2004-10-20 17:44:40 +00:00
Alan Cox	86dac448f2	Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile().	2004-10-02 05:37:47 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
David Malone	e140eb430c	Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.	2004-07-17 21:06:36 +00:00
Poul-Henning Kamp	552afd9c12	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.	2004-07-10 15:42:16 +00:00
Robert Watson	6ec70e64c6	Remove spl()'s from do_sendfile().	2004-07-09 01:46:03 +00:00
Robert Watson	ad6b0efff5	Acquire socket lock in the "waiting for connection" loop in kern_connect(), replacing tsleep() with msleep() with the socket mutex.	2004-06-24 01:43:23 +00:00
Bruce M Simpson	a3146ff925	Fix an inconsistency in socket option propagation on accept(). Propagate the SS_NBIO flag from the parent socket to the child socket during an accept() operation. The file descriptor O_NONBLOCK flag would have been propagated already by the fflag assignment, and therefore would have been inconsistent with the underlying socket's so_state member. This makes accept() more closely adhere to the API contract we effectively outline in the manual page. Note also that Linux continues to differ here; O_NONBLOCK is not propagated. The other BSDs do propagate the flag, as does Solaris. The Single UNIX Specification does not offer specific advice on this issue. PR: kern/45733 Requested by: Jayanth Vijayaraghavan Reviewed by: rwatson	2004-06-22 23:58:09 +00:00
Robert Watson	31f555a1c5	Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state. Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument. Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped. Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock. Drop some spl use. NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely. Reviewed by: juli, tjr	2004-06-19 03:23:14 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coallesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Robert Watson	3e87b34a25	Correct whitespace errors in merge from rwatson_netperf: tabs instead of spaces, no trailing tab at the end of line. Pointed out by: csjp	2004-06-12 23:36:59 +00:00
Robert Watson	395a08c904	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
Poul-Henning Kamp	1930e303cf	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
Robert Watson	aa57bb0424	Correct a resource leak introduced in recent accept locking changes: when I reordered events in accept1() to allocate a file descriptor earlier, I didn't properly update use of goto on exit to unwind for cases where the file descriptor is now held, but wasn't previously. The result was that, in the event of accept() on a non-blocking socket, or in the event of a socket error, a file descriptor would be leaked. This ended up being non-fatal in many cases, as the file descriptor would be properly GC'd on process exit, so only showed up for processes that do a lot of non-blocking accept() calls, and also live for a long time (such as qmail). This change updates the use of goto targets to do additional unwinding. Eyes provided by: Brian Feldman <green@freebsd.org> Feet, hands provided by: Stefan Ehmann <shoesoft@gmx.net>, Dimitry Andric <dimitry@andric.com> Arjan van Leeuwen <avleeuwen@piwebs.com>	2004-06-07 21:45:44 +00:00
Hajimu UMEMOTO	7a1a900c65	allow more than MLEN bytes for ancillary data to meet the requirement of Section 20.1 of RFC3542. Obtained from: KAME MFC after: 1 week	2004-06-07 09:59:50 +00:00
Robert Watson	2658b3bb8e	Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets: so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes. While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular: - Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize. - In accept1(), soref() the socket while holding the accept lock so that the socket cannot be free'd in a race with the protocol layer. Likewise in netgraph equivilents of the accept1() code. - In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previously implementation, it was sufficient to simply tested once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one. - In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller. - Generally cluster accept queue related operations together throughout these functions in order to facilitate locking. Annotate new locking in socketvar.h.	2004-06-02 04:15:39 +00:00
Robert Watson	36568179e3	The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different for the locking on the remainder of the flags in so_state.	2004-06-01 02:42:56 +00:00
Bosko Milekic	099a0e588c	Bring in mbuma to replace mballoc. mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein. Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt. mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework. From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime. Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps. This change removes more code than it adds. A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits. Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)	2004-05-31 21:46:06 +00:00
Robert Watson	f7250466a8	Unconditionally lock Giant in do_sendfile(), rather than locking it conditional on debug.mpsafenet. We can try pushing down Giant here later, but we don't want to enter VFS without holding Giant. Bumped into by: kris	2004-05-08 02:24:21 +00:00
Alan Cox	5a32489377	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@	2004-05-06 05:03:23 +00:00
Mike Silbersack	e8410540b7	Fix a regression in my change which sends headers along with data; a side effect of that change caused headers to not be sent if a 0 byte file was passed to sendfile. This change fixes that behavior, allowing sendfile to send out the headers even with a 0 byte file again. Noticed by: Dirk Engling	2004-04-08 07:14:34 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Robert Watson	051bbf603a	Detatch incorrect spellings of detach.	2004-04-04 19:15:45 +00:00
Alan Cox	121230a40d	In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it should not. Add a new parameter so that the caller can specify which is the case. Reported by: dillon	2004-04-03 09:16:27 +00:00
Robert Watson	627e4a9973	Conditionally acquire Giant when entering the sockets layer via the socket-specific system calls based on debug.mpsafenet, rather than acquiring Giant unconditionally.	2004-03-29 02:21:56 +00:00
Robert Watson	74041f5a10	When validating that the length sum in recvit(), we fail to release Giant on an error. Add a Giant acquisition. Reviewed by: sam, bms	2004-03-29 01:37:06 +00:00
Alan Cox	90ecfebd82	Refactor the existing machine-dependent sf_buf_free() into a machine- dependent function by the same name and a machine-independent function, sf_buf_mext(). Aside from the virtue of making more of the code machine- independent, this change also makes the interface more logical. Before, sf_buf_free() did more than simply undo an sf_buf_alloc(); it also unwired and if necessary freed the page. That is now the purpose of sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used as a general-purpose emphemeral map cache.	2004-03-16 19:04:28 +00:00
Robert Watson	0b759971a2	Remove unneeded label 'done2' from socket(). We now grab Giant only around socreate(), and don't need it for file descriptor accesses. Submitted by: sam	2004-03-04 01:57:48 +00:00
Mike Silbersack	b49d824e8b	Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be mindful of blocking on disk I/O and instead return EBUSY when such blocking would occur. Results from the DeBox project indicate that blocking on disk I/O can slow the performance of a kqueue/poll based webserver. Using a flag such as SF_NODISKIO and throwing connections that would block to helper processes/threads helped increase performance. Currently, only the Flash webserver uses this flag, although it could probably be applied to thttpd with relative ease. Idea by: Yaoping Ruan & Vivek Pai	2004-02-08 07:35:48 +00:00
Mike Silbersack	ff5e43a3fd	Rename iov_to_uio to uiofromiov to be more consistent with other uio* functions. Suggested by: bde	2004-02-04 08:43:21 +00:00
Mike Silbersack	beb699c7ba	Rewrite sendfile's header support so that headers are now sent in the first packet along with data, instead of in their own packet. When serving files of size (packetsize - headersize) or smaller, this will result in one less packet crossing the network. Quick testing with thttpd and http_load has shown a noticeable performance improvement in this case (350 vs 330 fetches per second.) Included in this commit are two support routines, iov_to_uio, and m_uiotombuf; these routines are used by sendfile to construct the header mbuf chain that will be linked to the rest of the data in the socket buffer.	2004-02-01 07:56:44 +00:00

1 2 3 4 5 ...

319 Commits