freebsd-skq

Author	SHA1	Message	Date
alc	b98eae58a6	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().	2006-08-09 17:43:27 +00:00
alc	67d9b76d0e	Reduce the scope of the page queues lock in kern_sendfile() now that vm_page_sleep_if_busy() no longer requires the caller to hold the page queues lock.	2006-08-06 01:00:09 +00:00
alc	fe447f8ea1	The page queues lock is no longer required by vm_page_io_start(). Reduce the scope of the page queues lock in kern_sendfile() accordingly.	2006-08-04 05:53:20 +00:00
jhb	6b46a69f12	Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.	2006-07-27 19:54:41 +00:00
rwatson	40868fda8a	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman	2006-07-24 15:20:08 +00:00
jhb	947b8c9fbd	Don't free the sockaddr in kern_bind() and kern_connect() as not all callers pass a sockaddr allocated via malloc() from M_SONAME anymore. Instead, free it in the callers when necessary.	2006-07-19 18:28:52 +00:00
jhb	cfc179a934	- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators. - Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In its place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland. - Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit(). - Use kern_bind() instead of stackgap use in ti_bind(). - Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl(). - Use kern_connect() instead of stackgap in svr4_do_putmsg(). - Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg(). - Retire the stackgap from SVR4 compat as it is no longer used.	2006-07-10 21:38:17 +00:00
gnn	549bd60e43	Properly cast the values of valsize (the size of the value passed in) in setsockopt so that they can be compared correctly against negative values. Passing in a negative value had a rather negative effect on our socket code, making it impossible to open new sockets. PR: 98858 Submitted by: James.Juran@baesystems.com MFC after: 1 week	2006-06-20 12:36:40 +00:00
rwatson	120490c1a5	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocation code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Staticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month	2006-06-10 14:34:07 +00:00
rwatson	032282fd7e	Use getsock() and fput() instead of fgetsock() and fputsock() in sendfile(). This causes sendfile() to use the file descriptor reference to the socket instead of bumping the socket reference count, which avoids an additional refcount operation, as well as a potentially expensive socket refcount drop, which can lead to contention on the accept mutex. This change also has the side effect of further reducing the number of cases where an in-progress I/O operation can occur on a socket after close, as using the file descriptor refcount prevents the socket from closing while in use. MFC after: 3 months	2006-05-25 15:10:13 +00:00
rwatson	dd8ff1c1c5	Extend getsock() to return the struct file flags read while holding the file lock, in the style of fgetsock(). Modify accept1() to use getsock() instead of fgetsock(), relying on the file descriptor reference rather than an acquired socket reference to prevent the listen socket from being destroyed during accept(). This avoids additional reference count operations, which should improve performance, and also avoids accept1() operating on a socket whose file descriptor has been torn down, which may have resulted in protocol shutdown starting. MFC after: 3 months	2006-04-25 11:48:16 +00:00
rwatson	cbb87d3f67	Add comment to accept1() that it should use getsock() instead of fgetsock() to avoid additional mutex operations, and also to avoid use of soref/sorele which are now not preferred. MFC after: 3 months	2006-04-01 11:14:56 +00:00
alc	e299a61648	Use NET_LOCK_GIANT() and VFS_LOCK_GIANT() instead of unconditionally acquiring Giant in kern_sendfile(). Guard against the forced reclamation of a vnode in kern_sendfile(). Discussed with: jeff Reviewed by: tegge MFC after: 3 weeks	2006-03-27 04:23:16 +00:00
ps	6014145f38	Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile. Reviewed by: jhb MFC after: 3 weeks	2006-02-28 19:39:18 +00:00
ps	bd0529b5a0	Reformat socket control messages on input/output for 32bit compatibility on 64bit systems. Submitted by: ps, ups Reviewed by: jhb	2005-10-31 21:09:56 +00:00
ps	a72385743d	Implement the 32bit versions of recvmsg, recvfrom, and sendmsg. Partially obtained from: jhb	2005-10-15 05:57:06 +00:00
rwatson	efcac3d02e	Add MAC Framework and MAC policy entry point mac_check_socket_create(), which is invoked from socket() and socketpair(), permitting MAC policy modules to control the creation of sockets by domain, type, and protocol. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl) Requested by: SCC	2005-07-05 22:49:10 +00:00
emax	a52b6c9ce3	Change m_uiotombuf so it will accept offset at which data should be copied to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to fix Ethernet header alignment problem on alpha and sparc64. Also change all users of m_uiotombuf to pass proper offset. Reviewed by: jmg, sam Tested by: Sten Spans "sten AT blinkenlights DOT nl" MFC after: 1 week	2005-05-04 18:55:03 +00:00
rwatson	155bfd8789	Introduce three additional MAC Framework and MAC Policy entry points to control socket poll() (select()), fstat(), and accept() operations, required for some policies: poll() mac_check_socket_poll() fstat() mac_check_socket_stat() accept() mac_check_socket_accept() Update mac_stub and mac_test policies to be aware of these entry points. While here, add missing entry point implementations for: mac_stub.c stub_check_socket_receive() mac_stub.c stub_check_socket_send() mac_test.c mac_test_check_socket_send() mac_test.c mac_test_check_socket_visible() Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA	2005-04-16 18:46:29 +00:00
jeff	97c40ebd49	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.	2005-03-31 04:37:09 +00:00
sobomax	b795e2430a	Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to suppress SIGPIPE signal for the duration of the sendto-family syscalls. Use it to replace previously added hack in Linux layer based on temporarily setting SO_NOSIGPIPE flag. Suggested by: alfred	2005-03-08 16:11:41 +00:00
rwatson	88bf7ca80c	Remove now unused 'int s' from spl(). MFC after: 3 days	2005-02-18 21:39:55 +00:00
rwatson	c231be26b7	De-spl kern_connect(). MFC after: 3 days	2005-02-18 19:37:36 +00:00
rwatson	27fc9123db	In accept1(), extend coverage of the socket lock from just covering soref() to also covering the update of so_state. While no other user threads can update the socket state here as it's not yet hooked up to the file descriptor array, the protocol could also frob the socket state here, leading to a lost update to the so_state field. No reported instances of this bug (as yet). MFC after: 3 days	2005-02-17 13:00:23 +00:00
sobomax	68d0bd2186	Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks	2005-01-30 07:20:36 +00:00
phk	796d435574	Don't use VOP_GETVOBJECT, use vp->v_object directly.	2005-01-25 00:40:01 +00:00
phk	730f6f1d85	Save a line by unlocking before we test.	2005-01-24 14:13:24 +00:00
imp	20280f1431	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
phk	216166ee0d	Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST. Use this in all the places where sleeping with the lock held is not an issue. The distinction will become significant once we finalize the exact lock-type to use for this kind of case.	2004-11-13 11:53:02 +00:00
alc	279c442e7b	Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc(). Change the spelling of the "catch" option to be consistent with the new options. Implement the "no wait" option. An implementation of the "CPU private" for i386 will be committed at a later date.	2004-11-08 00:43:46 +00:00
phk	52da2f8e34	Introduce fdclose() which will clean an entry in a filedesc. Replace homerolled versions with call to fdclose(). Make fdunused() static to kern_descrip.c	2004-11-07 22:16:07 +00:00
phk	cad3685bce	Use fget_locked() instead of homerolled	2004-11-07 16:09:56 +00:00
alc	25b80a64b9	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
rwatson	d961169e94	Move from using the socket reference count to the file reference count to prevent sockets from being garbage collected during socket-specific system calls. This is the same approach used in most VFS-specific system calls, as well as generic file descriptor system calls such as read() and write(). To do this, add a utility function getsock(), which is logically identical to getvnode() used for the same purpose in VFS. Unlike fgetsock(), it returns with the file reference count elevated, but no bump of the socket reference count. Replace matching calls to fputsock() with fdrop(). This change is made to all socket system calls other than sendfile() and accept(), but the approach should be applicable to those system calls also. This shaves about four mutex operations off of each of these system calls, including send() and recv() variants, adding about 1% to pps on minimal UDP packets for UP using netblast, and 4% on SMP. Reviewed by: pjd	2004-10-24 23:45:01 +00:00
alc	e24e0aa793	Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().	2004-10-24 20:09:59 +00:00
alc	d66bfa760a	Modify the vm object locking in do_sendfile() so that the containing object is locked when vm_page_io_finish() is called on a page. This is to satisfy a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the process of transitioning the responsibility for synchronizing access to various fields/flags on the page from the global page queues lock to the per-object lock.) Tripped over by: obrien@	2004-10-20 17:44:40 +00:00
alc	ad2a4ca3e0	Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile().	2004-10-02 05:37:47 +00:00
jmg	bc1805c6e8	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
dwmalone	c8c1b8f415	Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.	2004-07-17 21:06:36 +00:00
phk	b9f13e4266	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.	2004-07-10 15:42:16 +00:00
rwatson	fb654efba8	Remove spl()'s from do_sendfile().	2004-07-09 01:46:03 +00:00
rwatson	deac06df05	Acquire socket lock in the "waiting for connection" loop in kern_connect(), replacing tsleep() with msleep() with the socket mutex.	2004-06-24 01:43:23 +00:00
bms	00a26380d4	Fix an inconsistency in socket option propagation on accept(). Propagate the SS_NBIO flag from the parent socket to the child socket during an accept() operation. The file descriptor O_NONBLOCK flag would have been propagated already by the fflag assignment, and therefore would have been inconsistent with the underlying socket's so_state member. This makes accept() more closely adhere to the API contract we effectively outline in the manual page. Note also that Linux continues to differ here; O_NONBLOCK is not propagated. The other BSDs do propagate the flag, as does Solaris. The Single UNIX Specification does not offer specific advice on this issue. PR: kern/45733 Requested by: Jayanth Vijayaraghavan Reviewed by: rwatson	2004-06-22 23:58:09 +00:00
rwatson	e5f4cab982	Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state. Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument. Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped. Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock. Drop some spl use. NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely. Reviewed by: juli, tjr	2004-06-19 03:23:14 +00:00
rwatson	f2c0db1521	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
rwatson	f1bc833e95	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
rwatson	7c0b73a950	Correct whitespace errors in merge from rwatson_netperf: tabs instead of spaces, no trailing tab at the end of line. Pointed out by: csjp	2004-06-12 23:36:59 +00:00
rwatson	82295697cd	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
phk	86602fc06c	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.	2004-06-11 11:16:26 +00:00
rwatson	8555f72de8	Correct a resource leak introduced in recent accept locking changes: when I reordered events in accept1() to allocate a file descriptor earlier, I didn't properly update use of goto on exit to unwind for cases where the file descriptor is now held, but wasn't previously. The result was that, in the event of accept() on a non-blocking socket, or in the event of a socket error, a file descriptor would be leaked. This ended up being non-fatal in many cases, as the file descriptor would be properly GC'd on process exit, so only showed up for processes that do a lot of non-blocking accept() calls, and also live for a long time (such as qmail). This change updates the use of goto targets to do additional unwinding. Eyes provided by: Brian Feldman <green@freebsd.org> Feet, hands provided by: Stefan Ehmann <shoesoft@gmx.net>, Dimitry Andric <dimitry@andric.com> Arjan van Leeuwen <avleeuwen@piwebs.com>	2004-06-07 21:45:44 +00:00
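The setsockopt() fix in 549bd60e43 hinges on C's usual arithmetic conversions: when a signed int length is compared against an unsigned operand such as a size_t, the int is converted to unsigned first, so a negative value wraps to a very large number and can pass a "too small" check. The sketch below is a hypothetical userland illustration of that pitfall, not the actual kernel change; the functions check_len_buggy(), check_len_fixed(), and check_len_demo() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical length check, loosely modelled on validating a
 * setsockopt()-style option length: valsize is a signed int (as a
 * syscall argument might be), minsize is unsigned (size_t).
 *
 * Buggy: in "valsize < minsize" the int is converted to size_t, so
 * -1 becomes SIZE_MAX, compares as "not too small", and is accepted.
 */
int
check_len_buggy(int valsize, size_t minsize)
{
	if (valsize < minsize)	/* signed/unsigned comparison: the bug */
		return (-1);
	return (0);
}

/*
 * Fixed: reject negative values while still in the signed domain,
 * then cast explicitly for the unsigned comparison.
 */
int
check_len_fixed(int valsize, size_t minsize)
{
	if (valsize < 0 || (size_t)valsize < minsize)
		return (-1);
	return (0);
}

/* Demonstrates the difference; aborts if an expectation fails. */
void
check_len_demo(void)
{
	assert(check_len_buggy(-1, sizeof(int)) == 0);	/* wrongly accepted */
	assert(check_len_fixed(-1, sizeof(int)) == -1);	/* rejected */
	assert(check_len_fixed((int)sizeof(int), sizeof(int)) == 0);
}
```

Checking the sign before casting keeps each comparison within a single signedness, which is the general shape of a fix for this class of bug.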


236 Commits