freebsd-skq

Author	SHA1	Message	Date
Ed Maste	69a2875821	Renumber license clauses in sys/kern to avoid skipping #3	2016-09-15 13:16:20 +00:00
Pedro F. Giffuni	b85f65af68	kern: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-15 16:10:11 +00:00
John Baldwin	f3215338ef	Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness. Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subystem now exports library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request. A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before. The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests. Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests. Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone. Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero. Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default. Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289	2016-03-01 18:12:14 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Gleb Smirnoff	829fae9063	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Gleb Smirnoff	53b680caa2	In sbappend*() family of functions clear M_PROTO flags of incoming mbufs. sbappendstream() already does this in m_demote(). PR: 196174 Sponsored by: Nginx, Inc.	2014-12-22 15:39:24 +00:00
Gleb Smirnoff	e834a84026	Revert r274494, r274712, r275955 and provide extra comments explaining why there could appear a zero-sized mbufs in socket buffers. A proper fix would be to divorce record socket buffers and stream socket buffers, and divorce pru_send that accepts normal data from pru_send that accepts control data.	2014-12-20 22:12:04 +00:00
Gleb Smirnoff	b7413232f6	Add to sbappendstream_locked() a check against NULL mbuf, like it is done in sbappend_locked() and sbappendrecord_locked(). This is a quick fix to the panic introduced by r274712. A proper solution should be to make sosend_generic() avoid calling pru_send() with NULL mbuf for the protocols that do not understand control messages. Those protocols that understand control messages, should be able to receive NULL mbuf, if control is non-NULL.	2014-12-20 14:19:46 +00:00
Gleb Smirnoff	651e4e6a30	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
Gleb Smirnoff	0f9d0a73a4	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This forces following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - An mbuf that is behind a "not ready" in the queue neither. - If sockbet buffer is flushed, then "not ready" mbufs shouln't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for ""claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API shouldn't already access them directly, but use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
Gleb Smirnoff	57f43a45a3	- Move sbcheck() declaration under SOCKBUF_DEBUG. - Improve SOCKBUF_DEBUG macros. - Improve sbcheck(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 11:22:39 +00:00
Gleb Smirnoff	8967b220a3	Make sballoc() and sbfree() functions. Ideally, they could be marked as static, but unfortunately Infiniband (ab)uses them. Sponsored by: Nginx, Inc.	2014-11-30 11:02:07 +00:00
Gleb Smirnoff	8146bcfea1	- Use NULL to compare a pointer. - Use KASSERT() instead of panic. - Remove useless 'continue', no need to restart cycle here. Sponsored by: Nginx, Inc.	2014-11-14 15:44:19 +00:00
Gleb Smirnoff	f274e25659	There should not be zero length mbufs in socket buffers. The code comes from r1451, and thus can't be explained. A patch with explicit panic() here survived all tests. Tested by: pho Sponsored by: Nginx, Inc.	2014-11-14 06:02:29 +00:00
Hans Petter Selasky	9fd573c39d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
Xin LI	2827952eb4	Don't leave the padding between the msg header and the cmsg data, and the padding after the cmsg data un-initialized. Submitted by: tuexen Security: CVE-2014-3952 Security: FreeBSD-SA-14:17.kmem	2014-07-08 21:54:23 +00:00
Alan Somers	8de34a88de	Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf. sys/sys/sockbuf.h sys/kern/uipc_sockbuf.c Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice. sys/kern/uipc_usrreq.c Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant. tests/sys/kern/unix_seqpacket_test.c Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo. PR: kern/185813 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-06 20:24:15 +00:00
Gleb Smirnoff	761a9a1f8d	Fix comment.	2014-01-17 11:09:05 +00:00
Gleb Smirnoff	1d2df300e9	- Substitute sbdrop_internal() with sbcut_internal(). The latter doesn't free mbufs, but return chain of free mbufs to a caller. Caller can either reuse them or return to allocator in a batch manner. - Implement sbdrop()/sbdrop_locked() as a wrapper around sbcut_internal(). - Expose sbcut_locked() for outside usage. Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)	2013-10-09 11:57:53 +00:00
Davide Italiano	7729cbf1a6	Fix socket buffer timeouts precision using the new sbintime_t KPI instead of relying on the tvtohz() workaround. The latter has been introduced lately by jhb@ (r254699) in order to have a fix that can be backported to STABLE. Reported by: Vitja Makarov <vitja.makarov at gmail dot com> Reviewed by: jhb (earlier version)	2013-09-01 23:34:53 +00:00
Lawrence Stewart	0963c8e431	When a previous call to sbsndptr() leaves sb->sb_sndptroff at the start of an mbuf that was fully consumed by the previous call, the mbuf ptr returned by the current call ends up being the previous mbuf in the sb chain to the one that contains the data we want. This does not cause any observable issues because the mbuf copy routines happily walk the mbuf chain to get to the data at the moff offset, which in this case means they effectively skip over the mbuf returned by sbsndptr(). We can't adjust sb->sb_sndptr during the previous call for this case because the next mbuf in the chain may not exist yet. We therefore need to detect the condition and make the adjustment during the current call. Fix by detecting the special case of moff being at the start of the next mbuf in the chain and adjust the required accounting variables accordingly. Reviewed by: andre MFC after: 2 weeks	2013-06-19 03:08:01 +00:00
Gleb Smirnoff	844cacd17c	Once ng_ksocket(4) is fixed, re-apply r194662. See this revision for longer description. Discussed with: andre, rwatson Sponsored by: Nginx, Inc.	2013-03-29 14:06:04 +00:00
Gleb Smirnoff	c8b59ea750	Use m_get() and m_getcl() instead of compat macros.	2013-03-15 10:21:18 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Eitan Adler	3eb9ab5255	Document a large number of currently undocumented sysctls. While here fix some style(9) issues and reduce redundancy. PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week	2011-12-13 00:38:50 +00:00
Bjoern A. Zeeb	b233773bb9	Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB. For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths. Note that this is just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls. Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)	2011-08-25 09:20:13 +00:00
Gleb Smirnoff	443301e296	Revert r194662, since it breaks ng_ksocket(4) and may break other socket consumers with alternate sb_upcall. PR: kern/154676 Submitted by: Arnaud Lacombe <lacombar gmail.com> MFC after: 7 days	2011-04-14 14:54:22 +00:00
Andre Oppermann	4ad1c464fe	In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain. sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently. This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket. For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.	2009-06-22 21:46:40 +00:00
John Baldwin	74fb0ba732	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
Maksim Yevmenkin	e72a94adc3	Fix sbappendrecord_locked(). The main problem is that sbappendrecord_locked() relies on sbcompress() to set sb_mbtail. This will not happen if sbappendrecord_locked() is called with mbuf chain made of exactly one mbuf (i.e. m0->m_next == NULL). In this case sbcompress() will be called with m == NULL and will do nothing. I'm not entirely sure if m == NULL is a valid argument for sbcompress(), and, it rather pointless to call it like that, but keep calling it so it can do SBLASTMBUFCHK(). The problem is triggered by the SOCKBUF_DEBUG kernel option that enables SBLASTRECORDCHK() and SBLASTMBUFCHK() checks. PR: kern/126742 Investigated by: pluknet < pluknet -at- gmail -dot- com > No response from: freebsd-current@, freebsd-bluetooth@ MFC after: 3 days	2009-04-21 19:14:13 +00:00
Robert Watson	7978014d3a	Rewrite sbreserve_locked()'s comment on NULL thread pointers, eliminating an XXXRW about the comment being stale. MFC after: 3 days	2008-10-07 09:51:39 +00:00
Bjoern A. Zeeb	6f4745d575	Catch a possible NULL pointer deref in case the offsets got mangled somehow. As a consequence we may now get an unexpected result(). Catch that error cases with a well defined panic giving appropriate pointers to ease debugging. () While the concensus was that the case should never happen unless there was a bug, noone was definitively sure. Discussed with: kmacy (about 8 months back) Reviewed by: silby (as part of a larger patch in March) MFC after: 2 months	2008-09-07 13:09:04 +00:00
George V. Neville-Neil	49f287f8c5	Update the kernel to count the number of mbufs and clusters (all types) used per socket buffer. Add support to netstat to print out all of the socket buffer statistics. Update the netstat manual page to describe the new -x flag which gives the extended output. Reviewed by: rwatson, julian	2008-05-15 20:18:44 +00:00
Robert Watson	3f0bfcccfd	Further clean up sorflush: - Expose sbrelease_internal(), a variant of sbrelease() with no expectations about the validity of locks in the socket buffer. - Use sbrelease_internel() in sorflush(), and as a result avoid intializing and destroying a socket buffer lock for the temporary stack copy of the actual buffer, asb. - Add a comment indicating why we do what we do, and remove an XXX since things have gotten less ugly in sorflush() lately. This makes socket close cleaner, and possibly also marginally faster. MFC after: 3 weeks	2008-02-04 12:25:13 +00:00
Robert Watson	265de5bb62	Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushin it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>	2008-01-31 08:22:24 +00:00
Kip Macy	5e0f5cfaed	Add SB_NOCOALESCE flag to disable socket buffer update in place	2007-12-17 10:02:01 +00:00
Jeff Roberson	ace8398da0	Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch	2007-12-16 06:21:20 +00:00
Mohan Srinivasan	58d14dae6d	Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)	2007-10-12 03:56:27 +00:00
Robert Watson	049c3b6cdf	Now that sx(9) locks support an interruptible lock acquire primitive, properly observe the SB_NOINTR flag in sblock. This restores the required behavior that lock acquisition be interruptible on the socket buffer I/O serialization lock to allow threads waiting for I/O to be signaled even if they aren't the thread currently holding the I/O lock. With this change, the sblock regression test is again passed. Reported by: alfred sx(9) handiwork: attilio	2007-05-31 11:51:22 +00:00
Robert Watson	d19e16a72c	Generally migrate to ANSI function headers, and remove 'register' use.	2007-05-16 20:41:08 +00:00
Robert Watson	7abab91135	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovere by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
Robert Watson	8c799760e1	Following movement of functions from uipc_socket2.c to uipc_socket.c and uipc_sockbuf.c, clean up and update comments.	2007-03-26 17:05:09 +00:00
Robert Watson	20d9e5e87c	Complete removal of uipc_socket2.c by moving the last few functions to other C files: - Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c. While sbcreatecontrol() is really an mbuf allocation routine, it does its work with awareness of the layout of socket buffer memory. - Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub versions of several of these functions live. Likewise, move socket state transition calls (soisconnecting(), etc) to uipc_socket.c. Moveo sodupsockaddr() and sotoxsocket().	2007-03-26 08:59:03 +00:00
Andre Oppermann	4e02375908	Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets. Adjust tcp_output() make use of it. Tested by: gallatin	2007-03-19 18:35:13 +00:00
John Baldwin	86a93d51e3	Use sysctl_handle_long() instead of duplicating it's logic for kern.ipc.maxsockbuf so that this sysctl works for 32-bit binaries running on amd64 via compat/freebsd32. MFC after: 3 days	2006-09-06 21:59:36 +00:00
Robert Watson	050ac26521	Remove 'register'. Use ANSI C prototypes/function headers. More deterministically line wrap comments.	2006-08-02 13:01:58 +00:00
Robert Watson	eaa6dfbcc2	Reimplement socket buffer tear-down in sofree(): as the socket is no longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer. This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.	2006-08-01 10:30:26 +00:00
Robert Watson	f14cce87dc	Remove non-socket buffer routines from uipc_sockbuf.c, and socket buffer specific routines from uipc_socket2.c following repo-copy. We might rethink the location of one or two at some point, but the division was relatively clean. uipc_sockbuf.c is now the home of routines that manipulate socket buffers.	2006-07-24 16:21:31 +00:00
Robert Watson	5908c617bb	Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel) return void, so don't implement no-op versions of these functions. Instead, consistently check if those switch pointers are NULL before invoking them.	2006-07-11 23:18:28 +00:00

1 2 3 4 5

209 Commits