freebsd-dev

Author	SHA1	Message	Date
Gleb Smirnoff	8955d2720f	Microoptimisation of code from r236560, also coming from Nginx Inc. Submitted by: ru	2012-06-04 14:18:13 +00:00
Gleb Smirnoff	835d890042	Optimise kern_sendfile(): skip cycling through the entire mbuf chain in m_cat() by storing a pointer to the last mbuf in the chain in a local variable and attaching the new mbuf to the end of the chain. Submitter reports that CPU load dropped by more than 10% on a web server serving large files with this optimisation. Submitted by: Sergey Budnevitch <sb nginx.com>	2012-06-04 12:49:21 +00:00
Michael Tuexen	99f293a20e	Fix bugs which can result in a panic when a non-SCTP socket is used with an sctp_ system-call which expects an SCTP socket. MFC after: 3 days.	2012-03-15 14:13:38 +00:00
Konstantin Belousov	526d0bd547	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows SSIZE_MAX-sized i/o requests from usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Robert Watson	a9d2f8d84f	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
Jonathan Anderson	12bc222e57	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)	2011-06-30 10:56:02 +00:00
John Baldwin	c721b93449	Log the socket address passed as the destination to sendto() and sendmsg() via ktrace. MFC after: 1 week	2011-06-07 17:40:33 +00:00
Konstantin Belousov	1fe80828e7	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not	2011-04-01 13:28:34 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursions in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Alan Cox	8189ac85e9	Eliminate unnecessary page hold_count checks. These checks predate r90944, which introduced a general mechanism for handling the freeing of held pages. Reviewed by: kib@	2011-02-03 14:42:46 +00:00
Konstantin Belousov	9ca9fc5380	If more than one thread allocated sf buffers for sendfile(2), and each of the threads needs more while current pool of the buffers is exhausted, then neither thread can make progress. Switch to nowait allocations after we got first buffer already. Reported by: az Reviewed by: alc (previous version) Tested by: pho MFC after: 1 week	2011-01-28 17:37:09 +00:00
Pawel Jakub Dawidek	b452cf6317	Just pass M_ZERO to malloc(9) instead of clearing allocated memory separately.	2010-12-14 06:19:13 +00:00
Michael Tuexen	049640c1f0	Implement correct handling of address parameter and sendinfo for SCTP send calls. MFC after: 4 weeks.	2010-09-05 20:13:07 +00:00
John Baldwin	7a6f3d7890	Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week	2010-06-29 20:44:19 +00:00
Michael Tuexen	e1c97831ec	* Do not dereference a NULL pointer when calling an SCTP send syscall not providing a destination address and using ktrace. * Do not copy out kernel memory when providing sinfo for sctp_recvmsg(). Both bugs were reported by Valentin Nechayev. The first bug results in a kernel panic. MFC after: 3 days.	2010-06-26 19:26:20 +00:00
Ed Schouten	60ae52f785	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
Alan Cox	f0c0d3998d	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
Alan Cox	52683078a2	Eliminate a small bit of unneeded code from kern_sendfile(): While kern_sendfile() is running, the file's vm object can't be destroyed because kern_sendfile() increments the vm object's reference count. (Once kern_sendfile() decrements the reference count and returns, the vm object can, however, be destroyed. So, sf_buf_mext() must handle the case where the vm object is destroyed.) Reviewed by: kib	2010-05-06 15:52:08 +00:00
Alan Cox	913814935a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
Konstantin Belousov	a0b8e597e5	Lock the page around hold_count access. Reviewed by: alc	2010-05-02 19:25:22 +00:00
Konstantin Belousov	5322f02ec0	Properly handle compat32 calls to sctp generic sendmsg/recvmsg functions that take iov. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:46:54 +00:00
Konstantin Belousov	fd9d1e7627	Remove dead statement. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:44:02 +00:00
Konstantin Belousov	0a977ede48	Fix two style issues. MFC after: 2 weeks	2010-03-19 10:41:32 +00:00
Pawel Jakub Dawidek	0454fe84e4	Use NULL instead of 0 when setting up pointer.	2010-02-18 22:12:40 +00:00
Matt Jacob	e7d829a46c	Fix argument order in a call to mtx_init. MFC after: 1 week	2009-12-17 00:22:56 +00:00
Konstantin Belousov	1c89fc757a	If socket buffer space appears to be lower than sum of count of already prepared bytes and next portion of transfer, inner loop of kern_sendfile() aborts, not preparing next mbuf for socket buffer, and not modifying any outer loop invariants. The thread loops in the outer loop forever. Instead of breaking from inner loop, prepare only bytes that fit into the socket buffer space. In collaboration with: pho Reviewed by: bz PR: kern/138999 MFC after: 2 weeks	2009-11-03 12:52:35 +00:00
Konstantin Belousov	7415a41f4a	Fix style issue.	2009-10-29 10:03:08 +00:00
Konstantin Belousov	75ffdc4049	Do not dereference vp->v_mount without holding vnode lock and checking that the vnode is not reclaimed. Noted by: Igor Sysoev <is rambler-co ru> MFC after: 1 week	2009-10-01 12:50:26 +00:00
Michael Tuexen	8518270e20	Get SCTP working in combination with VIMAGE. Contains code from bz. Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:02:16 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Robert Watson	15ca46f69d	Audit file descriptor numbers for various socket-related system calls. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 19:55:11 +00:00
Robert Watson	9e4c1521d5	Define missing audit argument macro AUDIT_ARG_SOCKET(), and capture the domain, type, and protocol arguments to socket(2) and socketpair(2). Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 18:54:49 +00:00
Bjoern A. Zeeb	c03528b663	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) || defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
Robert Watson	bcf11e8d00	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
Robert Watson	f93bfb23dc	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00
Dmitry Chagin	4202e1be20	Split native socketpair() syscall onto kern_socketpair() which should be used by kernel consumers and socketpair() itself. Approved by: kib (mentor) MFC after: 1 month	2009-05-31 12:12:38 +00:00
Jeff Roberson	bf422e5f27	- Implement a lockless file descriptor lookup algorithm in fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.	2009-05-14 03:24:22 +00:00
Marko Zec	2114e063f0	A NOP change: style / whitespace cleanup of the noise that slipped into r191816. Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)	2009-05-08 14:34:25 +00:00
Marko Zec	21ca7b57bd	Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)	2009-05-05 10:56:12 +00:00
Kip Macy	f0b9868d3a	sendfile doesn't modify the vnode - acquire vnode lock shared Reviewed by: ups, jeffr	2009-04-12 05:19:35 +00:00
Dag-Erling Smørgrav	1ede983cc9	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months	2008-10-23 15:53:51 +00:00
Robert Watson	17c2fc0cc7	When sendto(2) is called with an explicit destination address argument, call mac_socket_check_connect() on that address before proceeding with the send. Otherwise policies instrumenting the connect entry point for the purposes of checking destination addresses will not have the opportunity to check implicit connect requests. MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.	2008-05-22 07:18:54 +00:00
Robert Watson	ae11a989e6	When writing trailers in sendfile(2), don't call kern_writev() while holding the socket buffer lock. This leads to an immediate panic due to recursing the socket buffer lock. This bug was introduced in uipc_syscalls.c:1.240, but masked by another bug until that was fixed in uipc_syscalls.c:1.269. Note that the current fix isn't perfect, but better than panicking: normally we guarantee that simultaneous invocations of a system call to write on a stream socket won't be interlaced, which is ensured by use of the socket buffer sleep lock. This is guaranteed for the sendfile headers, but not trailers. In practice, this is likely not a problem, but should be fixed. MFC after: 3 days Pointy hat to: andre (1.240), cperciva (1.269)	2008-04-27 15:50:00 +00:00
Ruslan Ermilov	ea26d58729	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
Colin Percival	491869163b	After finishing sending file data in sendfile(2), don't forget to send the provided trailers. This has been broken since revision 1.240. Submitted by: Dan Nelson PR: kern/120948 "sounds ok to me" from: phk MFC after: 3 days	2008-02-24 00:07:00 +00:00
Dag-Erling Smørgrav	60e15db992	This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding functions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks	2008-02-23 01:01:49 +00:00
Simon L. B. Nielsen	1b7089994c	Fix sendfile(2) write-only file permission bypass. Security: FreeBSD-SA-08:03.sendfile Submitted by: kib	2008-02-14 11:44:31 +00:00
Poul-Henning Kamp	b75a1171d8	Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs referencing the file's VM pages are returned from the network stack, making changes to the file safe. This flag does not guarantee that the data has been transmitted to the other end.	2008-02-03 15:54:41 +00:00
Poul-Henning Kamp	cf827063a9	Give MEXTADD() another argument to make both void pointers to the free function controllable, instead of passing the KVA of the buffer storage as the first argument. Fix all conventional users of the API to pass the KVA of the buffer as the first argument, to make this a no-op commit. Likely break the only non-conventional user of the API, after informing the relevant committer. Update the mbuf(9) manual page, which was already out of sync on this point. Bump __FreeBSD_version to 800016 as there is no way to tell how many arguments a CPP macro needs any other way. This paves the way for giving sendfile(9) a way to wait for the passed storage to have been accessed before returning. This does not affect the memory layout or size of mbufs. Parental oversight by: sam and rwatson. No MFC is anticipated.	2008-02-01 19:36:27 +00:00


313 Commits