freebsd-skq

Author	SHA1	Message	Date
attilio	bf1dc90446	MFC	2013-03-08 00:03:07 +00:00
pjd	702516e70b	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
attilio	e98f58faf6	MFC	2013-03-02 14:48:41 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
attilio	15bf891afe	Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. Sponsored by: EMC / Isilon storage division	2013-02-20 12:03:20 +00:00
attilio	658534ed5a	Switch vm_object lock to be a rwlock. * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h	2013-02-20 10:38:34 +00:00
pjd	1d08d5f677	Audit sockaddr argument for bind(2), connect(2), accept(2), sendto(2) and recvfrom(2) syscalls. Sponsored by: The FreeBSD Foundation	2013-02-07 00:36:00 +00:00
pjd	8753fea2d8	Minor style tweaks.	2013-02-07 00:27:11 +00:00
glebius	8e20fa5ae9	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
kib	cac2fe116f	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks	2012-08-05 14:11:42 +00:00
pjd	859bb04daa	Style fixes and simplifications. MFC after: 1 month	2012-06-11 16:08:03 +00:00
mjg	1d2ca7b8d8	Plug socket refcount leak on error in sys_sctp_peeloff. Reviewed by: tuexen Approved by: trasz (mentor) MFC after: 3 days	2012-06-08 08:04:51 +00:00
glebius	2bdbc6913f	style(9) for r236563.	2012-06-05 05:16:04 +00:00
glebius	b0d113b96e	Microoptimisation of code from r236560, also coming from Nginx Inc. Submitted by: ru	2012-06-04 14:18:13 +00:00
glebius	df2b290f0d	Optimise kern_sendfile(): skip cycling through the entire mbuf chain in m_cat(), storing pointer to last mbuf in chain in local variable and attaching new mbuf to the end of chain. Submitter reports that CPU load dropped for > 10% on a web server serving large files with this optimisation. Submitted by: Sergey Budnevitch <sb nginx.com>	2012-06-04 12:49:21 +00:00
tuexen	b8b34b6ecf	Fix bugs which can result in a panic when an non-SCTP socket it used with an sctp_ system-call which expects an SCTP socket. MFC after: 3 days.	2012-03-15 14:13:38 +00:00
kib	80ae8fe82c	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month	2012-02-21 01:05:12 +00:00
kmacy	99851f359e	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
rwatson	4af919b491	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc	2011-08-11 12:30:23 +00:00
jonathan	8c932faae4	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)	2011-06-30 10:56:02 +00:00
jhb	b5ffa4ca36	Log the socket address passed as the destination to sendto() and sendmsg() via ktrace. MFC after: 1 week	2011-06-07 17:40:33 +00:00
kib	eb730d92e4	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not	2011-04-01 13:28:34 +00:00
bz	b9b7d3e93a	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
alc	3e042452ae	Eliminate unnecessary page hold_count checks. These checks predate r90944, which introduced a general mechanism for handling the freeing of held pages. Reviewed by: kib@	2011-02-03 14:42:46 +00:00
kib	42cc2c0d98	If more than one thread allocated sf buffers for sendfile(2), and each of the threads needs more while current pool of the buffers is exhausted, then neither thread can make progress. Switch to nowait allocations after we got first buffer already. Reported by: az Reviewed by: alc (previous version) Tested by: pho MFC after: 1 week	2011-01-28 17:37:09 +00:00
pjd	627e6dcc72	Just pass M_ZERO to malloc(9) instead of clearing allocated memory separately.	2010-12-14 06:19:13 +00:00
tuexen	908a61906a	Implement correct handling of address parameter and sendinfo for SCTP send calls. MFC after: 4 weeks.	2010-09-05 20:13:07 +00:00
jhb	44b49a3eaa	Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week	2010-06-29 20:44:19 +00:00
tuexen	d27c0f60a0	* Do not dereference a NULL pointer when calling an SCTP send syscall not providing a destination address and using ktrace. * Do not copy out kernel memory when providing sinfo for sctp_recvmsg(). Both bug where reported by Valentin Nechayev. The first bug results in a kernel panic. MFC after: 3 days.	2010-06-26 19:26:20 +00:00
ed	76489ac1ea	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
alc	e6c77ecaea	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
alc	a4eb017f2a	Eliminate a small bit of unneeded code from kern_sendfile(): While kern_sendfile() is running, the file's vm object can't be destroyed because kern_sendfile() increments the vm object's reference count. (Once kern_sendfile() decrements the reference count and returns, the vm object can, however, be destroyed. So, sf_buf_mext() must handle the case where the vm object is destroyed.) Reviewed by: kib	2010-05-06 15:52:08 +00:00
alc	387e15c45a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
kib	9c4f2e9ab2	Lock the page around hold_count access. Reviewed by: alc	2010-05-02 19:25:22 +00:00
kib	b145781d49	Properly handle compat32 calls to sctp generic sendmsd/recvmsg functions that take iov. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:46:54 +00:00
kib	0c73b78495	Remove dead statement. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:44:02 +00:00
kib	08e36289a4	Fix two style issues. MFC after: 2 weeks	2010-03-19 10:41:32 +00:00
pjd	f061fb129c	Use NULL instead of 0 when setting up pointer.	2010-02-18 22:12:40 +00:00
mjacob	c0a38f802d	Fix argument order in a call to mtx_init. MFC after: 1 week	2009-12-17 00:22:56 +00:00
kib	bd5fd681f2	If socket buffer space appears to be lower then sum of count of already prepared bytes and next portion of transfer, inner loop of kern_sendfile() aborts, not preparing next mbuf for socket buffer, and not modifying any outer loop invariants. The thread loops in the outer loop forever. Instead of breaking from inner loop, prepare only bytes that fit into the socket buffer space. In collaboration with: pho Reviewed by: bz PR: kern/138999 MFC after: 2 weeks	2009-11-03 12:52:35 +00:00
kib	4e178e30ad	Fix style issue.	2009-10-29 10:03:08 +00:00
kib	6f65ac4277	Do not dereference vp->v_mount without holding vnode lock and checking that the vnode is not reclaimed. Noted by: Igor Sysoev <is rambler-co ru> MFC after: 1 week	2009-10-01 12:50:26 +00:00
tuexen	de3c71bd61	Get SCTP working in combination with VIMAGE. Contains code from bz. Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:02:16 +00:00
rwatson	fb9ffed650	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
rwatson	cd692bb0b9	Audit file descriptor numbers for various socket-related system calls. Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 19:55:11 +00:00
rwatson	4aa2654586	Define missing audit argument macro AUDIT_ARG_SOCKET(), and capture the domain, type, and protocol arguments to socket(2) and socketpair(2). Approved by: re (audit argument blanket) MFC after: 3 days	2009-07-01 18:54:49 +00:00
bz	0b5c06357f	SCTP needs either IPv4 or IPv6 as lower layer[1]. So properly hide the already #ifdef SCTP code with #if defined(INET) \|\| defined(INET6) as well to get us closer to a non-INET/INET6 kernel. Discussed with: tuexen [1]	2009-06-10 14:36:59 +00:00
rwatson	f4934662e5	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd	2009-06-05 14:55:22 +00:00
rwatson	0f9e858440	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00

1 2 3 4 5 ...

327 Commits