freebsd-skq

Author	SHA1	Message	Date
kib	77c9d3f4e8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache trashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitely knowns about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantic is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
glebius	3dd3d6d9ff	Merge from projects/sendfile: Provide pru_ready for AF_LOCAL sockets. Local sockets sendsdata directly to the receive buffer of the peer, thus pru_ready also works on the peer socket. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 13:40:58 +00:00
glebius	834e6d1d30	Merge from projects/sendfile: - Use KASSERT()s instead of panic(). - Use sbavail() instead of sb_cc. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-12 10:17:46 +00:00
peter	f53e346683	Partial revert of r262867. r262867 was described as fixing socket buffer checks for SOCK_SEQPACKET, but also changed one of the SOCK_DGRAM code paths to use the new sbappendaddr_nospacecheck_locked() function. This lead to SOCK_DGRAM bypassing socket buffer limits.	2014-08-03 22:37:21 +00:00
mjg	c9a982764b	Cosmetic changes to unp_internalize Don't throw away the result of fget_unlocked. Move fdp increment to for loop to make it consistent with similar code elsewhere. MFC after: 1 week	2014-07-23 18:04:52 +00:00
kevlo	940bebdcf2	Deprecate m_act. Use m_nextpkt always.	2014-07-17 05:21:16 +00:00
kevlo	c4fd7b3987	Make bind(2) and connect(2) return EAFNOSUPPORT for AF_UNIX on wrong address family. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191586 for the original discussion. Reviewed by: terry	2014-07-14 06:00:01 +00:00
mjg	21b16efdd1	Check lower bound of cmsg_len. If passed cm->cmsg_len was below cmsghdr size the experssion: datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data; would give negative result. However, in practice it would not result in a crash because the kernel would try to obtain garbage fds for given process and would error out with EBADF. PR: 124908 Submitted by: campbell mumble.net (modified a little) MFC after: 1 week	2014-06-27 05:04:36 +00:00
rwatson	33fdc14c0c	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
asomers	b56bd6b02a	Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically changed the sending sockbuf's high water mark whenever adding or removing data from the receiving sockbuf. It worked for stream sockets, but it never worked for SOCK_SEQPACKET sockets because of their atomic nature. If the sockbuf was partially full, it might return EMSGSIZE instead of blocking. The new solution is based on DragonFlyBSD's fix from commit 3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that sockbuf, causing sosend_generic and friends to block. uipc_rcvd will likewise clear SB_STOP. There are two fringe benefits: uipc_{send,rcvd} no longer need to call chgsbsize() on every send and receive because they don't change the sockbuf's high water mark. Also, uipc_sense no longer needs to acquire the UIPC linkage lock, because it's simpler to compute the st_blksizes. There is one drawback: since sbspace() will only ever return 0 or the maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum size by at most one packet of size less than the max. I don't think that's a serious problem. In fact, I'm not even positive that FreeBSD guarantees a socket will always stay within its nominal size limit. sys/sys/sockbuf.h Add the SB_STOP flag and adjust sbspace() sys/sys/unpcb.h Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb. sys/kern/uipc_usrreq.c Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP backpressure mechanism. Removing obsolete unpcb fields from db_show_unpcb. tests/sys/kern/unix_seqpacket_test.c Clear expected failures from ATF. Obtained from: DragonFly BSD PR: kern/185812 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-13 18:42:12 +00:00
asomers	e2a82966a6	Partial revert of change 262914. I screwed up subversion syntax with perforce syntax and committed some unrelated files. Only devd files should've been committed. Reported by: imp Pointy hat to: asomers MFC after: 3 weeks X-MFC-With: r262914	2014-03-07 23:40:36 +00:00
asomers	94889417b9	sbin/devd/devd.8 sbin/devd/devd.cc Add a -q flag to devd that will suppress syslog logging at LOG_NOTICE or below. Requested by: ian@ and imp@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-07 23:30:48 +00:00
asomers	2a6c6c59a2	Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf. sys/sys/sockbuf.h sys/kern/uipc_sockbuf.c Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice. sys/kern/uipc_usrreq.c Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant. tests/sys/kern/unix_seqpacket_test.c Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo. PR: kern/185813 Reviewed by: silence from freebsd-net@ and rwatson@ MFC after: 3 weeks Sponsored by: Spectra Logic Corporation	2014-03-06 20:24:15 +00:00
glebius	3d45194f2c	Provide pr_ctloutput method for AF_LOCAL/SOCK_SEQPACKET sockets. This makes setsockopt() on them working. Reported by: Yuri <yuri rawbw.com> Approved by: re (kib)	2013-09-11 18:22:30 +00:00
pjd	029a6f5d92	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
mjg	80f69955c8	Fix receiving fd over unix socket broken in r247740. If n fds were passed, it would receive the first one n times. Reported by: Shawn Webb <lattera@gmail.com>, koobs, gleb Tested by: koobs, gleb Reviewed by: pjd	2013-07-02 07:36:04 +00:00
glebius	1bf8d856bd	Improve r250890, so that we stop processing of a message with zero descriptors as early as possible, and assert that number of descriptors is positive in unp_freerights(). Reviewed by: mjg, pjd, jilles	2013-06-04 11:19:08 +00:00
mjg	4072f856eb	passing fd over unix socket: fix a corner case where caller wants to pass no descriptors. Previously the kernel would leak memory and try to free a potentially arbitrary pointer. Reviewed by: pjd	2013-05-21 21:58:00 +00:00
eadler	4f9ab6c580	Fxi a bunch of typos. PR: misc/174625 Submitted by: Jeremy Chadwick <jdc@koitsu.org>	2013-05-10 16:41:26 +00:00
mjg	1798a915c4	Add fdallocn function and use it when passing fds over unix socket. This gets rid of "unp_externalize fdalloc failed" panic. Reviewed by: pjd MFC after: 1 week	2013-04-14 17:08:34 +00:00
jilles	c9066bd014	Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC. This change allows creating file descriptors with close-on-exec set in some situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file descriptors (SCM_RIGHTS) atomically close-on-exec. The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD. MSG_CMSG_CLOEXEC is the first free bit for MSG_. The SOCK_ flags are not passed to MAC because this may cause incorrect failures and can be done later via fcntl() anyway. On the other hand, audit is expected to cope with the new flags. For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags argument. Reviewed by: kib	2013-03-19 20:58:17 +00:00
pjd	46a0405f64	Fix memory leak when one process send descriptor over UNIX domain socket, but the other process exited before receiving it.	2013-03-11 22:59:07 +00:00
pjd	c3b73942d3	For some reason when I started to pass filedescent structures instead of pointers to the file structure receiving descriptors stopped to work when also at least few kilobytes of data is being send. In the kernel the soreceive_generic() function doesn't see control mbuf as the first mbuf and unp_externalize() is never called, first 6(?) kilobytes of data is missing as well on receiving end. This breaks for example tmux. I don't know yet why going from 8 bytes to sizeof(struct filedescent) per descriptor (or even to 16 bytes per descriptor) breaks things, but to work-around it for now use 8 bytes per file descriptor at the cost of memory allocation. Reported by: flo, Diane Bruce, Jan Beich <jbeich@tormail.org> Simple testcase provided by: mjg	2013-03-03 23:39:30 +00:00
pjd	1df614f5db	Plug memory leaks in file descriptors passing.	2013-03-03 23:23:35 +00:00
pjd	702516e70b	- Implement two new system calls: int bindat(int fd, int s, const struct sockaddr addr, socklen_t addrlen); int connectat(int fd, int s, const struct sockaddr name, socklen_t namelen); which allow to bind and connect respectively to a UNIX domain socket with a path relative to the directory associated with the given file descriptor 'fd'. - Add manual pages for the new syscalls. - Make the new syscalls available for processes in capability mode sandbox. - Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on the directory descriptor for the syscalls to work. - Update audit(4) to support those two new syscalls and to handle path in sockaddr_un structure relative to the given directory descriptor. - Update procstat(1) to recognize the new capability rights. - Document the new capability rights in cap_rights_limit(2). Sponsored by: The FreeBSD Foundation Discussed with: rwatson, jilles, kib, des	2013-03-02 21:11:30 +00:00
pjd	f07ebb8888	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
pluknet	b585f0db6b	Add support of passing SCM_BINTIME ancillary data object for PF_LOCAL sockets. PR: kern/175883 Submitted by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Discussed with: glebius, phk MFC after: 2 weeks	2013-02-15 13:00:20 +00:00
pjd	24b0c94606	Configure UMA warnings for the following zones: - unp_zone: kern.ipc.maxsockets limit reached - socket_zone: kern.ipc.maxsockets limit reached - zone_mbuf: kern.ipc.nmbufs limit reached - zone_clust: kern.ipc.nmbclusters limit reached - zone_jumbop: kern.ipc.nmbjumbop limit reached - zone_jumbo9: kern.ipc.nmbjumbo9 limit reached - zone_jumbo16: kern.ipc.nmbjumbo16 limit reached Note that those warnings are printed not often than every five minutes and can be globally turned off by setting sysctl/tunable vm.zone_warnings to 0. Discussed on: arch Obtained from: WHEEL Systems MFC after: 2 weeks	2012-12-07 22:30:30 +00:00
kib	6d46d7b7ab	Schedule garbage collection run for the in-flight rights passed over the unix domain sockets to the next tick, coalescing the serial calls until the collection fires. The thought is that more work for the collector could arise in the near time, allowing to clean more and not spend too much CPU on repeated collection when there is no garbage. Currently the collection task is fired immediately upon unix domain socket close if there are any rights in flight, which caused excessive CPU usage and too long blocking of the threads waiting for unp_list_lock and unp_link_rwlock in write mode. Robert noted that it would be nice if we could find some heuristic by which we decide whether to run GC a bit more quickly. E.g., if the number of UNIX domain sockets is close to its resource limit, but not quite. Reported and tested by: Markus Gebert <markus.gebert@hostpoint.ch> Reviewed by: rwatson MFC after: 2 weeks	2012-11-20 15:45:48 +00:00
glebius	b821e8658c	Update comment.	2012-11-16 14:00:54 +00:00
kib	560aa751e0	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
mdf	394f27b845	Fix up kernel sources to be ready for a 64-bit ino_t. Original code by: Gleb Kurtsou	2012-09-27 23:30:49 +00:00
glebius	b0ab73e8f6	Supply the pr_ctloutput method for local datagram sockets, so that setsockopt() and getsockopt() work on them. This makes 'tools/regression/sockets/unix_cmsg -t dgram' more successful.	2012-09-07 21:06:54 +00:00
pjd	0ca632f7e9	When checking if file descriptor number is valid, explicitely check for 'fd' being less than 0 instead of using cast-to-unsigned hack. Today's commit was brought to you by the letters 'B', 'D' and 'E' :)	2012-06-13 22:12:10 +00:00
trociny	1aad0004ee	Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks	2012-02-29 21:38:31 +00:00
trociny	87f7f0cfe8	When detaching an unix domain socket, uipc_detach() checks unp->unp_vnode pointer to detect if there is a vnode associated with (binded to) this socket and does necessary cleanup if there is. The issue is that after forced unmount this check may be too late as the unp_vnode is reclaimed and the reference is stale. To fix this provide a helper function that is called on a socket vnode reclamation to do necessary cleanup. Pointed by: kib Reviewed by: kib MFC after: 2 weeks	2012-02-25 10:15:41 +00:00
trociny	05b0baa9d7	unp_connect() may use a shared lock on the vnode to fetch the socket. Suggested by: jhb Reviewed by: jhb, kib, rwatson MFC after: 2 weeks	2012-02-21 19:40:13 +00:00
ed	0c56cf839d	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
bz	2e8e044c2f	Fix handling of corrupt compress(1)ed data. [11:04] Add missing length checks on unix socket addresses. [11:05] Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-11:04.compress Security: CVE-2011-2895 [11:04] Security: FreeBSD-SA-11:05.unix	2011-09-28 08:47:17 +00:00
kib	1d5badd36f	Prevent the hiwatermark for the unix domain socket from becoming effectively negative. Often seen as upstream fastcgi connection timeouts in nginx when using sendfile over unix domain sockets for communication. Sendfile(2) may send more bytes then currently allowed by the hiwatermark of the socket, e.g. because the so_snd sockbuf lock is dropped after sbspace() call in the kern_sendfile() loop. In this case, recalculated hiwatermark will overflow. Since lowatermark is renewed as half of the hiwatermark by sendfile code, and both are unsigned, the send buffer never reaches the free space requested by lowatermark, causing indefinite wait in sendfile. Reviewed by: rwatson Approved by: re (bz) MFC after: 2 weeks	2011-08-20 16:12:29 +00:00
bz	b9b7d3e93a	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
kib	d690434e20	The unp_gc() function drops and reaquires lock between scan and collect phases. The unp_discard() function executes unp_externalize_fp(), which might make the socket eligible for gc-ing, and then, later, taskqueue will close the socket. Since unp_gc() dropped the list lock to do the malloc, close might happen after the mark step but before the collection step, causing collection to not find the socket and miss one array element. I believe that the race was there before r216158, but the stated revision made the window much wider by postponing the close to taskqueue sometimes. Only process as much array elements as we find the sockets during second phase of gc [1]. Take linkage lock and recheck the eligibility of the socket for gc, as well as call fhold() under the linkage lock. Reported and tested by: jmallett Submitted by: jmallett [1] Reviewed by: rwatson, jeff (possibly) MFC after: 1 week	2011-02-01 13:33:49 +00:00
mdf	ca68749f2a	Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string.	2011-01-18 21:14:18 +00:00
kib	1ffd755b88	Trim whitespaces at the end of lines. Use the commit to record proper log message for r216150. MFC after: 1 week If unix socket has a unix socket attached as the rights that has a unix socket attached as the rights that has a unix socket attached as the rights ... Kernel may overflow the stack on attempt to close such socket. Only close the rights file in the context of the current close if the file is not unix domain socket. Otherwise, postpone the work to taskqueue, preventing unlimited recursion. The pass of the unix domain sockets over the SCM_RIGHTS message control is not widely used, and more, the close of the socket with still attached rights is mostly an application failure. The change should not affect the performance of typical users of SCM_RIGHTS. Reviewed by: jeff, rwatson	2010-12-03 20:39:06 +00:00
kib	44fb3ef253	Reviewed by: jeff, rwatson MFC after: 1 week	2010-12-03 16:15:44 +00:00
trasz	a5239fb269	Remove spurious '/*-' marks and fix some other style problems. Submitted by: bde@	2010-07-22 05:42:29 +00:00
trasz	3e54021797	Revert r210225 - turns out I was wrong; the "/*-" is not license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. Explained by: cperciva@	2010-07-18 20:57:53 +00:00
trasz	935237a66a	The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurences from sys/sys/ and sys/kern/.	2010-07-18 20:23:10 +00:00
rwatson	60dba30ff0	Fix build on amd64, where sysctl arg1 is a pointer. Reported by: Mr Tinderbox MFC after: 3 months	2009-10-05 22:23:12 +00:00
rwatson	8579778ae2	First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record- oriented semantics to do so reliably via local IPC. Some implementation notes (also present in XXX comments): - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem. - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the mean time, it's possible to get EMSGSIZE when buffers fill, instead of blocking. Discussed with: benl Reviewed by: bz, rpaulo MFC after: 3 months Sponsored by: Google	2009-10-05 14:49:16 +00:00

1 2 3 4 5 ...

283 Commits