freebsd-nq

Author	SHA1	Message	Date
Ed Schouten	f3fe76ecd8	Unignore signals when starting CloudABI processes. As CloudABI processes cannot adjust their signal handlers, we need to make sure that we start up CloudABI processes with consistent signal masks. Though the POSIx standard signal behavior is all right, we do need to make sure that we ignore SIGPIPE, as it would otherwise be hard to interact with pipes and sockets. Extend execsigs() to iterate over ps_sigignore and call sigdflt() for each of the ignored signals. Reviewed by: kib Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3365	2015-08-12 11:30:31 +00:00
Ed Schouten	e26f6b5f6b	Add support for anonymous kqueues. CloudABI's polling system calls merge the concept of one-shot polling (poll, select) and stateful polling (kqueue). They share the same data structures. Extend FreeBSD's kqueue to provide support for waiting for events on an anonymous kqueue. Unlike stateful polling, there is no need to support timeouts, as an additional timer event could be used instead. Furthermore, it makes no sense to use a different number of input and output kevents. Merge this into a single argument. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3307	2015-08-11 13:47:23 +00:00
Ed Schouten	aa04a06df5	Introduce kern_cap_rights_limit(). The existing sys_cap_rights_limit() expects that a cap_rights_t object lives in userspace. It is therefore hard to call into it from kernelspace. Move the interesting bits of sys_cap_rights_limit() into kern_cap_rights_limit(), so that we can call into it from the CloudABI compatibility layer. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3314	2015-08-11 08:43:50 +00:00
Konstantin Belousov	edc8222303	Make kstack_pages a tunable on arm, x86, and powepc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week	2015-08-10 17:18:21 +00:00
Alexander V. Chernikov	0cbefd30cb	Add const-qualifiers for source mbuf argument in m_dup(), m_copym(), m_dup_pkthdr() and m_tag_copy_chain().	2015-08-08 15:50:46 +00:00
Ian Lepore	721b581722	Only process the PPS event types currently enabled in pps_params.mode. This makes the PPS API behave correctly, but isn't ideal -- we still end up capturing PPS data for non-enabled edges, we just don't process the data into an event that becomes visible outside of kern_tc. That's because the event type isn't passed to pps_capture(), so it can't do the filtering. Any solution for capture filtering is going to require touching every driver.	2015-08-07 23:31:31 +00:00
Ian Lepore	6f7a9f7c8d	RFC 2783 requires a status of ETIMEDOUT, not EWOULDBLOCK, on a timeout.	2015-08-07 21:14:19 +00:00
Ed Schouten	a2034cc98a	Allow the creation of kqueues with a restricted set of Capsicum rights. On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems. By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending OS-specific behaviour. Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL. Differential Revision: https://reviews.freebsd.org/D3259	2015-08-05 07:36:50 +00:00
Ed Schouten	2433a4eb04	Make it possible to implement poll(2) on top of kqueue(2). It looks like EVFILT_READ and EVFILT_WRITE trigger under the same conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX. The only difference is that POLLRDNORM has to be triggered on regular files unconditionally, whereas EVFILT_READ only triggers when not EOF. Introduce a new flag, NOTE_FILE_POLL, that can be used to make EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag will be used by cloudlibc's poll() function. Reviewed by: jmg Differential Revision: https://reviews.freebsd.org/D3303	2015-08-05 07:34:29 +00:00
Konstantin Belousov	35dfc644f5	Copy the fencing of the algorithm to do lock-less update and reading of the timehands, from the kern_tc.c implementation to vdso. Add comments giving hints where to look for the algorithm explanation. To compensate the removal of rmb() in userspace binuptime(), add explicit lfence instruction before rdtsc. On i386, add usual complications to detect SSE2 presence; assume that old CPUs which do not implement SSE2 also execute rdtsc almost in order. Reviewed by: alc, bde (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-04 12:33:51 +00:00
Edward Tomasz Napierala	57a73b26e0	Mark vgonel() as static. It was already declared static earlier; no idea why compilers don't warn about this. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-08-04 08:51:56 +00:00
Ed Schouten	dc4b532479	Fix bad arithmetic in umtx_key_get() to compute object offset. It looks like umtx_key_get() has the addition and subtraction the wrong way around, meaning that it fails to match in certain cases. This causes the cloudlibc unit tests to deadlock in certain cases. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3287	2015-08-04 06:01:13 +00:00
Ed Schouten	52942c1eae	Add missing const keyword to function parameter. The umtx_key_get() function does not dereference the address off the userspace object. The pointer can safely be const.	2015-08-03 21:11:33 +00:00
John Baldwin	92de34df2c	kgdb uses td_oncpu to determine if a thread is running and should use a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads retained td_oncpu from the last time they ran, and newborn threads had their CPU fields cleared to zero during fork and thread creation since they are in the set of fields zeroed when threads are setup. To fix, explicitly update the CPU fields for exiting threads in sched_throw() to reflect the switch out and reset the CPU fields for new threads in sched_fork_thread() to NOCPU. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3193	2015-08-03 20:43:36 +00:00
Ed Schouten	39f5ebb774	Add sysent flag to switch to capabilities mode on startup. CloudABI processes should run in capabilities mode automatically. There is no need to switch manually (e.g., by calling cap_enter()). Add a flag, SV_CAPSICUM, that can be used to call into cap_enter() during execve(). Reviewed by: kib	2015-08-03 13:41:47 +00:00
Mark Johnston	ce1c953ee0	Don't modify curthread->td_locks unless INVARIANTS is enabled. This field is only used in a KASSERT that verifies that no locks are held when returning to user mode. Moreover, the td_locks accounting is only correct when LOCK_DEBUG > 0, which is implied by INVARIANTS. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D3205	2015-08-02 00:03:08 +00:00
John Baldwin	98685dc8af	Clear P_TRACED before reparenting a detached process back to its original parent. Otherwise the debugee will be set as an orphan of the debugger. Add tests for tracing forks via PT_FOLLOW_FORK. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D2809	2015-08-01 16:27:52 +00:00
Ed Schouten	7ee1b208c3	Add kern_shm_open(). This allows you to specify the capabilities that the new file descriptor should have. This allows us to create shared memory objects that only have the rights we're interested in. The idea behind restricting the rights is that it makes it a lot easier for CloudABI to get consistent behaviour across different operating systems. We only need to make sure that a shared memory implementation consistently implements the operations that are whitelisted. Approved by: kib Obtained from: https://github.com/NuxiNL/freebsd	2015-08-01 07:21:14 +00:00
Ed Schouten	6236e71bfe	Fix accidental line wrapping introduced in r286122.	2015-07-31 10:46:45 +00:00
Ed Schouten	367a13f905	Limit rights on process descriptors. On CloudABI, the rights bits returned by cap_rights_get() match up with the operations that you can actually perform on the file descriptor. Limiting the rights is good, because it makes it easier to get uniform behaviour across different operating systems. If process descriptors on FreeBSD would suddenly gain support for any new file operation, this wouldn't become exposed to CloudABI processes without first extending the rights. Extend fork1() to gain a 'struct filecaps' argument that allows you to construct process descriptors with custom rights. Use this in cloudabi_sys_proc_fork() to limit the rights to just fstat() and pdwait(). Obtained from: https://github.com/NuxiNL/freebsd	2015-07-31 10:21:58 +00:00
Konstantin Belousov	8917728875	vn_io_fault() handling of the LOR for i/o into the file-backed buffers has observable overhead when the buffer pages are not resident or not mapped. The overhead comes at least from two factors, one is the additional work needed to detect the situation, prepare and execute the rollbacks. Another is the consequence of the i/o splitting into the batches of the held pages, causing filesystems see series of the smaller i/o requests instead of the single large request. Note that expected case of the resident i/o buffer does not expose these issues. Provide a prefaulting for the userspace i/o buffers, disabled by default. I am careful of not enabling prefaulting by default for now, since it would be detrimental for the applications which speculatively pass extra-large buffers of anonymous memory to not deal with buffer sizing (if such apps exist). Found and tested by: bde, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-31 04:12:51 +00:00
Mateusz Guzik	4ae1e3c752	Revert r285125 until rmlocks get fixed. Right now there is a chance that sysctl unregister will cause reader to block on the sx lock associated with sysctl rmlock, in which case kernels with debug enabled will panic.	2015-07-30 19:52:43 +00:00
Roger Pau Monné	c023d8234b	vfs: fill fallout from r286076 This right operator is >= not =>. Reported by: cem	2015-07-30 15:43:26 +00:00
Roger Pau Monné	8f89a299e2	vfs: fix off-by-one error in vfs_buf_check_mapped The check added in r285872 can trigger for valid buffers if the buffer space used happens to be just after unmapped_buf in KVA space. Discussed with: kib Sponsored by: Citrix Systems R&D	2015-07-30 15:28:06 +00:00
Ed Schouten	8328babdd0	Make pipes in CloudABI work. Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX. Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor. Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place. Test Plan: CloudABI pipes seem to be created with proper rights in place: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44 Reviewers: jilles, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3236	2015-07-29 17:18:27 +00:00
Ed Schouten	e555b4309c	Introduce falloc_caps() to create descriptors with capabilties in place. falloc_noinstall() followed by finstall() allows you to create and install file descriptors with custom capabilities. Add falloc_caps() that can do both of these actions in one go. This will be used by CloudABI to create pipes with custom capabilities. Reviewed by: mjg	2015-07-29 17:16:53 +00:00
Konstantin Belousov	6cebf7e2be	Move bufshutdown() out of the #ifdef INVARIANTS block.	2015-07-29 09:57:34 +00:00
Jeff Roberson	98082691bb	- Make 'struct buf *buf' private to vfs_bio.c. Having a global variable 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-07-29 02:26:57 +00:00
Jeff Roberson	38750ada8f	- Eliminate the EMPTYKVA queue. It served as a cache of KVA allocations attached to bufs to avoid the overhead of the vm. This purposes is now better served by vmem. Freeing the kva immediately when a buf is destroyed leads to lower fragmentation and a much simpler scan algorithm. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-07-28 20:24:09 +00:00
Ed Schouten	b114aa7959	Make shutdown() return ENOTCONN as required by POSIX, part deux. Summary: Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right. Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries. I took a look at the XNU sources and they seem to test against both SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seams reasonable, so let's do the same. Test Plan: This issue was uncovered while writing tests for shutdown() in CloudABI: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26 Reviewers: glebius, rwatson, #manpages, gnn, #network Reviewed By: gnn, #network Subscribers: bms, mjg, imp Differential Revision: https://reviews.freebsd.org/D3039	2015-07-27 13:17:57 +00:00
Andrey V. Elsukov	41f5f69f96	Build debug version of rmlock's methods only when LOCK_DEBUG > 0. Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1). This means that debugging code always built. In addition the kernel modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code is always used by kernel modules. MFC after: 1 week	2015-07-26 10:53:32 +00:00
Konstantin Belousov	6fd04eff66	With the removal of b_saveaddr in the r285819, b_data must be reset to b_kvabase when the buffer is reclaimed. Otherwise, if b_data for the mapped buffer was adjusted with the page-offset portion of b_offset, nothing would re-adjust the b_data, which breaks buffer management code which expects page-aligned b_data (see e.g. bpman_qenter(), which skips partial pages). Fix a minor issue with the GB_KVAALLOC requests, which could result in returning the mapped buffer if the reused buffer is mapped and have the right amount of KVA reserved. Improve assertion in the vfs_buf_check_mapped() to catch unmapped buffers which have their b_data incorrectly adjusted with offset. Reported and tested by: pho (previous version) Reviewed by: jeff (previous version) Sponsored by: The FreeBSD Foundation	2015-07-25 15:00:14 +00:00
Xin LI	1a7c14aec7	Fix a typo in comment. Submitted by: Yanhui Shen via twitter MFC after: 3 days	2015-07-24 22:13:39 +00:00
Marius Strobl	7815d3948c	o Revert the other functional half of r239864, i. e. the merge of r134227 from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus() but also for the MD cache invalidation, TLB demapping and remote register reading IPIs due to the following reasons: - The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on sparc64. That's because on sparc64, spin locks don't disable interrupts completely but only raise the processor interrupt level to PIL_TICK. This means that IPIs still get delivered and direct dispatch IPIs such as the cache invalidation etc. IPIs in question are still executed. - In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action(). Consequently, smp_ipi_mtx may be locked for an extended amount of time as queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are scheduled via a soft interrupt. Moreover, given that this soft interrupt is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action() on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in turn leading to the target in question trying to send an IPI by itself while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a deadlock. o As mentioned in the commit message of r245850, on least some sun4u platforms concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers, i. e. ipi_{all_but_self,cpu,selected}(). o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually are started, when smp_started is no longer 0. o In all MD and MI IPI helpers, check for smp_started == 1 rather than for smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs trying to be delivered to APs that in fact aren't up and running, yet. While at it, move setting of the cpu_ipi_{selected,single}() pointers to the appropriate delivery functions from mp_init() to cpu_mp_start() where it's better suited and allows to get rid of the global isjbus variable. o Given that now concurrent IPI delivery no longer is possible, also nuke the delays before completely disabling interrupts again in the CPU-specific cross-trap delivery functions, previously giving other CPUs a window for sending IPIs on their part. Actually, we now should be able to entirely get rid of completely disabling interrupts in these functions. Such a change needs more testing, though. o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not necessary for correctness, this avoids page faults when accessing the stack of a foreign CPU as {s,}tick now is locked into the TLBs as part of static kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as possible, avoiding jitter. PR: 201245 MFC after: 3 days	2015-07-24 15:13:21 +00:00
Sergey Kandaurov	ef88ae77ea	Call ksem_get() with initialized 'rights'. ksem_get() consumes fget(), and it's mandatory there. Reported by: truckman Reviewed by: mjg	2015-07-23 23:18:03 +00:00
Jeff Roberson	fade8dd714	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division	2015-07-23 19:13:41 +00:00
Jeff Roberson	1c1ddc0351	- Don't defeat the FIFO nature of the buffer cache by eliminating the most recently used buffer when we are under paging pressure. This is a perversion of the buffer and page replacement algorithms and recent improvements to the page daemon have rendered it unnecessary. In the event that low-memory deadlocks become an issue it would be possible to make a daemon or event handler that performs a similar action on the oldest buffers rather than the newest. Since the buf cache is analogous to the page cache and some minimum working set is desired another possibility is to simply shrink the minimum working set which has less downside now that file pages are not directly mapped. Sponsored by: EMC / Isilon Reviewed by: alc, kib (with some minor objection) Tested by: pho	2015-07-23 02:20:41 +00:00
Konstantin Belousov	e637a6e3f9	The smp_rendezvous_cpus() function should ensure that all accesses done by the functions called on other CPUs, are visible to the caller. Pair otherwise useless acquire on smp_rv_waiters[3] with a release add to ensure synchronized with relation, which guarantees visibility. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-07-21 22:56:46 +00:00
Konstantin Belousov	01f5e0866b	The part of r285680 which removed release semantic for two stores to it_need was wrong []. Restore the releases and add a comment explaining why it is needed. Noted by: alc [] Reviewed by: bde [*] Sponsored by: The FreeBSD Foundation	2015-07-21 14:39:34 +00:00
Sergey Kandaurov	94df6fad1d	Fix sb_state constant names as used e.g. to display in DDB ``show sockbuf''. MFC after: 1 week	2015-07-21 09:57:13 +00:00
Ed Schouten	5a170c1b0e	Add an API for easily creating userspace threads in kernelspace. This change refactors the existing create_thread() function to be more generic. It replaces almost all of its arguments by a callback that can be used to extract the thread ID and copy it out to the right place, but also to perform additional initialization steps, such as setting the trapframe. This also makes the difference between thr_new() and thr_create() more clear in my opinion. This function is going to be used by the CloudABI compatibility layer. It looks like the OpenSolaris compatibility framework already provides a function called thread_create(). Rename this function to do_thread_create() and use a macro to deal with the namespacing conflict. A similar approach is already used for thread_exit(). MFC after: 1 month	2015-07-20 10:20:04 +00:00
Alexander Motin	d3e2e28e74	Fix typo in comment. Submitted by: Masao Uebayashi	2015-07-20 09:37:42 +00:00
Mark Johnston	97cc6870f6	Don't increment the spin count until after the first attempt to acquire a rwlock read lock. Otherwise the lockstat:::rw-spin probe will fire spuriously. MFC after: 1 week	2015-07-19 22:26:02 +00:00
Kirk McKusick	1b79b9498b	Restructure code for readability improvement. No functional change. Reviewed by: kib	2015-07-19 22:25:16 +00:00
Mark Johnston	de2c95cc00	Consistently use a reader/writer flag for lockstat probes in rwlock(9) and sx(9), rather than using the probe function name to determine whether a given lock is a read lock or a write lock. Update lockstat(1) accordingly.	2015-07-19 22:24:33 +00:00
Mark Johnston	32cd0147fa	Implement the lockstat provider using SDT(9) instead of the custom provider in lockstat.ko. This means that lockstat probes now have typed arguments and will utilize SDT probe hot-patching support when it arrives. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D2993	2015-07-19 22:14:09 +00:00
Marcelo Araujo	f19e47d691	Add support to the jail framework to be able to mount linsysfs(5) and linprocfs(5). Differential Revision: D2846 Submitted by: Nikolai Lifanov <lifanov@mail.lifanov.com> Reviewed by: jamie	2015-07-19 08:52:35 +00:00
Konstantin Belousov	283dfee925	Further cleanup after r285607. Remove useless release semantic for some stores to it_need. For stores where the release is needed, add a comment explaining why. Fence after the atomic_cmpset() op on the it_need should be acquire only, release is not needed (see above). The combination of atomic_cmpset() + fence_acq() is better expressed there as atomic_cmpset_acq(). Use atomic_cmpset() for swi' ih_need read and clear. Discussed with: alc, bde Reviewed by: bde Comments wording provided by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-07-18 19:59:29 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Mark Johnston	c6d48c8752	Fix the !KDTRACE_HOOKS build. X-MFC-With: r285664	2015-07-18 04:38:11 +00:00

1 2 3 4 5 ...

14423 Commits