Commit Graph

14444 Commits

Author SHA1 Message Date
Julien Charbon
2ea3089cb1 Revert r286880: If at first this change made sense, it turns out
it helps only the TCP timers callout(9) usage.  As the benefit for
others callout(9) usages did not reach a consensus the historical
usage should prevail.

Differential Revision:      https://reviews.freebsd.org/D3078
2015-08-30 13:44:46 +00:00
Warner Losh
de830d432c Remove now obsolete comment.
MFC After: 2 days
2015-08-28 20:06:58 +00:00
Warner Losh
3f27281613 Per overwhelming sentiment in the code review, use FEATURE instead.
Differential Revision: https://reviews.freebsd.org/D3488
MFC After: 2 days
2015-08-28 19:53:19 +00:00
Ed Schouten
bc1ace0b96 Decompose linkat()/renameat() rights to source and target.
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.

This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.

Reviewed by:	rwatson, wblock
Differential Revision:	https://reviews.freebsd.org/D3411
2015-08-27 15:16:41 +00:00
Julien Charbon
cd252ea74d Silent a compilation warning on callout_stop() 2015-08-27 10:43:35 +00:00
Julien Charbon
682d0e15b5 In callout_stop(), do not forget to initialize not_running variable.
Thanks to hselasky for noticing that.

Differential Revision:	https://reviews.freebsd.org/D3078 (Updated)
Submitted by:		hselasky
Pointy hat to:		jch
2015-08-27 08:58:03 +00:00
Julien Charbon
0cfae4b4bc In callout_stop(), if a callout is both pending and currently
being serviced return 0 (fail) but it is applicable only
mpsafe callouts.  Thanks to hselasky for finding this.

Differential Revision:	https://reviews.freebsd.org/D3078 (Updated)
Submitted by:		hselasky
Reviewed by:		jch
2015-08-27 08:15:32 +00:00
Marcel Moolenaar
898b510468 An error of -1 from parse_mount() indicates that the specification
was invalid. Don't trigger a mount failure (which by default means
a panic), but instead just move on to the next directive in the
configuration. This typically has us ask for the root mount.

PR:		163245
2015-08-27 04:25:27 +00:00
Warner Losh
135342777c When the kernel is compiled with INVARIANTS, export that as
debug.invariants.

Differential Revision: https://reviews.freebsd.org/D3488
MFC after: 3 days
2015-08-26 23:58:03 +00:00
George V. Neville-Neil
57031f7912 Summary: Add the interactivity equations to the header comment for our
interactivity calculation routine.

Suggested by: rwatson
2015-08-26 16:36:41 +00:00
Edward Tomasz Napierala
c9ba65040f Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by:	kib@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3467
2015-08-24 13:18:13 +00:00
Edward Tomasz Napierala
6e572e084b After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-23 14:53:54 +00:00
Roger Pau Monné
e8234cfef6 preload_search_info: make sure mod is set
Add a check to preload_search_info to make sure mod is set. Most of the
callers of preload_search_info don't check that the mod parameter is
set, which can cause page faults. While at it, remove some now unnecessary
checks before calling preload_search_info.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D3440
2015-08-21 15:57:57 +00:00
Konstantin Belousov
41d50cd6b7 If process becomes reaper (procctl(PROC_REAP_ACQUIRE)) while already
having some children, the children' reaper is not reset to the parent.
This allows for the situation where reaper has children but not
descendands and the too strict asserts in the reap_status() fire.

Remove the wrong asserts, add some clarification for the situation to
the procctl(2) REAP_STATUS.

Reported and tested by:	feld
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-20 22:44:26 +00:00
Konstantin Belousov
fe5ec54b50 fget_unlocked() depends on the freed struct file f_count field being
zero.  The file_zone if no-free, but r284861 added trashing of the
freed memory.  Most visible manifestation of the issue were 'memory
modified after free' panics for the file zone, triggered from
falloc_noinstall().

Add UMA_ZONE_ZINIT flag to turn off trashing.  Mjg noted that it makes
sense to not trash freed memory for any non-free zone, which will be
done later.

Reported and tested by:	pho
Discussed with:	mjg
Sponsored by:	The FreeBSD Foundation
2015-08-19 11:53:32 +00:00
Julien Charbon
a1e6f8ff27 callout_stop() should return 0 (fail) when the callout is currently
being serviced and indeed unstoppable.

A scenario to reproduce this case is:

- the callout is being serviced and at same time,
- callout_reset() is called on this callout that sets
  the CALLOUT_PENDING flag and at same time,
- callout_stop() is called on this callout and returns 1 (success)
  even if the callout is indeed currently running and unstoppable.

This issue was caught up while making r284245 (D2763) workaround, and
was discussed at BSDCan 2015.  Once applied the r284245 workaround
is not needed anymore and will be reverted.

Differential Revision:	https://reviews.freebsd.org/D3078
Reviewed by:		jhb
Sponsored by:		Verisign, Inc.
2015-08-18 10:15:09 +00:00
Rui Paulo
aea3463e34 genassym.sh: call nm(1) with NMFLAGS. 2015-08-14 22:57:13 +00:00
Ian Lepore
e8bac3f240 If a specific timecounter has been chosen via sysctl, and a new timecounter
with higher quality registers (presumably in a module that has just been
loaded), do not undo the user's choice by switching to the new timecounter.

Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).
2015-08-12 20:50:20 +00:00
Mariusz Zaborski
53bf545ddb When the wait*(2) syscalls wait for any process (P_ALL), they should
ignore processes created with the pdfork(2) syscall.

PR:		201054
Approved by:	pjd (mentor)
Discussed with:	emaste, rwatson
2015-08-12 20:08:54 +00:00
Ed Schouten
880e2c6c52 Perform cleanups in response to D3307.
- Document the kern_kevent_anonymous() function.
- Add assertions to ensure that we don't silently leave the kqueue
  linked from a file descriptor table.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3364
2015-08-12 17:46:26 +00:00
Ed Schouten
8c43e4ccfa Properly return ENOTDIR when calling *at() on a non-vnode.
We already properly return ENOTDIR when calling *at() on a non-directory
vnode, but it turns out that if you call it on a socket, we see EINVAL.
Patch up namei to properly translate this to ENOTDIR.
2015-08-12 16:17:00 +00:00
Ed Schouten
f3fe76ecd8 Unignore signals when starting CloudABI processes.
As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.

Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.

Reviewed by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3365
2015-08-12 11:30:31 +00:00
Ed Schouten
e26f6b5f6b Add support for anonymous kqueues.
CloudABI's polling system calls merge the concept of one-shot polling
(poll, select) and stateful polling (kqueue). They share the same data
structures.

Extend FreeBSD's kqueue to provide support for waiting for events on an
anonymous kqueue. Unlike stateful polling, there is no need to support
timeouts, as an additional timer event could be used instead.
Furthermore, it makes no sense to use a different number of input and
output kevents. Merge this into a single argument.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3307
2015-08-11 13:47:23 +00:00
Ed Schouten
aa04a06df5 Introduce kern_cap_rights_limit().
The existing sys_cap_rights_limit() expects that a cap_rights_t object
lives in userspace. It is therefore hard to call into it from
kernelspace.

Move the interesting bits of sys_cap_rights_limit() into
kern_cap_rights_limit(), so that we can call into it from the CloudABI
compatibility layer.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3314
2015-08-11 08:43:50 +00:00
Konstantin Belousov
edc8222303 Make kstack_pages a tunable on arm, x86, and powepc. On i386, the
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.

The tunable was tested on x86 only.  From the visual inspection, it
seems that it might work on arm and powerpc.  The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size.  I only
changed the macros to use variable instead of constant, since I cannot
test.

On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 week
2015-08-10 17:18:21 +00:00
Alexander V. Chernikov
0cbefd30cb Add const-qualifiers for source mbuf argument in m_dup(), m_copym(),
m_dup_pkthdr() and m_tag_copy_chain().
2015-08-08 15:50:46 +00:00
Ian Lepore
721b581722 Only process the PPS event types currently enabled in pps_params.mode.
This makes the PPS API behave correctly, but isn't ideal -- we still end
up capturing PPS data for non-enabled edges, we just don't process the
data into an event that becomes visible outside of kern_tc.  That's because
the event type isn't passed to pps_capture(), so it can't do the filtering.
Any solution for capture filtering is going to require touching every driver.
2015-08-07 23:31:31 +00:00
Ian Lepore
6f7a9f7c8d RFC 2783 requires a status of ETIMEDOUT, not EWOULDBLOCK, on a timeout. 2015-08-07 21:14:19 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
Konstantin Belousov
35dfc644f5 Copy the fencing of the algorithm to do lock-less update and reading
of the timehands, from the kern_tc.c implementation to vdso.  Add
comments giving hints where to look for the algorithm explanation.

To compensate the removal of rmb() in userspace binuptime(), add
explicit lfence instruction before rdtsc.  On i386, add usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.

Reviewed by:	alc, bde (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 12:33:51 +00:00
Edward Tomasz Napierala
57a73b26e0 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
Ed Schouten
dc4b532479 Fix bad arithmetic in umtx_key_get() to compute object offset.
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3287
2015-08-04 06:01:13 +00:00
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address off the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
John Baldwin
92de34df2c kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB.  However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup.  To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3193
2015-08-03 20:43:36 +00:00
Ed Schouten
39f5ebb774 Add sysent flag to switch to capabilities mode on startup.
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().

Reviewed by:	kib
2015-08-03 13:41:47 +00:00
Mark Johnston
ce1c953ee0 Don't modify curthread->td_locks unless INVARIANTS is enabled.
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3205
2015-08-02 00:03:08 +00:00
John Baldwin
98685dc8af Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debugee will be set as an orphan of
the debugger.

Add tests for tracing forks via PT_FOLLOW_FORK.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2809
2015-08-01 16:27:52 +00:00
Ed Schouten
7ee1b208c3 Add kern_shm_open().
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.

The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.

Approved by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-01 07:21:14 +00:00
Ed Schouten
6236e71bfe Fix accidental line wrapping introduced in r286122. 2015-07-31 10:46:45 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00
Konstantin Belousov
8917728875 vn_io_fault() handling of the LOR for i/o into the file-backed buffers
has observable overhead when the buffer pages are not resident or not
mapped.  The overhead comes at least from two factors, one is the
additional work needed to detect the situation, prepare and execute
the rollbacks.  Another is the consequence of the i/o splitting into
the batches of the held pages, causing filesystems see series of the
smaller i/o requests instead of the single large request.

Note that expected case of the resident i/o buffer does not expose
these issues.  Provide a prefaulting for the userspace i/o buffers,
disabled by default.  I am careful of not enabling prefaulting by
default for now, since it would be detrimental for the applications
which speculatively pass extra-large buffers of anonymous memory to
not deal with buffer sizing (if such apps exist).

Found and tested by:	bde, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-31 04:12:51 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Roger Pau Monné
c023d8234b vfs: fill fallout from r286076
This right operator is >= not =>.

Reported by: cem
2015-07-30 15:43:26 +00:00
Roger Pau Monné
8f89a299e2 vfs: fix off-by-one error in vfs_buf_check_mapped
The check added in r285872 can trigger for valid buffers if the buffer space
used happens to be just after unmapped_buf in KVA space.

Discussed with: kib
Sponsored by: Citrix Systems R&D
2015-07-30 15:28:06 +00:00
Ed Schouten
8328babdd0 Make pipes in CloudABI work.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.

Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.

Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.

Test Plan:
CloudABI pipes seem to be created with proper rights in place:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44

Reviewers: jilles, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3236
2015-07-29 17:18:27 +00:00
Ed Schouten
e555b4309c Introduce falloc_caps() to create descriptors with capabilties in place.
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.

This will be used by CloudABI to create pipes with custom capabilities.

Reviewed by:	mjg
2015-07-29 17:16:53 +00:00
Konstantin Belousov
6cebf7e2be Move bufshutdown() out of the #ifdef INVARIANTS block. 2015-07-29 09:57:34 +00:00
Jeff Roberson
98082691bb - Make 'struct buf *buf' private to vfs_bio.c. Having a global variable
'buf' is inconvenient and has lead me to some irritating to discover
   bugs over the years.  It also makes it more challenging to refactor
   the buf allocation system.
 - Move swbuf and declare it as an extern in vfs_bio.c.  This is still
   not perfect but better than it was before.
 - Eliminate the unused ffs function that relied on knowledge of the buf
   array.
 - Move the shutdown code that iterates over the buf array into vfs_bio.c.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-29 02:26:57 +00:00
Jeff Roberson
38750ada8f - Eliminate the EMPTYKVA queue. It served as a cache of KVA allocations
attached to bufs to avoid the overhead of the vm.  This purposes is now
   better served by vmem.  Freeing the kva immediately when a buf is
   destroyed leads to lower fragmentation and a much simpler scan algorithm.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-28 20:24:09 +00:00