it helps only the TCP timers callout(9) usage. As the benefit for
others callout(9) usages did not reach a consensus the historical
usage should prevail.
Differential Revision: https://reviews.freebsd.org/D3078
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.
This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.
Reviewed by: rwatson, wblock
Differential Revision: https://reviews.freebsd.org/D3411
being serviced return 0 (fail) but it is applicable only
mpsafe callouts. Thanks to hselasky for finding this.
Differential Revision: https://reviews.freebsd.org/D3078 (Updated)
Submitted by: hselasky
Reviewed by: jch
was invalid. Don't trigger a mount failure (which by default means
a panic), but instead just move on to the next directive in the
configuration. This typically has us ask for the root mount.
PR: 163245
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
Add a check to preload_search_info to make sure mod is set. Most of the
callers of preload_search_info don't check that the mod parameter is
set, which can cause page faults. While at it, remove some now unnecessary
checks before calling preload_search_info.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3440
having some children, the children' reaper is not reset to the parent.
This allows for the situation where reaper has children but not
descendands and the too strict asserts in the reap_status() fire.
Remove the wrong asserts, add some clarification for the situation to
the procctl(2) REAP_STATUS.
Reported and tested by: feld
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
zero. The file_zone if no-free, but r284861 added trashing of the
freed memory. Most visible manifestation of the issue were 'memory
modified after free' panics for the file zone, triggered from
falloc_noinstall().
Add UMA_ZONE_ZINIT flag to turn off trashing. Mjg noted that it makes
sense to not trash freed memory for any non-free zone, which will be
done later.
Reported and tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
being serviced and indeed unstoppable.
A scenario to reproduce this case is:
- the callout is being serviced and at same time,
- callout_reset() is called on this callout that sets
the CALLOUT_PENDING flag and at same time,
- callout_stop() is called on this callout and returns 1 (success)
even if the callout is indeed currently running and unstoppable.
This issue was caught up while making r284245 (D2763) workaround, and
was discussed at BSDCan 2015. Once applied the r284245 workaround
is not needed anymore and will be reverted.
Differential Revision: https://reviews.freebsd.org/D3078
Reviewed by: jhb
Sponsored by: Verisign, Inc.
with higher quality registers (presumably in a module that has just been
loaded), do not undo the user's choice by switching to the new timecounter.
Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).
- Document the kern_kevent_anonymous() function.
- Add assertions to ensure that we don't silently leave the kqueue
linked from a file descriptor table.
Reviewed by: jmg
Differential Revision: https://reviews.freebsd.org/D3364
We already properly return ENOTDIR when calling *at() on a non-directory
vnode, but it turns out that if you call it on a socket, we see EINVAL.
Patch up namei to properly translate this to ENOTDIR.
As CloudABI processes cannot adjust their signal handlers, we need to
make sure that we start up CloudABI processes with consistent signal
masks. Though the POSIx standard signal behavior is all right, we do
need to make sure that we ignore SIGPIPE, as it would otherwise be
hard to interact with pipes and sockets.
Extend execsigs() to iterate over ps_sigignore and call sigdflt() for
each of the ignored signals.
Reviewed by: kib
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3365
CloudABI's polling system calls merge the concept of one-shot polling
(poll, select) and stateful polling (kqueue). They share the same data
structures.
Extend FreeBSD's kqueue to provide support for waiting for events on an
anonymous kqueue. Unlike stateful polling, there is no need to support
timeouts, as an additional timer event could be used instead.
Furthermore, it makes no sense to use a different number of input and
output kevents. Merge this into a single argument.
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3307
The existing sys_cap_rights_limit() expects that a cap_rights_t object
lives in userspace. It is therefore hard to call into it from
kernelspace.
Move the interesting bits of sys_cap_rights_limit() into
kern_cap_rights_limit(), so that we can call into it from the CloudABI
compatibility layer.
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3314
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.
The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size. I only
changed the macros to use variable instead of constant, since I cannot
test.
On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.
Sponsored by: The FreeBSD Foundation
MFC after: 2 week
This makes the PPS API behave correctly, but isn't ideal -- we still end
up capturing PPS data for non-enabled edges, we just don't process the
data into an event that becomes visible outside of kern_tc. That's because
the event type isn't passed to pps_capture(), so it can't do the filtering.
Any solution for capture filtering is going to require touching every driver.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.
By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.
Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.
Differential Revision: https://reviews.freebsd.org/D3259
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.
Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.
Reviewed by: jmg
Differential Revision: https://reviews.freebsd.org/D3303
of the timehands, from the kern_tc.c implementation to vdso. Add
comments giving hints where to look for the algorithm explanation.
To compensate the removal of rmb() in userspace binuptime(), add
explicit lfence instruction before rdtsc. On i386, add usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.
Reviewed by: alc, bde (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3287
a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup. To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3193
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().
Reviewed by: kib
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3205
original parent. Otherwise the debugee will be set as an orphan of
the debugger.
Add tests for tracing forks via PT_FOLLOW_FORK.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2809
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.
The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.
Approved by: kib
Obtained from: https://github.com/NuxiNL/freebsd
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.
Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.
Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().
Obtained from: https://github.com/NuxiNL/freebsd
has observable overhead when the buffer pages are not resident or not
mapped. The overhead comes at least from two factors, one is the
additional work needed to detect the situation, prepare and execute
the rollbacks. Another is the consequence of the i/o splitting into
the batches of the held pages, causing filesystems see series of the
smaller i/o requests instead of the single large request.
Note that expected case of the resident i/o buffer does not expose
these issues. Provide a prefaulting for the userspace i/o buffers,
disabled by default. I am careful of not enabling prefaulting by
default for now, since it would be detrimental for the applications
which speculatively pass extra-large buffers of anonymous memory to
not deal with buffer sizing (if such apps exist).
Found and tested by: bde, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
The check added in r285872 can trigger for valid buffers if the buffer space
used happens to be just after unmapped_buf in KVA space.
Discussed with: kib
Sponsored by: Citrix Systems R&D
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.
Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.
Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.
Test Plan:
CloudABI pipes seem to be created with proper rights in place:
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44
Reviewers: jilles, mjg
Reviewed By: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3236
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.
This will be used by CloudABI to create pipes with custom capabilities.
Reviewed by: mjg
'buf' is inconvenient and has lead me to some irritating to discover
bugs over the years. It also makes it more challenging to refactor
the buf allocation system.
- Move swbuf and declare it as an extern in vfs_bio.c. This is still
not perfect but better than it was before.
- Eliminate the unused ffs function that relied on knowledge of the buf
array.
- Move the shutdown code that iterates over the buf array into vfs_bio.c.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
attached to bufs to avoid the overhead of the vm. This purposes is now
better served by vmem. Freeing the kva immediately when a buf is
destroyed leads to lower fragmentation and a much simpler scan algorithm.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division