If a system call wasn't listed in capabilities.conf, return ECAPMODE at
syscall entry.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Sponsored by: Google, Inc.
Obtained from: Capsicum Project
MFC after: 3 months
Add a new system call flag, SYF_CAPENABLED, which indicates that a
particular system call is available in capability mode.
Add a new configuration file, kern/capabilities.conf (similar files
may be introduced for other ABIs in the future), which enumerates
system calls that are available in capability mode. When a new
system call is added to syscalls.master, it will also need to be
added here (if needed). Teach sysent parts to use this file to set
values for SYF_CAPENABLED for the native ABI.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months
compiled conditionally on options CAPABILITIES:
Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a
subject (typically a process) is in capability mode.
Add two new system calls, cap_enter(2) and cap_getmode(2), which allow
setting and querying (but never clearing) the flag.
Export the capability mode flag via process information sysctls.
Sponsored by: Google, Inc.
Reviewed by: anderson
Discussed with: benl, kris, pjd
Obtained from: Capsicum Project
MFC after: 3 months
traced process by adding two new events which records value of process
sv_flags to the trace file at process creation/execing/exiting time.
MFC after: 1 Month.
1) do not take a lock around the single atomic operation.
2) do not lose the invariant of lock by dropping/acquiring
ktrace_mtx around free() or malloc().
MFC after: 1 Month.
PMC/SYSV/...).
No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availablility if
needed.
Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: arch@ (parts by rwatson, trasz, jhb)
X-MFC after: to be determined in last commit with code from this project
file where they are used. Declare the kern.threads sysctl node at the
same location. Since no external use for the variables exists, make them
static.
Discussed with: dchagin
MFC after: 1 week
vfs_export() fails. Restoring old options and flags after successful
VFS_MOUNT(9) call may cause the file system internal state to become
inconsistent with mount options and flags. Specifically the FFS super
block fs_ronly field and the MNT_RDONLY flag may get out of sync.
PR: kern/133614
Discussed on: freebsd-hackers
the debugger back-end has changed. This means that switching from ddb
to gdb no longer requires a "step" which can be dangerous on an
already-crashed kernel.
Also add a capability to get from the gdb back-end back to ddb, by
typing ^C in the console window.
While here, simplify kdb_sysctl_available() by using
sbuf_new_for_sysctl(), and use strlcpy() instead of strncpy() since the
strlcpy semantic is desired.
MFC after: 1 month
VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.
The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks
Catch a set vnet upon return to user space. This usually
means return paths with CURVNET_RESTORE() missing.
If VNET_DEBUG is turned on we can even tell the function
that did the CURVNET_SET() which is really helpful; else
we print "N/A".
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
MFC after: 11 days
KASSERT()s and eliminate the rest.
Replace excessive printf()s and a panic() in bufdone_finish() with a
KASSERT() in vm_page_io_finish().
Reviewed by: kib
Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.
Change the syntax to match KASSERT() to allow more flexible panic
messages rather than having a printf with hardcoded arguments
before panic.
Adjust the few assertions we have to the new format (and enhance
the output).
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
MFC after: 2 weeks
attributes for preloaded modules/images. In particular, MODINFO_ADDR has
the added complexity of not always being relocated properly. Rather than
kluging this in the various components that are affected, we handle it
in a centralized place (preload_fetch_addr()). To that end, expose a new
variable, preload_addr_relocate, that MD initialization code can set and
that turns the address attribute into a valid kernel VA.
Architectures that need the relocation: arm & powerpc (at least).
Components that can utilize this: acpi(4), md(4), fb(4), pci(4), ZFS, geli.
Sponsored by: Juniper Networks
misnamed since it was introduced and should not be globally exposed
with this name. The equivalent functionality is now available using
kern_yield(curthread->td_user_pri). The function remains
undocumented.
Bump __FreeBSD_version.
- entirely eliminate some calls to uio_yeild() as being unnecessary,
such as in a sysctl handler.
- move should_yield() and maybe_yield() to kern_synch.c and move the
prototypes from sys/uio.h to sys/proc.h
- add a slightly more generic kern_yield() that can replace the
functionality of uio_yield().
- replace source uses of uio_yield() with the functional equivalent,
or in some cases do not change the thread priority when switching.
- fix a logic inversion bug in vlrureclaim(), pointed out by bde@.
- instead of using the per-cpu last switched ticks, use a per thread
variable for should_yield(). With PREEMPTION, the only reasonable
use of this is to determine if a lock has been held a long time and
relinquish it. Without PREEMPTION, this is essentially the same as
the per-cpu variable.
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.
Noted and discussed with: bde
MFC after: 2 weeks
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().
Change several checks for a magic number of iterations to use
should_yield() instead.
MFC after: 1 week
collect phases. The unp_discard() function executes
unp_externalize_fp(), which might make the socket eligible for gc-ing,
and then, later, taskqueue will close the socket. Since unp_gc()
dropped the list lock to do the malloc, close might happen after the
mark step but before the collection step, causing collection to not
find the socket and miss one array element.
I believe that the race was there before r216158, but the stated
revision made the window much wider by postponing the close to
taskqueue sometimes.
Only process as much array elements as we find the sockets during
second phase of gc [1]. Take linkage lock and recheck the eligibility
of the socket for gc, as well as call fhold() under the linkage lock.
Reported and tested by: jmallett
Submitted by: jmallett [1]
Reviewed by: rwatson, jeff (possibly)
MFC after: 1 week
each of the threads needs more while current pool of the buffers is
exhausted, then neither thread can make progress.
Switch to nowait allocations after we got first buffer already.
Reported by: az
Reviewed by: alc (previous version)
Tested by: pho
MFC after: 1 week
The fdcheckstd() function makes sure fds 0, 1 and 2 are open by opening
/dev/null. If this fails (e.g. missing devfs or wrong permissions),
fdcheckstd() will return failure and the process will exit as if it received
SIGABRT. The KASSERT is only to check that kern_open() returns the expected
fd, given that it succeeded.
Tripping the KASSERT is most likely if fd 0 is open but fd 1 or 2 are not.
MFC after: 2 weeks
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
unfunctional. Wiring the user buffer has only been done explicitly
since r101422.
Mark the kern.disks sysctl as MPSAFE since it is and it seems to have
been mis-using the NOLOCK flag.
Partially break the KPI (but not the KBI) for the sysctl_req 'lock'
field since this member should be private and the "REQ_LOCKED" state
seems meaningless now.
before checking the validity of the next buffer pointer. Otherwise, the
buffer might be reclaimed after the check, causing iteration to run into
wrong buffer.
Reported and tested by: pho
MFC after: 1 week
- Move the realtime priority range up above kernel sleep priorities and
just below interrupt thread priorities.
- Contract the interrupt and kernel sleep priority ranges a bit so that
the timesharing priority band can be increased. The new timeshare range
is now slightly larger than the old realtime + timeshare ranges.
- Change the ULE scheduler to no longer use realtime priorities for
interactive threads. Instead, the larger timeshare range is now split
into separate subranges for interactive and non-interactive ("batch")
threads. The end result is that interactive threads and non-interactive
threads still use the same priority ranges as before, but realtime
threads now have a separate, dedicated priority range.
- Do not modify the priority of non-timeshare threads in sched_sleep()
or via cv_broadcastpri(). Realtime and idle priority threads will
no longer have their priorities affected by sleeping in the kernel.
Reviewed by: jeff
interactive timeshare threads (PRI_*_INTERACTIVE) and non-interactive
timeshare threads (PRI_*_BATCH) and use these instead of PRI_*_REALTIME
and PRI_*_TIMESHARE. No functional change.
Reviewed by: jeff
PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY.
- Add a macro PI_SWI() that takes a SWI_* constant as an argument and
returns the suitable thread priority.