- security.bsd.hardlink_check_uid, when set, means, that unprivileged
users are not permitted to create hard links to files not
owned by them,
- security.bsd.hardlink_check_gid, when set, means, that unprivileged
users are not permitted to create hard links to files owned
by group they don't belong to.
OK'ed by: rwatson
that we (p1) are currently running, we hold a reference on p_textvp which
means the vnode cannot go away. p2 cannot run yet (and hence cannot exit)
so this should be safe to do at this point. As a bonus, it removes a
block of under-Giant code that was there to support the vref.
- Push Giant down a bit in coredump() and call coredump() with the proc
lock already held rather than unlocking it only to turn around and
relock it.
Requested by: peter
process group and session dereferences. Also, check that p_pgrp and
p_sesssion are NULL before dereferencing them.
- Push down Giant in fork1().
Requested by: peter
introduction of kern_mlock() and kern_munlock() in
src/sys/kern/kern_sysctl.c 1.150
src/sys/vm/vm_extern.h 1.69
src/sys/vm/vm_glue.c 1.190
src/sys/vm/vm_mmap.c 1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.
Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.
Combine the best parts of each of the original sets of implementations
with further code cleanup. Make the mclock() and vslock()
implementations as similar as possible.
Retain the RLIMIT_MEMLOCK check in mlock(). Move the most strigent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.
Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.
Tested by: Cy Schubert <Cy.Schubert AT komquats.com>
DIAGNOSTIC instead of INVARIANTS. INVARIANTS is intended for tests
that don't substantially change code flow or behavior (passive), but
this test required locking both the proc lock and scheduler lock
in order to execute. It also appears to be a very advisory diagnostic
as opposed to an invariant violation.
Following discussion with: bde
race in between sleepq_add() and sleepq_catch_signals() in that setting
td_wchan and TDF_SINTR is not atomic to sched_lock but only to the sleepq
lock. This band-aid will stop assertion failures, but there is perhaps a
larger problem with the sleepq_add/sleepq_catch_signals race that I am not
sure how to solve. For the signals case the race is harmless because we
always call cursig() after setting TDF_SINTR. However, KSE doesn't do
anything in sleepq_catch_signals() to check that this race was lost, so I
am unsure if this race is harmful for this specific abort.
functions in kern_socket.c.
Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".
Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.
Submitted by: sam
than a "waitok" argument. Callers now passing M_WAITOK or M_NOWAIT
rather than 0 or 1. This simplifies the soalloc() logic, and also
makes the waiting behavior of soalloc() more clear in the calling
context.
Submitted by: sam
generic watchdoc(9) interface.
Make watchdogd(8) perform as watchdog(8) as well, and make it
possible to specify a check command to run, timeout and sleep
periods.
Update watchdog(4) to talk about the generic interface and add
new watchdog(8) page.
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.
the process state to zombie when a process exits to avoid a lock order
reversal with the sleepqueue locks. This appears to be the only place
that we call wakeup() with sched_lock held.
to queue threads sleeping on a wait channel similar to how turnstiles are
used to queue threads waiting for a lock. This subsystem will be used as
the backend for sleep/wakeup and condition variables initially. Eventually
it will also be used to replace the ithread-specific iwait thread
inhibitor.
Sleep queues are also not locked by sched_lock, so this splits sched_lock
up a bit further increasing concurrency within the scheduler. Sleep queues
also natively support timeouts on sleeps and interruptible sleeps allowing
for the reduction of a lot of duplicated code between the sleep/wakeup and
condition variable implementations. For more details on the sleep queue
implementation, check the comments in sys/sleepqueue.h and
kern/subr_sleepqueue.c.
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire(). Split munlock() in a similar way.
Enable the RLIMIT_MEMLOCK checking code in kern_mlock().
Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.
Nuke the vslock() and vsunlock() implementations, which are no
longer used.
Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.
Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails. Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
should specify reasonable estimates for the amount of data they
want to return so that only the minimum amount of memory is wired
no matter what length has been specified by the request.
Modify the callers of sysctl_wire_old_buffer() to look for the
error return.
Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.
Reviewed by: bms
of all, PIPE_EOF is not checked pervasively after everything that can drop
the pipe mutex and msleep(), so fix. Additionally, though it might not
harm anything, pipelock() and pipeunlock() are not used consistently.
Third, the kqueue support functions do not use the pipe mutex correctly.
Last, but absolutely not least, is a race: if pipe_busy is not set on
the closing side of the pipe, the other side that is trying to write to
that will crash BECAUSE PIPE_EOF IS NOT SET! Unconditionally set
PIPE_EOF, and get rid of all the lockups/crashes I have seen trying
to build ports.
Now I believe it is done in the right way.
Removed some XXMAC cases, we now assume 'high' integrity level for all
sysctls, except those with CTLFLAG_ANYBODY flag set. No more magic.
Reviewed by: rwatson
Approved by: rwatson, scottl (mentor)
Tested with: LINT (compilation), mac_biba(4) (functionality)