execve_secure() system call, which permits a process to pass in a label
for a label change during exec. This permits SELinux to change the
label for the resulting exec without a race following a manual label
change on the process. Because this interface uses our general purpose
MAC label abstraction, we call it execve_mac(), and wrap our port of
SELinux's execve_secure() around it with appropriate sid mappings.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
unregister. Under some obscure (perhaps demented) circumstances,
this can result in a panic if a policy is unregistered, and then someone
foolishly unregisters it again.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
creation, deletion, and rename. There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels. In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same. Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.
Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.
Validated on: alpha, i386, ia64
link_elf_init(), link_elf_link_preload_finish() and
link_elf_load_file() to link_elf_link_common_finish().
Since link_elf_init() did initializations as a side-effect
of doing the common actions, keep the initialization in
that function. Consequently, link_elf_add_gdb() is now also
called to insert the very first link_map() (ie the kernel).
Move link_elf_add_gdb(), link_elf_delete_gdb() and link_elf_error()
near the top of the file. The *_gdb() functions are moved inside
the #ifdef DDB already present there.
cannot allocate ef->object, we freed ef before bailing out with
an error. This is wrong because ef=lf and when we have an error
and lf is non-NULL (which holds if we try to alloc ef->object),
we free lf and thus ef as part of the bailing-out.
to help clean up. After selecting a potential buffer to write, this
patch has it acquire a lock on the vnode that owns the buffer before
trying to write it. The vnode lock is necessary to avoid a race with
some other process holding the vnode locked and trying to flush its
dirty buffers. In particular, if the vnode in question is a snapshot
file, then the race can lead to a deadlock. To avoid slowing down the
buf_daemon, it does a non-blocking lock request when trying to lock
the vnode. If it fails to get the lock it skips over the buffer and
continues down its queue looking for buffers to flush.
Sponsored by: DARPA & NAI Labs.
(sizeof(destination_buffer) - 1) bytes into the destination buffer.
This was not harmful because they currently both provide space for
(MAXCOMLEN + 1) bytes.
linked in the kernel. When this condition is detected deep in the linker
internals the EEXIST error code that's returned is stomped on and instead
an ENOEXEC code is returned. This makes apps like sysinstall bitch.
the path including the terminating NUL character from
`struct sockaddr_un' rather than SOCK_MAXADDRLEN bytes.
- Use strlcpy() instead of strncpy() to copy strings.
contiguous space was being allocated from the clust_map
instead of the mbuf_map as the comments indicated. This resulted in
some address space wastage in mbuf_map.
Submitted by: Rohit Jalan <rohjal@yahoo.co.in>
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version
Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month
Ignoring a NULL dev in device_set_ivars() sounds wrong, KASSERT it to
non-NULL instead.
Do the same for device_get_ivars() for reasons of symmetry, though
it probably would have yielded a panic anyway, this gives more precise
diagnostics.
Absentmindedly nodded OK to by: jhb
were improperly relocated due to faulty logic in lookup_fdesc()
in elf_machdep.c. The symbol index (symidx) was bogusly used for
load modules other than the one the relocation applied to. This
resulted in bogus bindings and consequently runtime failures.
The fix is to use the symbol index only for the module being
relocated and to use the symbol name for look-ups in the
modules in the dependent list. As such, we need a function to
return the symbol name given the linker file and symbol index.
processes forked with RFTHREAD.
- Use a goto to a label for common code when exiting from fork1() in case
of an error.
- Move the RFTHREAD linkage setup code later in fork since the ppeers_lock
cannot be locked while holding a proc lock. Handle the race of a task
leader exiting and killing its peers while a peer is forking a new child.
In that case, go ahead and let the peer process proceed normally as the
parent is about to kill it. However, the task leader may have already
gone to sleep to wait for the peers to die, so the new child process may
not receive a SIGKILL from the task leader. Rather than try to destruct
the new child process, just go ahead and send it a SIGKILL directly and
add it to the p_peers list. This ensures that the task leader will wait
until both the peer process doing the fork() and the new child process
have received their KILL signals and exited.
Discussed with: truckman (earlier versions)
It must be removed because it is done without the pipe being locked
via pipelock() and therefore is vulnerable to races with pipespace()
erroneously triggering it by temporarily zero'ing out the structure
backing the pipe.
It looks as if this assertion is not needed because all manipulation
of the data changed by pipespace() _is_ protected by pipelock().
Reported by: kris, mckusick
be sure to exit the loop with vp == NULL if no candidates are found.
Formerly, this bug would cause the last vnode inspected to be used,
even if it was not available. The result was a panic "vn_finished_write:
neg cnt".
Sponsored by: DARPA & NAI Labs.
vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather
than requiring filesystems that use alternate locks to do so
in their vop_reclaim functions. This change is a further cleanup
of the vop_stdlock interface.
Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sponsored by: DARPA & NAI Labs.
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).
In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.
Sponsored by: DARPA & NAI Labs.
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c
Reviewed by: -arch
the locking of the proc lock after the goto to done1 to avoid locking
the lock in an error case just so we can turn around and unlock it.
- Move the exec_setregs() stuff out from under the proc lock and after
the p_args stuff. This allows exec_setregs() to be able to sleep or
write things out to userland, etc. which ia64 does.
Tested by: peter
vcanrecycle to check a free vnode's availability. If it is
available, vcanrecycle returns an error code of zero and the
vnode in question locked. The getnewvnode routine then used
to call vn_start_write with the V_NOWAIT flag. If the filesystem
was suspended while taking a snapshot, the vn_start_write would
fail but getnewvnode would fail to unlock the vnode, instead
leaving it locked on the freelist. The result would be that the
vnode would be locked forever and would eventually hang the
system with a race to the root when it was attempted to recycle
it. This fix moves the vn_start_write check into vcanrecycle
where it will properly unlock the vnode if it is unavailable
for recycling due to filesystem suspension.
Sponsored by: DARPA & NAI Labs.
on the _file() theme that do not follow symlinks. Sync to MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
sched_lock. This means that we no longer access p_limit in mi_switch()
and the p_limit pointer can be protected by the proc lock.
- Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE
processes don't call mi_switch(), and even if they did there is no longer
the danger of p_limit being NULL (which is what the original zombie check
was added for).
- When we bump the current processes soft CPU limit in ast(), just bump the
private p_cpulimit instead of the shared rlimit. This fixes an XXX for
some value of fix. There is still a (probably benign) bug in that this
code doesn't check that the new soft limit exceeds the hard limit.
Inspired by: bde (2)
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
teh owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is consceivable that the
borrower should inherit the priority of the owner too.
that's another discussion and would be simple to do.
Also, as part of this, the "preallocatd spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".
DDB now shows a lot mor info for threaded proceses though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)
Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.
configuration device hierarchy. Device arrival, departure and not
matched are presently reported. This will be the basis for devd, which
I still need to polish a little more before I commit it. If you don't
use /dev/devctl, it will be a noop.
o Allow the bus_debug variable to be set via the bus.debug tunable.
o Return pnpinfo and location info via the devinfo interface to userland.
devinfo(8) needs to be updated to print it.
revision 1.218. This bug caused a "struct file" reference to be
leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write()
failed during the open operation.
PR: kern/43739
Reported by: Arne Woerner <woerner@mediabase-gmbh.de>
Don't use snprintf where strlcpy() will do the job.
Also, a NUL is '\0' not 0 in our style (C doesn't care), so spell it like.
Remove useless {} and () in the general area of this change.
checks from the MAC tree: allow policies to perform access control
for the ability of a process to send and receive data via a socket.
At some point, we might also pass in additional address information
if an explicit address is requested on send.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
seperate entry points for each occasion:
mac_check_vnode_mmap() Check at initial mapping
mac_check_vnode_mprotect() Check at mapping protection change
mac_check_vnode_mmap_downgrade() Determine if a mapping downgrade
should take place following
subject relabel.
Implement mmap() and mprotect() entry points for labeled vnode
policies. These entry points are currently not hooked up to the
VM system in the base tree. These changes improve the consistency
of the access control interface and offer more flexibility regarding
limiting access to vnode mmaping.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
flags so that we can call malloc with M_NOWAIT if necessary, avoiding
potential sleeps while holding mutexes in the TCP syncache code.
Similar to the existing support for mbuf label allocation: if we can't
allocate all the necessary label store in each policy, we back out
the label allocation and fail the socket creation. Sync from MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
point that instruments the creation of hard links. Policy implementations
to follow.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
to mbuf label initialization, that functionality was never merged to
the main tree. Go ahead and merge that functionality now. Note that
this requires policy modules to accept the case where the label
element may be destroyed even if init has not succeeded on it (in
the event that policy failed the init). This will shortly also
apply to sockets.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
order used in mac_policy.h and elsewhere. Sort order is basically
"by operation category", then "alphabetically by object". Sync to
MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
externalization, and cred label life cycle events to entirely above
devfs and vnode events. Sync from MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
entry points to better match the entry point ordering in mac_policy.h.
Big diff, no functional change; merge from the MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- If a policy isn't registered when a policy module unloads, silently
succeed.
- Hold the policy list lock across more of the validity tests to avoid
races.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
NB: But it will enable it in all kernels not having options "NO_GEOM"
Put the GEOM related options into the intended order.
Add "options NO_GEOM" to all kernel configs apart from NOTES.
In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.
There are currently three known issues which may force people to
need the NO_GEOM option:
boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.
SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.
PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)
These issues are all being worked.
Sponsored by: DARPA & NAI Labs.
- Change mpo_init_foo(obj, label) and mpo_destroy_foo(obj, label) policy
entry points to mpo_init_foo_label(label) and
mpo_destroy_foo_label(label). This will permit the use of the same
entry points for holding temporary type-specific label during
internalization and externalization, as well as for caching purposes.
- Because of this, break out mpo_{init,destroy}_socket() and
mpo_{init,destroy}_mount() into seperate entry points for socket
main/peer labels and mount main/fs labels.
- Since the prototype for label initialization is the same across almost
all entry points, implement these entry points using common
implementations for Biba, MLS, and Test, reducing the number of
almost identical looking functions.
This simplifies policy implementation, as well as preparing us for the
merge of the new flexible userland API for managing labels on objects.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
treat it as an invalid partition.
This fixes a bug where ``dumpon <device>'' will configure the dump
device at a random offset on the disk if <device> isn't a valid
partition.
Reviewed by: phk
around limitations in the ia64 kernel stack handling code. Basically
preallocate a bunch of threads (and hence kstacks) while contigmalloc()
still works, and never free them back to the general memory pool. After
the system has been running for a while, contigmalloc() eventually fails
at a critical momemt and panics the system.
totally bogus but will hide the occurances of access of 0xbc(NULL) which
people have run into lately. This is not a proper fix, just a bandaid, until
the cause of this happening is tracked down and fixed.
Reviewed by: rwatson
dereference the struct sigio pointer without any locking. Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.
Reviewed by: rwatson
name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes. Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.