implementation. Move from direct uid 0 comparision to using suser_xxx()
call with the same semantics. Simplify CAN_AFFECT() macro as passed
pcred was redundant. The checks here still aren't "right", but they
are probably "better".
Obtained from: TrustedBSD Project
%fs and %gs registers instead of setting them to known sane values.
%fs is going to be used for thread/KSE specific data by the new
threads library; we'll want it to be valid inside of signal handlers.
According to bde, Linux preserves the state of %fs and %gs when setting
up signal handlers, so there is precedent for doing this.
The same changes should be made in the Linux emulator, but when made,
they seem to break (at least one version of) the IBM JDK for Linux
(reported by drew).
Approved by: bde
struct lock_instance that is stored in the per-process and per-CPU lock
lists. Previously, the lock lists just kept a pointer to each lock held.
That pointer is now replaced by a lock instance which contains a pointer
to the lock object, the file and line of the last acquisition of a lock,
and various flags about a lock including its recursion count.
- If we sleep while holding a sleepable lock, then mark that lock instance
as having slept and ignore any lock order violations that occur while
acquiring Giant when we wake up with slept locks. This is ok because of
Giant's special nature.
- Allow witness to differentiate between shared and exclusive locks and
unlocks of a lock. Witness will now detect the case when a lock is
acquired first in one mode and then in another. Mutexes are always
locked and unlocked exclusively. Witness will also now detect the case
where a process attempts to unlock a shared lock while holding an
exclusive lock and vice versa.
- Fix a bug in the lock list implementation where we used the wrong
constant to detect the case where a lock list entry was full.
uses lockmgr locks and this leads to a lock order reversal. At this point
in wait1() the process is not on any process lists or in the process tree,
so no other process should be able to find it or have a reference to it
anyways, so the locking is not needed.
longer includes machine/elf.h.
* consumers of elf.h now use the minimalist elf header possible.
This change is motivated by Binutils 2.11.0 and too much clashing over
our base elf headers and the Binutils elf headers.
- Allocate zeroed memory in ether_resolvemulti() to prevent equal() from
comparing garbage and determining that two otherwise-equal sockaddr_dls
are different.
- Fill in all required fields of the sockaddr_dl
- Actually copy the multicast address into the sockaddr_dl when calling
if_addmulti()
- Don't claim that we don't have a way to resolve layer 3 addresses into
layer 2 addresses; use the ethernet way.
handling, SMPng always switches the npx context away from curproc
before calling the handler, so the handler always paniced. When using
exception 16 exception handling, SMPng sometimes switches the npx
context away from curproc before calling the handler, so the handler
sometimes paniced. Also, we didn't lock the context while using it,
so we sometimes didn't detect the switch and then paniced in a less
controlled way.
Just lock the context while using it, and return without doing anything
except clearing the busy latch if the context is not for curproc. This
fixes the exception 16 case and makes the IRQ13 case harmless. In both
cases, the instruction that caused the exception is restarted and the
exception repeats. In the exception 16 case, we soon get an exception
that can be handled without doing anything special. In the IRQ13 case,
we get an easy to kill hung process.
This driver supports PCI Xr-based and ISA Xem Digiboard cards.
dgm will go away soon if there are no problems reported. For now,
configuring dgm into your kernel warns that you should be using
digi. This driver is probably close to supporting Xi, Xe and Xeve
cards, but I wouldn't expect them to work properly (hardware
donations welcome).
The digi_* pseudo-drivers are not drivers themselves but contain
the BIOS and FEP/OS binaries for various digiboard cards and are
auto-loaded and auto-unloaded by the digi driver at initialisation
time. They *may* be configured into the kernel, but waste a lot
of space if they are. They're intended to be left as modules.
The digictl program is (mainly) used to re-initialise cards that
have external port modules attached such as the PC/Xem.
drivers.
- change daprevent() to set CAM_RETRY_SELTO and SF_RETRY_UA when it calls
cam_periph_runccb().
- change the pt(4) driver to ignore unit attentions
- change the targ(4) driver to retry selection timeouts
- clean up a few formatting glitches in the targ(4) driver
Reviewed by: gibbs
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
prevent scsi_sense_desc() from deferencing a NULL pointer when a drive
happens to return one of these sense keys.
Reported by: Michael Samuel <michael@miknet.net>
not to mention a compile-time warning about the critical function
becoming unused, by replacing spec_bmap() with vop_stdbmap().
ntfs seems to have the same bug.
The factor for converting specfs block numbers to physical block
numbers is 1, but vop_stdbmap() uses the bogus factor
btodb(ap->a_vp->v_mount->mnt_stat.f_iosize), which is 16 for ffs with
the default block size of 8K. This factor is bogus even for vop_stdbmap()
-- the correct factor is related to the filesystem blocksize which is not
necessarily the same to the optimal i/o size. vop_stdbmap() was apparently
cloned from nfs where these sizes happen to be the same.
There may also be a problem with a_vp->v_mount being null. spec_bmap()
still checks for this, but I think the checks in specfs are dead code
which used to support block devices.
- add a missing break which caused RTP_SET to always return EINVAL
- break instead of returning if p_can fails so proc_lock is always
dropped correctly
- only copyin data that is actually needed
- use break instead of goto
- make rtp_to_pri return EINVAL instead of -1 if the values are out
or range so we don't have to translate
and gid in the ACL, vaccess_acl_posix1e() was changed to accept
explicit file_uid and file_gid as arguments. However, in making the
change, I explicitly checked file_gid against cr->cr_groups[0], rather
than using groupmember, resulting in ACL_GROUP_OBJ entries being
compared to the caller's effective gid only, not the remainder of
its groups. This was recently corrected for the version of the
group call without privilege, but the second test (when privilege is
added) was missed. This change replaces an additiona cr->cr_groups[0]
check with groupmember().
Pointed out by: jedgar
Reviewed by: jedgar
Obtained from: TrustedBSD Project
Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.
This version has a step debugger, which now completely replaces the
old trace feature. Also, we moved all of the FreeBSD-specific MI
code to loader.c, reducing the diff between this and the official
FICL distribution.
The zone allocator's locks should be leaflocks, meaning that they
should never be held when entering into another subsystem, however
the sysctl grabs the zone global mutex and individual zone mutexes
while holding the lock it calls SYSCTL_OUT which recurses into the
VM subsystem in order to wire user memory to do a safe copy. This
can block and cause lock order reversals.
To fix this:
lock zone global.
get a count of the number of zones.
unlock global.
allocate temporary storage.
format and SYSCTL_OUT the banner.
lock global.
traverse list.
make sure we haven't looped more than the initial count taken
to avoid overflowing the allocated buffer.
lock each nodes.
read values and format into buffer.
unlock individual node.
unlock global.
format and SYSCTL_OUT the rest of the data.
free storage.
return.
Other problems included not checking for errors when doing sysctl out
of the column header. Fixed.
Inconsistant termination of the copied string. Fixed.
Objected to by: des (for not using sbuf)
Since the output is not variable length and I'm actually over
allocating signifigantly and I'd like to get this fixed now, I'll
work on the sbuf convertion at a later date. I would not object
to someone else taking it upon themselves to convert it to sbuf.
I hold no MAINTIANER rights to this code (for now).
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
It might be more correct to make stathz as close as possible to 128,
but that would involve adding complexity to the clock intr path, which
I don't want to do.
modify the scheduling properties of processes with a different real
uid but the same effective uid (i.e., daemons, et al). (note: these
cases were previously commented out, so this does not change the
compiled code at al)
Obtained from: TrustedBSD Project
The constant I was using was correct, but I mislabeled it as 256K when
it should have been 512K. This doesn't actually change the code, but
it clarifies things somewhat.
Submitted by: Chuck Cranor <chuck@research.att.com>
sf_hdtr is used to provide writev(2) style headers/trailers on the
sent data the return value is actually either the result of writev(2)
from the trailers or headers of no tailers are specified.
Fix sendfile to comply with the documentation, by returning 0 on
success.
Ok'd by: dg
We are way too inconsistent with our setting of the "schg" flag, and in
our default install, it doesn't really offer any additional security.
Reviewed by: arch@
saves 32 registers) to do on every context switch. This is only required
for SMP, so only do it there.
We should also look at moving the critical enter/exit out to the callers
Long ago, bread() set b_blkno to the disk block number as a side effect
of doing physical i/o (or it just retained the setting from when the
i/o was done). The setting is lost when buffers go away and then are
reconsituted from VM. bread() originally compensated by doing a
VOP_BMAP() to recover b_blkno, but this was no good since it sometimes
caused extra i/o or even deadlock for bread()ing metadata to do the
bmap. This was fixed in vfs_bio.c 1.33 (1995/03/03) and ffs_balloc.c
1.5, etc., by removing the VOP_BMAP() from bread() and breadn(), and
changing all (?) places that used b_blkno to set it if necessary.
ext2fs was not imported until later in 1995 and was still depending on
the old behaviour of bread() in at least ext2_balloc(). This caused
filesystem and file corruption by clobbering direct block numbers in
inodes.
by the inactive routine. Because the freeing causes the filesystem
to be modified, the close must be held up during periods when the
filesystem is suspended.
For snapshots to be consistent across crashes, they must write
blocks that they copy and claim those written blocks in their
on-disk block pointers before the old blocks that they referenced
can be allowed to be written.
Close a loophole that allowed unwritten blocks to be skipped when
doing ffs_sync with a request to wait for all I/O activity to be
completed.
to struct mount.
This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.
Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().
At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
required by POSIX.1e. This maintains the current 'struct acl'
in the kernel while providing the generic external acl_t
interface required to complete the ACL editing library.
o Add the acl_get_entry() function.
o Convert the existing ACL utilities, getfacl and setfacl, to
fully make use of the ACL editing library.
Obtained from: TrustedBSD Project
structure. This field keeps track of how many levels deep we are nested
into the kernel. The nesting level is bumped at the start of a trap,
interrupt, syscall, or exception and is decremented on return. This is
used to detect the case when the kernel is returning back to a kernel
context in exception_return(). If we are returning to the kernel we need
to update the globaldata pointer register saved in the stack frame in case
we have switched CPU's between taking the initial interrupt that saved the
frame and returning. If we don't do this fixup it is possible for a CPU to
use the wrong per-cpu data. On UP systems this is not a problem, so the
code is conditional on SMP.
A count was used instead of simply checking the process status register in
the frame during exception_return() since there are critical sections at
the very start and end of a trap, exception, or interrupt from userland in
which we could trash the t7 register being used in userland. The counter
is incremented after adn before these critical sections respectively so
that we will not overwrite the saved t7 register if we are interrupted
during one of these critical sections.
nam for an unbound socket instead of leaving nam untouched in that case.
This way, the getsockname() output can be used to determine the address
family of such sockets (AF_LOCAL).
Reviewed by: iedowse
Approved by: rwatson
linuxulator so as to allow privileged processes within a jail() to
invoke the Linux initgroups() system call. This allows the Linux
"su" to work properly (better) when running a complete Linux
environment under jail(). This problem was reported by Attila
Nagy <bra@fsn.hu>.
Reviewed by: marcel
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.
Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.
Reviewed by: mckusick
With the recent changes in the CAM error handling, some problems in
the error handling of sa(4) have been uncovered. Basically, a number
of conditions that are not actually errors have been mistreated as
genuine errors. In particular:
. Trying to read in variable length mode with a mismatched blocksize
between the on-tape (virtual) blocks and the read(2) supplied buffer
size, causing an ILI SCSI condition, have caused an attempt to retry
the supposedly `errored' transfer, causing the tape to be read
continuously until it eventually hit EOM. Since by default any
simple mt(1) operation does an initial test read, an `mt stat' was
sufficient to trigger this bug.
Note that it's Justin's opinion that treating a NO SENSE as an EIO
is another bug in CAM. I feel not authorized to fix cam_periph.c
without another confirmation that i'm on the right track, however.
. Hitting a filemark caused the read(2) syscall to return EIO, instead
of returning a `short read'. Note that the current fix only solves
this problem in variable length mode. Fixed length mode uses a
different code path, and since i didn't grok all the intentions behind
that handling, i did not touch it (IOW: it's still broken, and you get
an EIO upon hitting a filemark).
The solution is to keep track of those conditions inside saerror(),
and upon completion to not call cam_periph_error() in that case. We
need to make sure that the device gets unfrozen if needed though (in
case of actual errors, cam_periph_error() does this on our behalf).
Not objected by: mjacob (who currently doesn't have the time to
review the patch)
semantics don't: in practice, both policy and semantics permit
loop-back debugging operations, only it's just a subset of debugging
operations (i.e., a proc can open its own /dev/mem), and that's at a
higher layer.
This is will be required to prevent lowering the ipl when a critical_enter()
is present in the interrupt path when handling a machine check.
reviewed by: jhb
only aiod does this and is also marked P_SYSTEM, the locations that
reference p->p_vmspace usually do it within the context of the caller,
the async access from the vm system is protected by the fact that it
will skip over P_SYSTEM processes.
Ok'd by: jhb