The zone allocator's locks should be leaflocks, meaning that they
should never be held when entering into another subsystem, however
the sysctl grabs the zone global mutex and individual zone mutexes
while holding the lock it calls SYSCTL_OUT which recurses into the
VM subsystem in order to wire user memory to do a safe copy. This
can block and cause lock order reversals.
To fix this:
lock zone global.
get a count of the number of zones.
unlock global.
allocate temporary storage.
format and SYSCTL_OUT the banner.
lock global.
traverse list.
make sure we haven't looped more than the initial count taken
to avoid overflowing the allocated buffer.
lock each nodes.
read values and format into buffer.
unlock individual node.
unlock global.
format and SYSCTL_OUT the rest of the data.
free storage.
return.
Other problems included not checking for errors when doing sysctl out
of the column header. Fixed.
Inconsistant termination of the copied string. Fixed.
Objected to by: des (for not using sbuf)
Since the output is not variable length and I'm actually over
allocating signifigantly and I'd like to get this fixed now, I'll
work on the sbuf convertion at a later date. I would not object
to someone else taking it upon themselves to convert it to sbuf.
I hold no MAINTIANER rights to this code (for now).
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
It might be more correct to make stathz as close as possible to 128,
but that would involve adding complexity to the clock intr path, which
I don't want to do.
modify the scheduling properties of processes with a different real
uid but the same effective uid (i.e., daemons, et al). (note: these
cases were previously commented out, so this does not change the
compiled code at al)
Obtained from: TrustedBSD Project
The constant I was using was correct, but I mislabeled it as 256K when
it should have been 512K. This doesn't actually change the code, but
it clarifies things somewhat.
Submitted by: Chuck Cranor <chuck@research.att.com>
sf_hdtr is used to provide writev(2) style headers/trailers on the
sent data the return value is actually either the result of writev(2)
from the trailers or headers of no tailers are specified.
Fix sendfile to comply with the documentation, by returning 0 on
success.
Ok'd by: dg
We are way too inconsistent with our setting of the "schg" flag, and in
our default install, it doesn't really offer any additional security.
Reviewed by: arch@
saves 32 registers) to do on every context switch. This is only required
for SMP, so only do it there.
We should also look at moving the critical enter/exit out to the callers
Long ago, bread() set b_blkno to the disk block number as a side effect
of doing physical i/o (or it just retained the setting from when the
i/o was done). The setting is lost when buffers go away and then are
reconsituted from VM. bread() originally compensated by doing a
VOP_BMAP() to recover b_blkno, but this was no good since it sometimes
caused extra i/o or even deadlock for bread()ing metadata to do the
bmap. This was fixed in vfs_bio.c 1.33 (1995/03/03) and ffs_balloc.c
1.5, etc., by removing the VOP_BMAP() from bread() and breadn(), and
changing all (?) places that used b_blkno to set it if necessary.
ext2fs was not imported until later in 1995 and was still depending on
the old behaviour of bread() in at least ext2_balloc(). This caused
filesystem and file corruption by clobbering direct block numbers in
inodes.
by the inactive routine. Because the freeing causes the filesystem
to be modified, the close must be held up during periods when the
filesystem is suspended.
For snapshots to be consistent across crashes, they must write
blocks that they copy and claim those written blocks in their
on-disk block pointers before the old blocks that they referenced
can be allowed to be written.
Close a loophole that allowed unwritten blocks to be skipped when
doing ffs_sync with a request to wait for all I/O activity to be
completed.
to struct mount.
This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.
Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().
At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
required by POSIX.1e. This maintains the current 'struct acl'
in the kernel while providing the generic external acl_t
interface required to complete the ACL editing library.
o Add the acl_get_entry() function.
o Convert the existing ACL utilities, getfacl and setfacl, to
fully make use of the ACL editing library.
Obtained from: TrustedBSD Project
structure. This field keeps track of how many levels deep we are nested
into the kernel. The nesting level is bumped at the start of a trap,
interrupt, syscall, or exception and is decremented on return. This is
used to detect the case when the kernel is returning back to a kernel
context in exception_return(). If we are returning to the kernel we need
to update the globaldata pointer register saved in the stack frame in case
we have switched CPU's between taking the initial interrupt that saved the
frame and returning. If we don't do this fixup it is possible for a CPU to
use the wrong per-cpu data. On UP systems this is not a problem, so the
code is conditional on SMP.
A count was used instead of simply checking the process status register in
the frame during exception_return() since there are critical sections at
the very start and end of a trap, exception, or interrupt from userland in
which we could trash the t7 register being used in userland. The counter
is incremented after adn before these critical sections respectively so
that we will not overwrite the saved t7 register if we are interrupted
during one of these critical sections.
nam for an unbound socket instead of leaving nam untouched in that case.
This way, the getsockname() output can be used to determine the address
family of such sockets (AF_LOCAL).
Reviewed by: iedowse
Approved by: rwatson
linuxulator so as to allow privileged processes within a jail() to
invoke the Linux initgroups() system call. This allows the Linux
"su" to work properly (better) when running a complete Linux
environment under jail(). This problem was reported by Attila
Nagy <bra@fsn.hu>.
Reviewed by: marcel
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.
Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.
Reviewed by: mckusick
With the recent changes in the CAM error handling, some problems in
the error handling of sa(4) have been uncovered. Basically, a number
of conditions that are not actually errors have been mistreated as
genuine errors. In particular:
. Trying to read in variable length mode with a mismatched blocksize
between the on-tape (virtual) blocks and the read(2) supplied buffer
size, causing an ILI SCSI condition, have caused an attempt to retry
the supposedly `errored' transfer, causing the tape to be read
continuously until it eventually hit EOM. Since by default any
simple mt(1) operation does an initial test read, an `mt stat' was
sufficient to trigger this bug.
Note that it's Justin's opinion that treating a NO SENSE as an EIO
is another bug in CAM. I feel not authorized to fix cam_periph.c
without another confirmation that i'm on the right track, however.
. Hitting a filemark caused the read(2) syscall to return EIO, instead
of returning a `short read'. Note that the current fix only solves
this problem in variable length mode. Fixed length mode uses a
different code path, and since i didn't grok all the intentions behind
that handling, i did not touch it (IOW: it's still broken, and you get
an EIO upon hitting a filemark).
The solution is to keep track of those conditions inside saerror(),
and upon completion to not call cam_periph_error() in that case. We
need to make sure that the device gets unfrozen if needed though (in
case of actual errors, cam_periph_error() does this on our behalf).
Not objected by: mjacob (who currently doesn't have the time to
review the patch)
semantics don't: in practice, both policy and semantics permit
loop-back debugging operations, only it's just a subset of debugging
operations (i.e., a proc can open its own /dev/mem), and that's at a
higher layer.
This is will be required to prevent lowering the ipl when a critical_enter()
is present in the interrupt path when handling a machine check.
reviewed by: jhb
only aiod does this and is also marked P_SYSTEM, the locations that
reference p->p_vmspace usually do it within the context of the caller,
the async access from the vm system is protected by the fact that it
will skip over P_SYSTEM processes.
Ok'd by: jhb
means that the pcic98 functionality might now work (I've tested it on
my pcic machine, but not the pcic98). Since these functions are
rarely called, it is unlikely that this will have a measurable impact
on performance.
FreeBSD. This code doesn't work just yet, but does compile. We need
to start indirecting via the cinfo pointers, rather than directly
calling pcic_*. There may be other issues as well, but you gotta
start somewhere.
Obtained from: PAO3
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.
Pointed out by: bde
Also move the insertion of the request to after the request is validated,
there's still looks like there may be some problems if an invalid address
is passed to the aio routines, basically a possible leak or having a
not completely initialized structure on the queue may still be possible.
A new sig macro was made _SIG_VALID to check the validity of a signal,
it would be advisable to use it from now on (in kern/kern_sig.c) rather
than rolling your own.
PR: kern/17152
Protect pager object list manipulation with a mutex.
It doesn't look possible to combine them under a single sx lock because
creation may block and we can't have the object list manipulation block
on anything other than a mutex because of interrupt requests.
available.
Only directory vnodes holding no child directory vnodes held in
v_cache_src are recycled, so that directory vnodes near the root of
the filesystem hierarchy remain in namecache and directory vnodes are
not reclaimed in cascade.
The period of vnode reclaiming attempt and the number of vnodes
attempted to reclaim can be tuned via sysctl(2).
Suggested by: tegge
Approved by: phk
Add TI4451 as well.
These are untested since I don't have the hardware to test against.
Also, some O2Micro devices are #define w/o numbers as place holders so that
I can encourage people to submit them when they appear in the channels.