saves 32 registers) to do on every context switch. This is only required
for SMP, so only do it there.
We should also look at moving the critical enter/exit out to the callers
Long ago, bread() set b_blkno to the disk block number as a side effect
of doing physical i/o (or it just retained the setting from when the
i/o was done). The setting is lost when buffers go away and then are
reconsituted from VM. bread() originally compensated by doing a
VOP_BMAP() to recover b_blkno, but this was no good since it sometimes
caused extra i/o or even deadlock for bread()ing metadata to do the
bmap. This was fixed in vfs_bio.c 1.33 (1995/03/03) and ffs_balloc.c
1.5, etc., by removing the VOP_BMAP() from bread() and breadn(), and
changing all (?) places that used b_blkno to set it if necessary.
ext2fs was not imported until later in 1995 and was still depending on
the old behaviour of bread() in at least ext2_balloc(). This caused
filesystem and file corruption by clobbering direct block numbers in
inodes.
by the inactive routine. Because the freeing causes the filesystem
to be modified, the close must be held up during periods when the
filesystem is suspended.
For snapshots to be consistent across crashes, they must write
blocks that they copy and claim those written blocks in their
on-disk block pointers before the old blocks that they referenced
can be allowed to be written.
Close a loophole that allowed unwritten blocks to be skipped when
doing ffs_sync with a request to wait for all I/O activity to be
completed.
to struct mount.
This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.
Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().
At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
required by POSIX.1e. This maintains the current 'struct acl'
in the kernel while providing the generic external acl_t
interface required to complete the ACL editing library.
o Add the acl_get_entry() function.
o Convert the existing ACL utilities, getfacl and setfacl, to
fully make use of the ACL editing library.
Obtained from: TrustedBSD Project
structure. This field keeps track of how many levels deep we are nested
into the kernel. The nesting level is bumped at the start of a trap,
interrupt, syscall, or exception and is decremented on return. This is
used to detect the case when the kernel is returning back to a kernel
context in exception_return(). If we are returning to the kernel we need
to update the globaldata pointer register saved in the stack frame in case
we have switched CPU's between taking the initial interrupt that saved the
frame and returning. If we don't do this fixup it is possible for a CPU to
use the wrong per-cpu data. On UP systems this is not a problem, so the
code is conditional on SMP.
A count was used instead of simply checking the process status register in
the frame during exception_return() since there are critical sections at
the very start and end of a trap, exception, or interrupt from userland in
which we could trash the t7 register being used in userland. The counter
is incremented after adn before these critical sections respectively so
that we will not overwrite the saved t7 register if we are interrupted
during one of these critical sections.
nam for an unbound socket instead of leaving nam untouched in that case.
This way, the getsockname() output can be used to determine the address
family of such sockets (AF_LOCAL).
Reviewed by: iedowse
Approved by: rwatson
linuxulator so as to allow privileged processes within a jail() to
invoke the Linux initgroups() system call. This allows the Linux
"su" to work properly (better) when running a complete Linux
environment under jail(). This problem was reported by Attila
Nagy <bra@fsn.hu>.
Reviewed by: marcel
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.
Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.
Reviewed by: mckusick
With the recent changes in the CAM error handling, some problems in
the error handling of sa(4) have been uncovered. Basically, a number
of conditions that are not actually errors have been mistreated as
genuine errors. In particular:
. Trying to read in variable length mode with a mismatched blocksize
between the on-tape (virtual) blocks and the read(2) supplied buffer
size, causing an ILI SCSI condition, have caused an attempt to retry
the supposedly `errored' transfer, causing the tape to be read
continuously until it eventually hit EOM. Since by default any
simple mt(1) operation does an initial test read, an `mt stat' was
sufficient to trigger this bug.
Note that it's Justin's opinion that treating a NO SENSE as an EIO
is another bug in CAM. I feel not authorized to fix cam_periph.c
without another confirmation that i'm on the right track, however.
. Hitting a filemark caused the read(2) syscall to return EIO, instead
of returning a `short read'. Note that the current fix only solves
this problem in variable length mode. Fixed length mode uses a
different code path, and since i didn't grok all the intentions behind
that handling, i did not touch it (IOW: it's still broken, and you get
an EIO upon hitting a filemark).
The solution is to keep track of those conditions inside saerror(),
and upon completion to not call cam_periph_error() in that case. We
need to make sure that the device gets unfrozen if needed though (in
case of actual errors, cam_periph_error() does this on our behalf).
Not objected by: mjacob (who currently doesn't have the time to
review the patch)
semantics don't: in practice, both policy and semantics permit
loop-back debugging operations, only it's just a subset of debugging
operations (i.e., a proc can open its own /dev/mem), and that's at a
higher layer.
This is will be required to prevent lowering the ipl when a critical_enter()
is present in the interrupt path when handling a machine check.
reviewed by: jhb
only aiod does this and is also marked P_SYSTEM, the locations that
reference p->p_vmspace usually do it within the context of the caller,
the async access from the vm system is protected by the fact that it
will skip over P_SYSTEM processes.
Ok'd by: jhb
means that the pcic98 functionality might now work (I've tested it on
my pcic machine, but not the pcic98). Since these functions are
rarely called, it is unlikely that this will have a measurable impact
on performance.
FreeBSD. This code doesn't work just yet, but does compile. We need
to start indirecting via the cinfo pointers, rather than directly
calling pcic_*. There may be other issues as well, but you gotta
start somewhere.
Obtained from: PAO3
we also reserve _adequate_ space for the mb_map submap; i.e. we need
space for nmbclusters, nmbufs, _and_ nmbcnt. Furthermore, we need to
rounddown, and not roundup, so that we are consistent.
Pointed out by: bde
Also move the insertion of the request to after the request is validated,
there's still looks like there may be some problems if an invalid address
is passed to the aio routines, basically a possible leak or having a
not completely initialized structure on the queue may still be possible.
A new sig macro was made _SIG_VALID to check the validity of a signal,
it would be advisable to use it from now on (in kern/kern_sig.c) rather
than rolling your own.
PR: kern/17152
Protect pager object list manipulation with a mutex.
It doesn't look possible to combine them under a single sx lock because
creation may block and we can't have the object list manipulation block
on anything other than a mutex because of interrupt requests.
available.
Only directory vnodes holding no child directory vnodes held in
v_cache_src are recycled, so that directory vnodes near the root of
the filesystem hierarchy remain in namecache and directory vnodes are
not reclaimed in cascade.
The period of vnode reclaiming attempt and the number of vnodes
attempted to reclaim can be tuned via sysctl(2).
Suggested by: tegge
Approved by: phk
Add TI4451 as well.
These are untested since I don't have the hardware to test against.
Also, some O2Micro devices are #define w/o numbers as place holders so that
I can encourage people to submit them when they appear in the channels.
parts. This is based on the newcard code that turns it off :-). We
can now reboot after NEWCARD or Windows and have OLDCARD work. Add
support for the RL5C466 while I'm at it.
Treat TI1031 the same as the CLPD6832. It doesn't work yet, but sucks
less than it did before.
Also add a few #defines for other changes in the pipe.
and __i386__ are defined rather than if SMP and BETTER_CLOCK are defined.
The removal of BETTER_CLOCK would have broken this except that kern_clock.c
doesn't include <machine/smptests.h>, so it doesn't see the definition of
BETTER_CLOCK, and forward_*clock aren't called, even on 4.x. This seems to
fix the problem where a n-way SMP system would see 100 * n clk interrupts
and 128 * n rtc interrupts.
and AS4100s into single user mode. This work was done jointly by jhb and
myself, and builds on dfr's earlier work.
smp_init_secondary() / smp_start_secondary()
- use the uniq val to pass the globalp (me)
- fancy footwork to take any pending machine checks (me)
- doing things the FreeBSD way and getting the per-cpu idleproc created
correctly, and synchronizing the startup of secondaries (jhb)
mp_start()
- better recognition of available cpus (jhb)
smp_rendezvous()
- if smp hasn't started, only run the rendezvous function on the current
cpu. Sleuthing and (prior) incorrect fix by me, correct fix by jhb
smp_handle_ipi()
- more verbose handling of console messages (jhb)
- grab sched lock around setting PS_ASTPENDING (jhb)
forward_*clock()
- commented out. Joint decision by dfr, jhb and myself
General synchronization improvements (more mb()s, etc) (jhb)
Printf cleanups (joint)
Whitespace cleanups (jhb)
- don't do the stack overflow sanity check on MP systems -- p->p_addr
will be malloc'ed memory (not K0SEG) and the check will fail.
- don't ignore clock interrupts on secondaries. Alphas apparently
roundrobin clock interrupts to all cpus, so we're going to take clock
interrupts on all CPUS and not forward them.
- use the unique value to save the per-cpu globalp struct like the
comment says
- don't lower the ipl to ALPHA_PSL_IPL_HIGH: we may have a pending machine
check to take and we're not prepared for that yet, as we haven't setup
our interrupt entry points. (this may only happen on sable/lynx)
- indicate the fact that the working version of smp_init_secondary() doesn't
return (this is tied up in other changes and hasn't yet been committed).
VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.
This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.
The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine. For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.
sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'.
Add debugging option to disable background writes of cylinder
groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'.
These debugging options should be tried on systems that are panicing
with corrupted cylinder group maps to see if it makes the problem
go away. The set of panics in question are:
ffs_clusteralloc: map mismatch
ffs_nodealloccg: map corrupted
ffs_nodealloccg: block not in map
ffs_alloccg: map corrupted
ffs_alloccg: block not in map
ffs_alloccgblk: cyl groups corrupted
ffs_alloccgblk: can't find blk in cyl
ffs_checkblk: partially free fragment
The following panics are less likely to be related to this problem,
but might be helped by these debugging options:
ffs_valloc: dup alloc
ffs_blkfree: freeing free block
ffs_blkfree: freeing free frag
ffs_vfree: freeing free inode
If you try these options, please report whether they helped reduce your
bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com>
and to Matt Dillon <dillon@earth.backplane.com>.
ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the
access ACL could be used by privileged processes to change file/directory
ownership. In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and
ACL_OTHER) should have undefined ae_id fields; this commit attempts
to correct that misunderstanding.
o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid
associated with the vnode, as those can no longer be extracted from
the ACL passed as an argument. Perform all comparisons against
the passed arguments. This actually has the effect of simplifying
a number of components of this call, as well as reducing the indent
level, but now seperates handling of ACL_GROUP_OBJ from ACL_GROUP.
o Modify acl_posix1e_check() to return EINVAL if the ae_id field of
any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value
other than ACL_UNDEFINED_ID. As a temporary work-around to allow
clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before
each check so that this cannot cause a failure in the short term
(this work-around will be removed when the userland libraries and
utilities are updated to take this change into account).
o Modify ufs_sync_acl_from_inode() so that it forces
ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID
when synchronizing the ACL from the inode.
o Modify ufs_sync_inode_from_acl to not propagate uid and gid
information to the inode from the ACL during ACL update. Also
modify the masking of permission bits that may be set from
ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not
carry none-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT).
o Modify ufs_getacl() so that when it emulates an access ACL from
the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID.
o Clean up ufs_setacl() substantially since it is no longer possible
to perform chown/chgrp operations using vop_setacl(), so all the
access control for that can be eliminated.
o Modify ufs_access() so that it passes owner uid and gid information
into vaccess_acl_posix1e().
Pointed out by: jedger
Obtained from: TrustedBSD Project
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
can happen if witness runs out of resources during initialization or if
witness_skipspin is enabled.
Sleuthing by: Peter Jeremy <peter.jeremy@alcatel.com.au>
"inside" of locked regions. That is, an acquire atomic operation will
always enforce a memory barrier after the atomic operation and a release
operation will always enforce a memory barrier before the atomic
operation.
- Explicitly use 'mb' instead of 'wmb' in release atomic operations. The
'wmb' memory barrier is not strong enough to guarantee coherence with
other processors. This is effectively a nop since alpha_wmb() actually
performs a 'mb' and not a 'wmb', but I wanted the code to be more
correct since at some point in the future alpha_wmb()'s implementation
may switch to being a real 'wmb'.
we should call ast(). This allows us to branch to a separate Lkernelret
label so we can fixup the saved t7 register in the trapframe. Otherwise
we can run into a problem on SMP systems where a process is interrupted by
a trap or interrupt on one CPU, migrates to another CPU, and then returns
with the t7 in the stack clobbering the CPU's t7. As a result, two CPU's
would both point to the same per-CPU data and things would go downhill from
there.
Sleuthing help by: gallatin
- Add a new ddb command: 'show pcpu' similar to the i386 command added
recently. By default it displays the current CPU's info, but an optional
argument can specify the logical ID of a specific CPU to examine.
bcopy would go off the end of the array by two elements, which sometimes
causes a panic if it happens to cross into a page that isn't mapped.
Submitted by: gibbs
Reviewed by: peter
are some good reasons for not doing this, even if the linting of
the code breaks.
1) If lint were ever to understand the stuff inside the macros,
that would break the checks.
2) There are ways to use __GNUC__ to exclude overly specific
code.
3) (Not yet practical) Lint(1) needs to properlyu understand
all of te code we actually run.
Complained about by: bde
Education by: jake, jhb, eivind
It is described in ufs/ffs/fs.h as follows:
/*
* Filesystem flags.
*
* Note that the FS_NEEDSFSCK flag is set and cleared only by the
* fsck utility. It is set when background fsck finds an unexpected
* inconsistency which requires a traditional foreground fsck to be
* run. Such inconsistencies should only be found after an uncorrectable
* disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when
* it has successfully cleaned up the filesystem. The kernel uses this
* flag to enforce that inconsistent filesystems be mounted read-only.
*/
#define FS_UNCLEAN 0x01 /* filesystem not clean at mount */
#define FS_DOSOFTDEP 0x02 /* filesystem using soft dependencies */
#define FS_NEEDSFSCK 0x04 /* filesystem needs sync fsck before mount */
and non-P_SUGID cases, simplify p_cansignal() logic so that the
P_SUGID masking of possible signals is independent from uid checks,
removing redundant code and generally improving readability.
Reviewed by: tmm
Obtained from: TrustedBSD Project
the ability of unprivileged processes to deliver arbitrary signals
to daemons temporarily taking on unprivileged effective credentials
when P_SUGID is not set on the target process:
Removed:
(p1->p_cred->cr_ruid != ps->p_cred->cr_uid)
(p1->p_ucred->cr_uid != ps->p_cred->cr_uid)
o Replace two "allow this" exceptions in p_cansignal() restricting
the ability of unprivileged processes to deliver arbitrary signals
to daemons temporarily taking on unprivileged effective credentials
when P_SUGID is set on the target process:
Replaced:
(p1->p_cred->p_ruid != p2->p_ucred->cr_uid)
(p1->p_cred->cr_uid != p2->p_ucred->cr_uid)
With:
(p1->p_cred->p_ruid != p2->p_ucred->p_svuid)
(p1->p_ucred->cr_uid != p2->p_ucred->p_svuid)
o These changes have the effect of making the uid-based handling of
both P_SUGID and non-P_SUGID signal delivery consistent, following
these four general cases:
p1's ruid equals p2's ruid
p1's euid equals p2's ruid
p1's ruid equals p2's svuid
p1's euid equals p2's svuid
The P_SUGID and non-P_SUGID cases can now be largely collapsed,
and I'll commit this in a few days if no immediate problems are
encountered with this set of changes.
o These changes remove a number of warning cases identified by the
proc_to_proc inter-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
possible side effects on running code: they seem reasonable to me,
but it's possible this change might have to be backed out if problems
are experienced.
Submitted by: src/tools/regression/security/proc_to_proc/testuid
Reviewed by: tmm
Obtained from: TrustedBSD Project
ability of unprivileged processes to modify the scheduling properties
of daemons temporarily taking on unprivileged effective credentials.
These cases (p1->p_cred->p_ruid == p2->p_ucred->cr_uid) and
(p1->p_ucred->cr_uid == p2->p_ucred->cr_uid), respectively permitting
a subject process to influence the scheduling of a daemon if the subject
process has the same real uid or effective uid as the daemon's effective
uid. This removes a number of the warning cases identified by the
proc_to_proc iner-process authorization regression test.
o As these are new restrictions, we'll have to watch out carefully for
possible side effects on running code: they seem reasonable to me,
but it's possible this change might have to be backed out if problems
are experienced.
Reported by: src/tools/regression/security/proc_to_proc/testuid
Obtained from: TrustedBSD Project
by p_can(...P_CAN_SEE), rather than returning EACCES directly. This
brings the error code used here into line with similar arrangements
elsewhere, and prevents the leakage of pid usage information.
Reviewed by: jlemon
Obtained from: TrustedBSD Project
p_can(...P_CAN_SEE...) to getpgid(), getsid(), and setpgid(),
blocking these operations on processes that should not be visible
by the requesting process. Required to reduce information leakage
in MAC environments.
Obtained from: TrustedBSD Project
from signal authorization checking.
o p_cansignal() takes three arguments: subject process, object process,
and signal number, unlike p_cankill(), which only took into account
the processes and not the signal number, improving the abstraction
such that CANSIGNAL() from kern_sig.c can now also be eliminated;
previously CANSIGNAL() special-cased the handling of SIGCONT based
on process session. privused is now deprecated.
o The new p_cansignal() further limits the set of signals that may
be delivered to processes with P_SUGID set, and restructures the
access control check to allow it to be extended more easily.
o These changes take into account work done by the OpenBSD Project,
as well as by Robert Watson and Thomas Moestl on the TrustedBSD
Project.
Obtained from: TrustedBSD Project
toggle the P_SUGID bit explicitly, rather than relying on it being
set implicitly by other protection and credential logic. This feature
is introduced to support inter-process authorization regression testing
by simplifying userland credential management allowing the easy
isolation and reproduction of authorization events with specific
security contexts. This feature is enabled only by "options REGRESSION"
and is not intended to be used by applications. While the feature is
not known to introduce security vulnerabilities, it does allow
processes to enter previously inaccessible parts of the credential
state machine, and is therefore disabled by default. It may not
constitute a risk, and therefore in the future pending further analysis
(and appropriate need) may become a published interface.
Obtained from: TrustedBSD Project
interfaces and functionality intended for use during correctness and
regression testing. Features enabled by "options REGRESSION" may
in and of themselves introduce security or correctness problems if
used improperly, and so are not intended for use in production
systems, only in testing environments.
Obtained from: TrustedBSD Project
enable easy access to the hash chain stats. The raw prefixed versions
dump an integer array to userland with the chain lengths. This cheats
and calls it an array of 'struct int' rather than 'int' or sysctl -a
faithfully dumps out the 128K array on an average machine. The non-raw
versions return 4 integers: count, number of chains used, maximum chain
length, and percentage utilization (fixed point, multiplied by 100).
The raw forms are more useful for analyzing the hash distribution, while
the other form can be read easily by humans and stats loggers.
API for IPI's that isn't tied to the Intel APIC. MD code can still use
the apic_ipi() function or dink with the apic directly if needed to send
MD IPI's.
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.
codecs. Also, add some additional code to check for future cards without
this feature - attempting to initialise them as AC97 cards will hang the
machine.
PR: 26427
Reviewed by: cg
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.
------
One day I noticed that some file operations run much faster on
small file systems then on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.
First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the perfomance speedup may be. I have done two file/directory
intensive tests on a two OpenBSD systems with old and new dirpref algorithm.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:
1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35
2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50
You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html
Test Results
tar -xzf ports.tar.gz rm -rf ports
mode old dirpref new dirpref speedup old dirprefnew dirpref speedup
First system
normal 667 472 1.41 477 331 1.44
async 285 144 1.98 130 14 9.29
sync 768 616 1.25 477 334 1.43
softdep 413 252 1.64 241 38 6.34
Second system
normal 329 81 4.06 263.5 93.5 2.81
async 302 25.7 11.75 112 2.26 49.56
sync 281 57.0 4.93 263 90.5 2.9
softdep 341 40.6 8.4 284 4.76 59.66
"old dirpref" and "new dirpref" columns give a test time in seconds.
speedup - speed increasement in times, ie. old dirpref / new dirpref.
------
Algorithm description
The old dirpref algorithm is described in comments:
/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/
A new directory is allocated in a different cylinder groups than its
parent directory resulting in a directory tree that is spreaded across
all the cylinder groups. This spreading out results in a non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem but when the filesystem is big then perfomance
degradation becomes very apparent.
What I mean by a big file system ?
1. A big filesystem is a filesystem which occupy 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.
The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.
My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also do some things, so that the new allocation
method does not cause excessive fragmentation and all directory inodes
will not be located at a location far from its file's inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/
My early versions of dirpref give me a good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.
My solution for such and similar cases is to limit a number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:
int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */
These parameters have reasonable defaults but may be tweeked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.
I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem perfomance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/coping of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speedup a compilation process, but also doesn't slow it down.
Obtained from: Grigoriy Orlov <gluk@ptci.ru>
s/1518/ETHER_MAX_LEN
Some style changes, add some braces, mostly residual from having
a lot of debug hooks added while working on this driver.
Bring in a plethora of changes from NetBSD:
revision 1.58
date: 2001/03/08 11:07:08; author: ichiro; state: Exp; lines: +17 -1
it wait until busy flag disappears.
it was able to prevent some cards with late initializing faling in wi_reset().
revision 1.41
date: 2000/10/13 19:15:08; author: jonathan; state: Exp; lines: +4 -2
Fix wi_intr() to avoid touching card registers during insert/remove events,
when sharing an interrupt with other devices:
check sc->sc_enabled, and drop the interrupt if its' off.
revision 1.30
date: 2000/08/18 04:11:48; author: jhawk; state: Exp; lines: +4 -4
Copy wi_{dst,src}_addr from struct wi_frame into faked-up ether_header
instead of addr1 and addr2. THis means that tcpdump -e will show the
correct MAC address for communications with access points instead of showing
the BSSID.
In the future there should be 802.11 support for bpf/libpcap/tcpdump,
but that is aways down the road.
count drops to 0 in witness_destroy, set the w_name and w_file pointers
to point to the string "(dead)" and the w_line field to 0. This way,
if a mutex of a given name is used only in a module, then as long as
all mutexes in the module are destroyed when the module is unloaded,
witness will not maintain stale references to the mutex's name in the
module's data section causing a panic later on when the w_name or w_file
field's are examined.
1. Pick up MII/PHY support for Livengood copper part (10/100/1000) from
Parag Patel. It was a fairly complete but not quite platform independent
job.
2. Finish silly offset differences that LIVENGOOD vs. WISEMAN registers
have (so the !)$*!)$*!$ fiber LIVENGOOD now works too).
3. Ansify the source.
So- we now suppor tthe PRO1000F and PRO1000T adapters.
1. The offsets for some registers change in LIVENGOOD. Gratuitously.
2. Define LIVENGOOD and LIVENGOOD_CU part numbers. Add some more
specific LIVENGOOD defaults.
3. Add definitions for PHY support for the copper LIVENGOOD part
(10/100/1000).
Since pid's are not in the kernel address space, this doesn't conflict
with the funcionality of specifying an arbitrary frame pointer to the
trace command.
- If the first function of a backtrace maps to fork_trampoline, then this
is a newly fork'd process that has not been executed yet, so just print
out the first frame and then return for that case.
- Lower the default count from 65535 to 1024. ddb doesn't trace into
userland, and if the stack gets hosed and starts looping it's less
annoying.
Parag Patel did all of the grunt work, so he gets the credit.
Register definitions and actions inferred from a Linux driver,
so Intel also gets some 'credit'.
the main benefit this gives for now is that via686 audio devices on
motherboards with ac97 codecs that do not support vra will be able to use
sample rates other than 48khz.
Add simple "xlat" converter which performs 8to8 table based conversion.
Unicode converter will be added in the near future.
Reviewed by: silence on arch@
Files placement reviewed by: bde
Obtained from: smbfs
o Change the number of init tries from 5 to a #define.
o Allow up to 5s rather than 2s for commands to complete. This
is still much less than 51 minutes, but makes my intel card init
with more reliability than before.
possible for some systems where the device is there, but the BIOS
hasn't allocated memory resources for it), we don't panic.
Submitted by: Gerard Roudier
peer out from sppp_lcp_open() to sppp_lcp_up(). For one, this makes
things look more symmetrical to sppp_lcp_close(), and somehow it also
just occurred to me that an Up event following the open caused the
value of the authentication option to be clobbered.
badaddr_read(). This fixes 'machine check in pal mode' halts on
ev5 2100As.
MFC candidate -- after spending 6 hours tracking this down, I checked and
discovered that it has been in NetBSD for over a year, so it should be safe
for MFC into 4.3-RELEASE
It's not finished yet (I still have to find a way to implement process-
dependent nodes without consuming too much memory, and the permission
system needs tightening up), but it's becoming hard to work on without
a repo (I've accidentally almost nuked it once already), and it works
(except for the lack of process-dependent nodes, that is).
I was supposed to commit this a week ago, but timed out waiting for jkh
to reply to some questions I had. Pass him a spoonful of bad karma :)
Specifically, the cpuid, curproc, curpcb, npxproc, and idleproc members.
Also, if witness is compiled into the kernel, then a list of all the spin
locks held by this CPU is displayed. By default the information for the
current CPU is displayed, but a decimal cpu id may be specified as a
parameter to obtain information on a specific CPU.
list into a public witness_list_locks() function. Call this function
twice in witness_list() instead of using an evil goto.
- Adjust the 'show locks' command to take an optional parameter which
specifies the pid of a process to list the locks of. By default the
locks held by the current process are displayed.
Don't leak iospace when irq allocation fails. (call wi_free())
Call bus_release_resource() with the correct "rid" obtained from
bus_alloc_resource() that's saved in the softc instead of a hardcoded
0.
VFS operation, make use of the calling process's credential. This
solution may not be ideal (there are a number of other possible
proposals, including making use of the proc0 credential, adding a
credential argument to the VFSOP, and switching from a hard-coded
ucred to a hard-coded nfscred), it is simple and appears to
work. The arguments against using simply crget() are fairly
strong: it is the only place in the code (other than a nearly
identical invocation in ncp) where crget() is invoked, other than
in the process credential creation code; as ucred becomes extensible,
this use of crget() without appropriate context results in less and
less meaningful credential data. The implementation here will
probably be tweaked as a result of experimentation and further
exploration of the requirements. In the mean-time, it allows
progress to be made in ucred expansion for new security models without
causing a crash every time df is used on an NFS mounted file system.
This code has been interop tested against FreeBSD and Solaris NFS
servers. While using the process credentials should not introduce
interop problems, please let me know if any turn out to exist.
Reviewed by: freebsd-arch
Also place the macros under #ifdef _KERNEL. Equally hide the internal
structures such as the freelist structs which include condition variables.
Reviewed by: bde
Mostly suggested by: bde
immediate value or the accumulator. 0 is the chip's internal
representation for the accumulator, and so 0 is an invalid immediate value
when the accumulator can also be specified as an argument.
Submitted by: gibbs
overflow the request queue. The reason we want to do this is that we
now push out completed CTIOs as we complete them- this gets the QLogic
working on them quicker. So we need to know whether we can put the entire
burrito out before we start.
We now support conjoint status with data for the last CTIO for both Fibre
Channel and SCSI. Leave the old code in place in case we need to go back
(minor 3 line ifdef).
Ultra-ultra important- *don't* set rq->req_seg_count for non-data
target mode requests in isp_pci_dmasetup. D'oh- this is actually
the tag value area for a CTIO. What *was* I thinking? Boy howdy
does both aic7xxx and sym get awfully unhappy when on reconnect
you give them a constant '1' for a tag value.
function- we did it a bit cleaner. We only use this if a CTIO completes with
!CT_OK state. We now have managed to get away without having to poke around
and trying to find the original ATIO- the csio we're using has the tag_id
and lun values with it which is mostly what we need when we do the putback.
Make sure we correctly propagate AT_TQAE->CT_TQAE for tags. Make sure
we call ISP_DMAFREE only if we had DATA to move.
tag is active for an ATIO, and say that you want to reconnect with
a tag value in a CTIO have *never* been exercised until now. This lossage
derived from Solaris code where this stuff originally came from that is
about 7 years old. Amazing.
We now bundle the incoming tag (legal values are 0..256) as the low
16 bits of the ccb_accept_tio's at_tagid while we put the firmware
handle for this ATIO in the top 16 bits- define some macros to make
this cleaner.
Complete some Ansification.
Redo establishment of default SCSI parameters whether or not
we've been compiled for target mode. Unfortunately, the Qlogic
f/w is confused so that if we set all targets to be 'safe' (i.e.,
narrow/async), it will also then report narrow, async if we're
contacted in target mode from that target (acting in initiator
role). D'oh!
Fix ISPCTL_TOGGLE_TMODE to correctly enable the right channel for
dual channel cards. Add some more opcodes. Fix a stupid NULL
pointer bug.
* Set the CSRG SCCS ID to the revision this file is actually based on
(the file itself has been updated to Lite2 in rev. 1.4).
* Fix some typos in comments.
* Add a comment to the trailing #endif according to style(9)
Simplify initialization and remove offending DMA channel resets there.
The resets trash whatever is pointed to DMA registers, but at cmi_attach()
time the DMA registers have not been initialized with valid addresses.
Reviewed by: Cameron Grant <gandalf@vilnya.demon.co.uk>
It appears that some of the new PRISM2 cards need it.
Fail the probe if we fail to read the MAC address.
Fix a comment.
Delete the unload printf. The bus system now prints this message.
stylistic.
# Yes, this break K&R, but this file already used so many gcc extensions
# keeping K&R support seemed too anachronistic for me.
Didn't fix the bug where functions that can only be used in the kernel
are exported to userland.
that people use from userland in C++ programs. I've had this in my
tree for ages and just got bit by it not being in the real tree again.
This is a MFC candidate.
The mbuf and mcluster free lists now each "own" a condition variable,
m_starved.
- Clean up minor indentention issues in sys/mbuf.h caused by previous
commit.
to not using IO_SYNC. Expose a sysctl (debug.ufs_extattr_sync) for
enabling the use of IO_SYNC.
- Use of IO_SYNC substantially degrades ACL performance when a
default ACL is set on a directory, as there are four synchronous
writes initiated to define both supporting EAs for new
sub-directories, and to set the data; two for new files. Later, this
may be optimized to two writes for sub-directories, one for new
files.
- IO_SYNC does not substantially improve consistency properties due
to the poor consistency properties of existing permissions (which
ACLs are a superset of), due to interaction with soft updates,
and due to differences in handling consistency for data and file
system meta-data.
- In macro-benchmarks, this reduces the overhead of setting default
ACLs down to the same overhead as enabling ACLs on a file system
and not using them. Enabling ACLs still introduces a small
overhead (I measure 7% on a -j 2 buildworld with pre-allocated
EA backing store, but this is not rigorous testing, nor in any way
optimized).
- The sysctl will probably change to another administration method
(or at least, a better name) in the near future, but consistency
properties of EAs are still being worked out. The toggle is defined
right now to allow easier performance analysis and exploration
of possible guarantees.
Obtained from: TrustedBSD Project
Don't use atomic operations for the stats updating, instead protect
the counts with the mbuf mutex. Most twiddling of the stats was
done right before or after releasing a mutex. By doing this we
reduce the number of locked ops needed as well as allow a sysctl
to gain a consitant view of the entire stats structure.
In the future...
This will allow us to chain common mbuf operations that would
normally need to aquire/release 2 or 3 of the locks to build an
mbuf with a cluster or external data attached into a single op
requiring only one lock.
Simplify the per-cpu locks that are planned.
There's also some if (1) code that should check if the "how"
operation specifies blocking/non-blocking behavior, we _could_ make
it so that we hold onto the mutex through calls into kmem_alloc
when non-blocking requests are made, but for safety reasons we
currently drop and reaquire the mutex around the calls.
Also, note that calling kmem_alloc is rare and only happens during
a shortage so drop/re-getting the mutex will not be a common
occurance.
Remove some #define's that seemed to obfuscate the code to me.
Remove an extranious comment.
Remove an XXX, including mutex.h isn't a crime.
Reviewed by: bmilekic
avoid silly lock contention on sched_lock since in 2 out of the 3 places
that we call stop(), we get sched_lock right after calling it and we were
locking sched_lock inside of stop() anyways.
failures in MOD_LOAD.
Dodge duplicate make_dev() calls by (ab)using dev->si_drv2 to
remember if we created the device node via a dev_clone callback
before the d_open call.
Without this, ifpromisc() always fails (after setting the IFF_PROMISC
bit in ifp->if_flags) and bpf never bothers to turn promiscuous mode off.
PR: 20188
SIGCHLD to our parent process. Otherwise, we could block while obtaining
the process lock for our parent process and switch out while we were
in SSTOP. Even worse, when we try to resume from the mutex being blocked
on our p_stat will be SRUN, not SSTOP.
- Fix a comment above stop() to indicate that it requires that the proc lock
be held, not a proctree lock.
Reported by: markm
Sleuthing by: jake
under heavy use when default ACLs were bgin inherited by new files
or directories. This is done by removing a bug in default ACL
reading, and improving error handling for this failure case:
- Move the setting of the buffer length (len) variable to above the
ACL type (ap->a_type) switch rather than having it only for
ACL_TYPE_ACCESS. Otherwise, the len variable is unitialized in
the ACL_TYPE_DEFAULT case, which generally worked right, but could
result in failure.
- Add a check for a short/long read of the ACL_TYPE_DEFAULT type from
the underlying EA, resulting in EPERM rather than passing a
potentially corrupted ACL back to the caller (resulting "cleaner"
failures if the EA is damaged: right now, the caller will almost
always panic in the presence of a corrupted EA). This code is similar
to code in the ACL_TYPE_ACCESS handling in the previous switch case.
- While I'm fixing this code, remove a redundant bzero() of the ACL
reader buffer; it need only be initialized above the acl_type
switch.
Obtained from: TrustedBSD Project
operations on file descriptors, which complement the existing set of
calls, extattr_{delete,get,set}_file() which act on paths. In doing
so, restructure the system call implementation such that the two sets
of functions share most of the relevant code, rather than duplicating
it. This pushes the vnode locking into the shared code, but keeps
the copying in of some arguments in the system call code. Allowing
access via file descriptors reduces the opportunity for race
conditions when managing extended attributes.
Obtained from: TrustedBSD Project
ps_showallprocs such that if superuser is present to override process
hiding, the search falls through [to success]. When additional
restrictions are placed on process visibility, such as MAC, new clauses
will be placed above the return(0).
Obtained from: TrustedBSD Project
than a NOP. bounds_check_with_label() would return -1 yet NOT set any
of the bio flags to show an error. This meant the caller would not
properly see that bounds_check_with_label() did not do any work. This
prevented newfs(8) from being able to write a file system on any partition
other than `c' on a `ccd'.
The logs of this file do not tell _why_ bounds_check_with_label() was
emasculated. Nor are there any `XXX' comments. So we'll unemasculated
it, and see what breaks.
Submitted by: gallatin
a #defined constant, wrap a few long lines, etc... Also remove stupid
'all your base are belong to us' joke from comment that I don't really
care to see immortalized in the source tree.
- Added 4 speaker enable to initialization sequence.
- Removed delays between register pokes which appear to aggravate a
problem this card has sampling at 44.1kHz. With any form of delay,
skew relative to system clock at 44.1kHz is usually in range 0-25%
(now 0-3%). No other rates exhibit this problem.
- Changed structs cmi_* to sc_*.
Approved by: Cameron Grant <gandalf@vilnya.demon.co.uk>
aic7xxx_pci.c:
Enable board generation of interrupts only once our handler is
in place and all other setup has occurred.
aic7xxx.c:
More conversion of data types to ahc_* names. tmode_tstate and
tmode_lstate are the latest victims.
Clean up the check condition path by branching early rather
than indenting a giant block of code.
Add support for target mode initiated sync negotiation.
The code has been tested by forcing the feature on for
all devices, but for the moment is left inaccesible until
a decent mechanism for controlling the behavior is complete.
Implementing this feature required the removal of the
old "target message request" mechanism. The old method
required setting one of the 16 bit fields to initiate
negotiation with a particular target. This had the nice
effect of being easy to change the request and have it
effect the next command. We now set the MK_MESSAGE bit
on any new command when negotiation is required. When
the negotiation is successful, we walk through and clean
up the bit on any pending commands. Since we have to walk
the commands to reset the SCSI syncrate values so no additional
work is required. The only drawback of this approach is that
the negotiation is deferred until the next command is queued to
the controller. On the plus side, we regain two bytes of
sequencer scratch ram and 6 sequencer instructions.
When cleaning up a target mode instance, never remove the
"master" target mode state object. The master contains
all of the saved SEEPROM settings that control things like
transfer negotiations. This data will be cloned as the
defaults if a target mode instance is re-instantiated.
Correct a bug in ahc_set_width(). We neglected to update
the pending scbs to reflect the new parameters. Since
wide negotiation is almost always followed by sync
negotiation it is doubtful that this had any real
effect.
When in the target role, don't complain about
"Target Initiated" negotiation requests when an initiator
negotiates with us.
Defer enabling board interrupts until after ahc_intr_enable()
is called.
Pull all info that used to be in ahc_timeout for the FreeBSD
OSM into ahc_dump_card_state(). This info should be printed
out on all platforms.
aic7xxx.h:
Add the SCB_AUTO_NEGOITATE scb flag. This allows us to
discern the reason the MK_MESSAGE flag is set in the hscb
control byte. We only want to clear MK_MESSAGE in
ahc_update_pending_scbs() if the MK_MESSAGE was set due
to an auto transfer negotiation.
Add the auto_negotiate bitfield for each tstate so that
behavior can be controlled for each of our enabled SCSI
IDs.
Use a bus interrupt handler vector in our softc rather
than hard coding the PCI interrupt handler. This makes
it easier to build the different bus attachments to
the aic7xxx driver as modules.
aic7xxx.reg:
Remove the TARGET_MSG_REQUEST definition for sequencer ram.
aic7xxx.seq:
Fix a few target mode bugs:
o If MK_MESSAGE is set in an SCB, transition to
message in phase and notify the kernel so that
message delivery can occur. This is currently
only used for target mode initiated transfer
negotiation.
o Allow a continue target I/O to compile without
executing a status phase or disconnecting. If
we have not been granted the disconnect privledge
but this transfer is larger than MAXPHYS, it may
take several CTIOs to get the job done.
Remove the tests of the TARGET_MSG_REQUEST field in scratch ram.
aic7xxx_freebsd.c:
Add support for CTIOs that don't disconnect. We now defer
the clearing of our pending target state until we see a
CTIO for that device that has completed sucessfully.
Be sure to return early if we are in a target only role
and see an initiator only CCB type in our action routine.
If a CTIO has the CAM_DIS_DISCONNECT flag set, propogate
this flag to the SCB. This flag has no effect if we've
been asked to deliver status as well. We will complete
the command and release the bus in that case.
Handle the new auto_negotiate field in the tstate correctly.
Make sure that SCBs for "immediate" (i.e. to continue a non
disconnected transaction) CTIO requests get a proper mapping
in the SCB lookup table. Without this, we'll complain when
the transaction completes.
Update ahc_timeout() to reflect the changes to ahc_dump_card_state().
aic7xxx_inline.h:
Use ahc->bus_intr rather than ahc_pci_intr.
two subject ucreds. Unlike p_cansee(), u_cansee() doesn't have
process lock requirements, only valid ucred reference requirements,
so is prefered as process locking improves. For now, back p_cansee()
into u_cansee(), but eventually p_cansee() will go away.
Reviewed by: jhb, tmm
Obtained from: TrustedBSD Project
locks were held, we could be preempted and switch CPU's in between the time
that we set a variable to the list of spin locks on our CPU and the time
that we checked that variable to ensure no spinlocks were held while
grabbing a sleep lock. Losing the race resulted in checking some other
CPU's spin lock list and bogusly panicing.
* Initialize the "struct sockaddr_dl sdl" correctly in vlan_setmulti().
PR: kern/22181
* The driver used to call malloc(..., M_NOWAIT), but to not check the
return value. Change malloc(..., M_NOWAIT) to malloc(..., M_WAITOK)
because the corresponding part of code is called from the upper
half of the kernel only.
PR: kern/22181
* Make sure a parent interface is up and running before invoking
its if_start() routine in order to avoid system panic.
PR: kern/22179 kern/24741 i386/25478
* Do not copy all the flags from a parent mindlessly.
PR: kern/22179
* Do not call if_down() on a parent interface if it's already down.
Call if_down() at splimp because if_down() needs that.
PR: kern/22179
Reviewed by: wollman
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for
ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT
And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB
This should fix the case where port unreachables for udp returned
ENETRESET instead of ECONNREFUSED
Problem found by: Bill Fenner <fenner@research.att.com>
Reviewed by: jlemon
decode the BIOS and firmware version and announce the board as HP NetRaid.
This has been tested with a NetRaid 3si controller, the BIOS/firmware
printout should also work for other NetRaid controllers but the type
detection for other NetRaids (such as the 1si) will not work due to the
lack of hardware.
Reviewed by: msmith
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
We really want to be able to say "auto NWAY", "limited NWAY", and
"no NWAY". Unfortunately, this does not appear to be possible with
the current mediaopt structure.
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.
- If there is no gdb device, just return without trying to return any
value since gdb_handle_exception() returns void.
- When calling prom_halt(), pass in a value telling it to actually halt
and not to randomly choose whether or not to halt or reboot depending on
whatever value happened to be in a0 when the call was made.
an AST results in a signal being delivered, we'll need to do a full register
restore so as to properly setup the signal handler. This is somewhat of
a pessimization, because ast() will be called twice in this case.
This fixes several problems that have been reported where signal intensive
userland apps (most notably dump) have been SEGV'ing for no fault of their
own.
Thanks to Peter Jeremy <peter.jeremy@alcatel.com.au> for presenting the
AST scenario which led to me fiinally figuring this out.
Reviewed by: jhb
process we're looking for. (I don't think this can currently
happen, but it depends how the function is called).
PR: 25932
Submitted by: David Xu <davidx@viasoft.com.cn>
The fixes the problem of PLAY_BIG not being implemented on
some modern drives.
The problem now is that some old drives use BSD encoding
in the MSF case, which they dont tell, and which is also
not according to spec *sigh*. Hopefully there are not
too many of those still alive, or I hereby grant
license to kill the firmware writers that wrote the mess.
Some of the major changes include:
- The SCSI error handling portion of cam_periph_error() has
been broken out into a number of subfunctions to better
modularize the code that handles the hierarchy of SCSI errors.
As a result, the code is now much easier to read.
- String handling and error printing has been significantly
revamped. We now use sbufs to do string formatting instead
of using printfs (for the kernel) and snprintf/strncat (for
userland) as before.
There is a new catchall error printing routine,
cam_error_print() and its string-based counterpart,
cam_error_string() that allow the kernel and userland
applications to pass in a CCB and have errors printed out
properly, whether or not they're SCSI errors. Among other
things, this helped eliminate a fair amount of duplicate code
in camcontrol.
We now print out more information than before, including
the CAM status and SCSI status and the error recovery action
taken to remedy the problem.
- sbufs are now available in userland, via libsbuf. This
change was necessary since most of the error printing code
is shared between libcam and the kernel.
- A new transfer settings interface is included in this checkin.
This code is #ifdef'ed out, and is primarily intended to aid
discussion with HBA driver authors on the final form the
interface should take. There is example code in the ahc(4)
driver that implements the HBA driver side of the new
interface. The new transfer settings code won't be enabled
until we're ready to switch all HBA drivers over to the new
interface.
src/Makefile.inc1,
lib/Makefile: Add libsbuf. It must be built before libcam,
since libcam uses sbuf routines.
libcam/Makefile: libcam now depends on libsbuf.
libsbuf/Makefile: Add a makefile for libsbuf. This pulls in the
sbuf sources from sys/kern.
bsd.libnames.mk: Add LIBSBUF.
camcontrol/Makefile: Add -lsbuf. Since camcontrol is statically
linked, we can't depend on the dynamic linker
to pull in libsbuf.
camcontrol.c: Use cam_error_print() instead of checking for
CAM_SCSI_STATUS_ERROR on every failed CCB.
sbuf.9: Change the prototypes for sbuf_cat() and
sbuf_cpy() so that the source string is now a
const char *. This is more in line wth the
standard system string functions, and helps
eliminate warnings when dealing with a const
source buffer.
Fix a typo.
cam.c: Add description strings for the various CAM
error status values, as well as routines to
look up those strings.
Add new cam_error_string() and
cam_error_print() routines for userland and
the kernel.
cam.h: Add a new CAM flag, CAM_RETRY_SELTO.
Add enumerated types for the various options
available with cam_error_print() and
cam_error_string().
cam_ccb.h: Add new transfer negotiation structures/types.
Change inq_len in the ccb_getdev structure to
be "reserved". This field has never been
filled in, and will be removed when we next
bump the CAM version.
cam_debug.h: Fix typo.
cam_periph.c: Modularize cam_periph_error(). The SCSI error
handling part of cam_periph_error() is now
in camperiphscsistatuserror() and
camperiphscsisenseerror().
In cam_periph_lock(), increase the reference
count on the periph while we wait for our lock
attempt to succeed so that the periph won't go
away while we're sleeping.
cam_xpt.c: Add new transfer negotiation code. (ifdefed
out)
Add a new function, xpt_path_string(). This
is a string/sbuf analog to xpt_print_path().
scsi_all.c: Revamp string handing and error printing code.
We now use sbufs for much of the string
formatting code. More of that code is shared
between userland the kernel.
scsi_all.h: Get rid of SS_TURSTART, it wasn't terribly
useful in the first place.
Add a new error action, SS_REQSENSE. (Send a
request sense and then retry the command.)
This is useful when the controller hasn't
performed autosense for some reason.
Change the default actions around a bit.
scsi_cd.c,
scsi_da.c,
scsi_pt.c,
scsi_ses.c: SF_RETRY_SELTO -> CAM_RETRY_SELTO. Selection
timeouts shouldn't be covered by a sense flag.
scsi_pass.[ch]: SF_RETRY_SELTO -> CAM_RETRY_SELTO.
Get rid of the last vestiges of a read/write
interface.
libkern/bsearch.c,
sys/libkern.h,
conf/files: Add bsearch.c, which is needed for some of the
new table lookup routines.
aic7xxx_freebsd.c: Define AHC_NEW_TRAN_SETTINGS if
CAM_NEW_TRAN_CODE is defined.
sbuf.h,
subr_sbuf.c: Add the appropriate #ifdefs so sbufs can
compile and run in userland.
Change sbuf_printf() to use vsnprintf()
instead of kvprintf(), which is only available
in the kernel.
Change the source string for sbuf_cpy() and
sbuf_cat() to be a const char *.
Add __BEGIN_DECLS and __END_DECLS around
function prototypes since they're now exported
to userland.
kdump/mkioctls: Include stdio.h before cam.h since cam.h now
includes a function with a FILE * argument.
Submitted by: gibbs (mostly)
Reviewed by: jdp, marcel (libsbuf makefile changes)
Reviewed by: des (sbuf changes)
Reviewed by: ken
implementation is still experimental, and while fairly broadly tested,
is not yet intended for production use. Support for POSIX.1e ACLs on
UFS will not be MFC'd to RELENG_4.
This implementation works by providing implementations of VOP_[GS]ETACL()
for FFS, as well as modifying the appropriate access control and file
creation routines. In this implementation, ACLs are backed into extended
attributes; the base ACL (owner, group, other) permissions remain in the
inode for performance and compatibility reasons, so only the extended and
default ACLs are placed in extended attributes. The logic for ACL
evaluation is provided by the fs-independent kern/kern_acl.c.
o Introduce UFS_ACL, a compile-time configuration option that enables
support for ACLs on FFS (and potentially other UFS-based file systems).
o Introduce ufs_getacl(), ufs_setacl(), ufs_aclcheck(), which
respectively get, set, and check the ACLs on the passed vnode.
o Introduce ufs_sync_acl_from_inode(), ufs_sync_inode_from_acl() to
maintain access control information between inode permissions and
extended attribute data.
o Modify ufs_access() to load a file access ACL and invoke
vaccess_acl_posix1e() if ACLs are available on the file system
o Modify ufs_mkdir() and ufs_makeinode() to associate ACLs with newly
created directories and files, inheriting from the parent directory's
default ACL.
o Enable these new vnode operations and conditionally compiled code
paths if UFS_ACL is defined.
A few notes:
o This implementation is fairly widely tested, but still should be
considered experimental.
o Currently, ACLs are not exported via NFS, instead, the summarizing
file mode/etc from the inode is. This results in conservative
protection behavior, similar to the behavior of ACL-nonaware programs
acting locally.
o It is possible that underlying binary data formats associated with
this implementation may change. Consumers of the implementation
should expect to find their local configuration obsoleted in the
next few months, resulting in possible loss of ACL data during an
upgrade.
o The extended attributes interface and implementation is still
undergoing modification to address portable interface concerns, as
well as performance.
o Many applications do not yet correctly handle ACLs. In general,
due to the POSIX.1e ACL model, behavior of ACL-unaware applications
will be conservative with respects to file protection; some caution
is recommended.
o Instructions for configuring and maintaining ACLs on UFS will be
committed in the near future; in the mean time it is possible to
reference the README included in the last UFS ACL distribution
placed in the TrustedBSD web site:
http://www.TrustedBSD.org/downloads/
Substantial debugging, hardware, travel, or connectivity support for this
project was provided by: BSDi, Safeport Network Services, and NAI Labs.
Significant coding contributions were made by Chris Faulhaber. Additional
support was provided by Brian Feldman, Thomas Moestl, and Ilmar Habibulin.
Reviewed by: jedgar, keichii, mckusick, trustedbsd-discuss, freebsd-fs
Obtained from: TrustedBSD Project
region for CIS reading" problem:
Use bus_alloc_resource to get the memory that we'll be using. Also
has the benefit of doing usage checking as well. This gets rid of the
ugly kludge that we had before for mapping pmem to vmem.
Second, move PIOCSRESOURCE to its own routine and make it conform more
to style(9) in the process.
Merge rev's 1.65 and 1.66 from sys/net/if_spppsubr.c (implement the
`restart' option, and fix a blatant bug with PAP authentication).
The i4b implementation of this file should be merged back, but for now,
we need this here as well.
Reviewed by: gj
Fix a serious bug in sppp where anyone could obtain a successful PAP
authentication by supplying a null password. I've only stumpled across
the PR while browsing for all sppp-related PRs.
Should we also file a security advisory for this?
PR: 21592
Submitted by: <dli@3bc.de> Dirk Liebke
MCLGET macros in order to avoid incrementing the drop count twice.
Otherwise, in some cases, we may increment m_drops once in m_mballoc()
for example, and increment it again in m_mballoc_wait() if the
wait fails.
this introduces a new buffering mechanism which results in dramatic
simplification of the channel manager.
as several structures have changed, we take the opportunity to move their
definitions into the source files where they are used, make them private and
de-typedef them.
the sound drivers are updated to use snd_setup_intr instead of
bus_setup_intr, and to comply with the de-typedefed structures.
the ac97, mixer and channel layers have been updated with finegrained
locking, as have some drivers- not all though. the rest will follow soon.
this apparently fixes problems initialising certain es1371/es1373/ct5880
revisions.
Confirmed working by: Richard J Kuhns <rjk@grauel.com>
PR: i386/25944
that I removed in my last commit dealing with `make depend' bogons.
This commit has some races, but hopefully they are too short to matter.
Unfortuneatly, neither .newdep nor .olddep is removed by `make clean'.
Submitted by: bde
running in process context in order to run interrupt handlers. This
caused a big smashing of the stack on AMD K6, K5 and Intel Pentium (ie, P5)
processors because we are using npxproc as a flag to indicate whether
the state has been pushed onto the stack.
Submitted by: bde
because libc/rpc/key_call.c references uname(), and ps/print.c also
defines uname(), and ps is linked statically. This leads to a symbol
clash. The userland uname(3) kinda sucked anyway as the hostname
etc was too short. And since the libc rpc interface now uses
the utsname.nodename which gets truncated, I was tempted into doing
something about it. Create a new userland uname function, called
__xuname() which takes an extra argument that allows you to change
the size of the fields. uname() becomes a static inline function
in sys/utsname.h that passes the extra argument in. struct utsname
has its field members expanded by default now in userland.
We still provide a 'uname' externally linkable function for things
that either think that they ``know'' the utsname format and assume
32 character strings and bypass the include file, or objects that
are linked against old libcs. ie: just about every plausible
case that I can think of is covered. Should we ever change the
default lengths again, a libc major bump should not be required
as the size is now passed to the function.
XXX the uname(2) in the kernel is for FreeBSD 1.1 binary compatability!
All the uname(3) functions that are exported to userland are actually
implemented in libc with sysctl. uname(1) uses sysctl directly and
does not call uname(3).
PR: bin/4688
both should work in non-pnp mode, the 924 should also work in its rather
braindead pnp mode- it will adopt port 0x530 unless given hints due to it
starting up in soundblaster mode and thus not requesting a valid mss port
address.
Submitted by: George Reid <greid@ukug.uk.freebsd.org>
When we get an Open event in stopped state, experience shows that this
is usually means we've somehow missed a previous Down event. This has
occasionally bitten people for the IPCP layer with ISDN, apparently a
previously aborted IPCP negotiation must have caused this. As a
bandaid, we quickly pretent a Down event by advancing to starting
state; this effectively implements the `restart' option mentioned in
RFC 1663.
While i'm not yet fully convinced this is the best thing to do (and is
fully compliant with RFC 1661), i've seen a number of reports here on
the German mailing lists where people have been bitten by the previous
behaviour which usually causes quickly looping ISDN reconnects (thus
loss of money...), and where just this patch fixes the problem.
For this, i'd even like to see it MFC'd if possible.
Submitted by: Helmut Kreft <kreft@zeus.ai-lab.fh-furtwangen.de>
1 Make promiscuous mode work
2 A few header additions
3 Allow device config before IFF_UP
These were (respectively)...
Submitted by: Allan Saddi <asaddi@philosophysw.com>
Submitted by: Dave Cornejo <dave@dogwood.com>
Submitted by: Doug Ambrisko <ambrisko@ambrisko.com>
Tested by: David Wolfskill <dhw@whistle.com>
acl_add_perm, acl_clear_perms, acl_copy_entry, acl_create_entry,
acl_delete_perm, acl_get_permset, acl_get_qualifier, acl_get_tag_type,
acl_set_permset, acl_set_qualifier, acl_set_tag_type
This brings us within 4 functions of a full ACL editing library.
Reviewed by: rwatson
pccard in the kernel for those drivers with pccard attachments. This
makes the compat layer a little larger by introducing some inlines,
but should almost make it possible to have independent attachments.
The pccard_match function are the only one left, which I will take
care of shortly.
Make struct cmessage visible from socket.h (about 4 places were
defining it for themselves which wasn't good)
Make __rpc_get_local_uid() useable and give it prototype that's
visible.
Fix some issues with printing out usernames from rpcbind and keyserv.
which resulted in the output of warning messages at boot if
UFS_EXTATTR_AUTOSTART was enabled but ".attribute" and possible
sub-directories weren't in a mounted MFS or UFS file systems.
Pointed out by: dcs
Obtained from: TrustedBSD Project
fatal trap. Also, reload the GDT register to point to BTX's GDT before
playing around with the segment registers to return to real mode. This is
helpful if the kernel causes a fatal exception before it has setup its own
IDT and fault handlers. For example, if one happens to break mtx_init().
Without these changes BTX would recursively page fault (if paging was not
disabled) or triple fault and reset the CPU (without the GDT reload)
instead of providing a potentially useful register dump.
Reviewed by: rnordier
sysctl, net.inet.ip.fw.permanent_rules.
This allows you to install rules that are persistent across flushes,
which is very useful if you want a default set of rules that
maintains your access to remote machines while you're reconfiguring
the other rules.
Reviewed by: Mark Murray <markm@FreeBSD.org>
(as is done in unmount).
Remove a snapshot inode from the superblock list when its last
name goes away rather than when its last reference goes away.
That way it will be properly reclaimed by fsck after a crash
rather than reenabled when the filesystem is mounted.
I was hanging after sending a xfer CTIO and a status CTIO for a non-discon
INQUIRY- the xfer CTIO was returned as completed OK, but the status CTIO
was dropped on the floor. All the fields looked good. I don't know why
it got dropped. But allowing status to go back with data xfer seemed to
work. I also noticed that with a non-disconnecting command that the
firmware handle in the ATIO is zero- this leads me to believe that the
f/w really can only handle one CTIO at a time in the discon case, and
it had no idea what to do with the second (status) CTIO.
CAM_SEND_STATUS. Set a timeout of 2 seconds per CTIO. Make sure
that the 'real' tag value is being checked against- not the
one that also carries the firmware handle.
routine instead of pccard_event(). This avoids spurious extra calls
to pccard_insert_beep() at insert or remove time which could occur
due to noise on the card-present lines.
Clean up some code in pccard_beep.c; we were depending on the order
of evaluation of function arguments, which is undefined in C. Also,
use `0' rather than `NULL' for integer values.
Reviewed by: sanpei, imp