One bug fixed: Use getmicrouptime() to trigger reseeds so that we
cannot be tricked by a clock being stepped backwards.
Express parameters in natural units and with natural names.
Don't use struct timeval more than we need to.
Various stylistic and readability polishing.
Introduce arc4rand(void *ptr, u_int len, int reseed) function which
returns a stream of pseudo-random bytes, observing the automatic
reseed criteria as well as allowing forced reseeds.
Rewrite arc4random() in terms of arc4rand().
Sponsored by: DARPA & NAI Labs.
- add wrappers for mmap2(2) and ftruncate64(2) system calls;
- don't spam console with printf's when VFAT_READDIR_BOTH ioctl(2) is invoked;
- add support for SOUND_MIXER_READ_STEREODEVS ioctl(2);
- make msgctl(IPC_STAT) and IPC_SET actually working by converting from
BSD msqid_ds to Linux and vice versa;
- properly return EINVAL if semget(2) is called with nsems being negative.
Reviewed by: marcel
Approved by: marcel
Tested with: LSB runtime test
clear the bit. This allows ata driver to attach its children because
it needs the interrupts enabled to succeed.
Submitted by: iwasaki-san
o Spell CardBus as CardBus, not Cardbus or CardBUS while I'm here.
vcanrecycle to check a free vnode's availability. If it is
available, vcanrecycle returns an error code of zero and the
vnode in question locked. The getnewvnode routine then used
to call vn_start_write with the V_NOWAIT flag. If the filesystem
was suspended while taking a snapshot, the vn_start_write would
fail but getnewvnode would fail to unlock the vnode, instead
leaving it locked on the freelist. The result would be that the
vnode would be locked forever and would eventually hang the
system with a race to the root when it was attempted to recycle
it. This fix moves the vn_start_write check into vcanrecycle
where it will properly unlock the vnode if it is unavailable
for recycling due to filesystem suspension.
Sponsored by: DARPA & NAI Labs.
allow us to avoid nasty by-hand string parsing stuff in a number of
places in the kernel, reducing the risk of unexpected consequences
for kernel correctness.
on the _file() theme that do not follow symlinks. Sync to MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
sched_lock. This means that we no longer access p_limit in mi_switch()
and the p_limit pointer can be protected by the proc lock.
- Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE
processes don't call mi_switch(), and even if they did there is no longer
the danger of p_limit being NULL (which is what the original zombie check
was added for).
- When we bump the current processes soft CPU limit in ast(), just bump the
private p_cpulimit instead of the shared rlimit. This fixes an XXX for
some value of fix. There is still a (probably benign) bug in that this
code doesn't check that the new soft limit exceeds the hard limit.
Inspired by: bde (2)
that add an instance of themselves. The npx(4) driver doesn't even check
the npx 'port' hint but hardcodes IO_NPX instead. The npx(4) driver also
will use isa IRQ 13 (on x86, 8 on pc98) by default if no 'irq' hint is
specified, so we don't need that hint either.
Whenever doing a copy-on-write check, first look in the list of
initially allocated blocks to see if it is there. If so, no further
check is needed. If not, fall through and do the full check. This
change eliminates one of two known deadlocks caused by snapshots.
Handling the second deadlock will be the subject of another check-in.
This change also reduces the cost of the copy-on-write check by
speeding up the verification of frequently checked blocks.
Sponsored by: DARPA & NAI Labs.
Whenever doing a copy-on-write check, first look in the list of
initially allocated blocks to see if it is there. If so, no further
check is needed. If not, fall through and do the full check. This
change eliminates one of two known deadlocks caused by snapshots.
Handling the second deadlock will be the subject of another check-in.
This change also reduces the cost of the copy-on-write check by
speeding up the verification of frequently checked blocks.
Sponsored by: DARPA & NAI Labs.
even when the underlying device has a larger sector size. Therefore,
the filesystem code should not (and with this patch does not) try to
use the underlying sector size when doing disk block address calculations.
This patch fixes problems in -current when using the swap-based
memory-disk device (mdconfig -a -t swap ...). This bugfix is not
relevant to -stable as -stable does not have the memory-disk device.
Sponsored by: DARPA & NAI Labs.
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
teh owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is consceivable that the
borrower should inherit the priority of the owner too.
that's another discussion and would be simple to do.
Also, as part of this, the "preallocatd spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".
DDB now shows a lot mor info for threaded proceses though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)
Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.
The problem is that the code does a check for the granparent of
the Promise chip, if this is a bridge of the right type, we have
a TX4 on our hands, and need to handle that ones "issues".
Now the grandparent check cause subtle bugs in the newbus system,
mainly that pci_get_devid doesn't return an error value.
This patch works around the issue by using BUS_READ_IVAR() instead.
structure. This has been broken since 1998, but probably hasn't been
noticed because it takes a read/write of 64K blocks (32MB with 512 byte
blocks) to trigger using the 12 byte read/write CDB in scsi_read_write().
Submitted by: "Moore, Eric Dean" <emoore@lsil.com>
MFC after: 3 days
divide/remainder calls. For reasons not resolved, compiling the
relevant routines from libkern into boot2 results in stack corruption.
Do the simple thing: Don't use 64bit divide/remainder operations.
Sponsored by: DARPA & NAI Labs
among other things, the DEVFS rule subsystem to match nodes against a
path pattern supplied by the user.
fnmatch.c was repo-copied from src/lib/libc/gen/fnmatch.c, and the
only changes to it are those necessary to make it compile in the
kernel. The relevant parts of fnmatch.h were imported into libkern.h.
Approved by: -arch
o Implement the thread killing interlock as described by jhb in arch@
while talking to markm.
o Hold Giant around cbb_insert()/cbb_remove(). Deep in the belly of
the vm code we panic if we don't hold this when we activate the memory
for reading the CIS.
o If we had to do the kludge alloc, then do a kludge free.
configuration device hierarchy. Device arrival, departure and not
matched are presently reported. This will be the basis for devd, which
I still need to polish a little more before I commit it. If you don't
use /dev/devctl, it will be a noop.
o Allow the bus_debug variable to be set via the bus.debug tunable.
o Return pnpinfo and location info via the devinfo interface to userland.
devinfo(8) needs to be updated to print it.
o Better resume code. Move the comments around. Force the socket state to
be querried. Ack the interrupts properly.
o Intercept the interrupt requests and keep a list of interrupts to service
ourselves. When the card attaches, set its OK bit. When we get a card
status change interrupt for that card, clear the OK bit. Don't call the
ISR if the OK bit is cleared. Iwasaki-san and yamamoto-san have both
sent me patches that fix the same problem this fixes, but at the pccard
level.
o Try to get the signalling of the thread to actually die. This might not be
100% right, but it is less wrong than before.
o Add a SIC next to a TI type that looks like it could be wrong, but isn't.
the card.
o Add comments about how we're doing the CIS activation.
o Add location and pnp info functions.
o Add better code to hopefully deal with ata cards better (and other drivers
that allocate resources that we didn't preallocate from the CIS). OLDCARD
used to allow it, but NEWCARD was pickier. I'm not 100% sure this works,
but it doesn't break anything.
give us slightly better error checking than before and interpret what
default bits mean better. See the NetBSD CVS tree for the authors of
these changes (revs 1.10 .. 1.17).
Note, we return the PCI pnp info, but in fact that's wrong to do
since that data is not defined for CardBus cards. CardBus says that
these registers are undefined and one should use the CIS to do
device matching. To date, all CardBus cards have had these
registered defined, no doubt because they are using common silicon
to produce both the PCI cards and the CardBus cards. However, it isn't
any worse than the rest of the system, so just note it in passing and
move on.
o Also sort prototypes while I'm here.
Conditionalize the "XX bytes left" checks reference on UFS1/UFS12.
Conditionally build the necessary 64bit math for boot2 if UFS12.
Sponsored by: DARPA & NAI Labs.
revision 1.218. This bug caused a "struct file" reference to be
leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write()
failed during the open operation.
PR: kern/43739
Reported by: Arne Woerner <woerner@mediabase-gmbh.de>
currently disabled):
o Don't use constants for the output parameter, use the iparam count as a
pointer to the first result location.
o Fix bits vs bytes counting problems.
o Split out the hardware and software normalization versions of modexp.
o Enable hardware normalization for chips that support it.
o On reset, disable hardware normalization for 582x and make sure the
chip is in little endian mode.
o Since sw normalization is now the only option, simplify normalization
handling.
Also fix RNG harvesting: disabling PK support (for the moment) had disabled
the MCR2 interrupt; consider both KEY support and RNG support when deciding
whether or not to enable it.
Obtained from: openbsd
It seems that the existence of a "depend" target in src/sys/boot is not
to be taken as an indication that it actually does what one would expect,
at least it clearly threw my testing off.
Apologies to: jhb
for processing callbacks. This closes race conditions caused by locking
too many things with a single mutex.
o reclaim crypto requests under certain (impossible) failure conditions
Load 4 sectors more than we used to. This is harmless overhead for
the UFS1_ONLY case, but sufficient for boot2(UFS1+2).
Sponsored by: DARPA & NAI Labs
Don't use snprintf where strlcpy() will do the job.
Also, a NUL is '\0' not 0 in our style (C doesn't care), so spell it like.
Remove useless {} and () in the general area of this change.
and therefore we need a way for ioctl handlers to run in that thread
in GEOM. Rather than invent a complicated registration system to
recognize which ioctl handler to use for a given ioctl, we still
schedule all ioctls down the tree as bio transactions but add a
special return code that means "call me directly" and have the
geom_dev layer do that.
Use this for all ioctls that make it as far as a diskdriver to
avoid any backwards compatibility problems.
Requested by: scottl
Sponsored by: DARPA & NAI Labs
OpenBSD who got the code (or the idea) from the NetBSD tlp driver.
This gets some cardbus dc cards working (either completely or nearly
so). It also appears to get additional pci cards working, without
breaking working ones.
# Maybe some additional work is needed here. Also, the cardbus attachment
# might need to match on the CIS rather than on the vendor/device so we have
# a finer level of detail as to what the card is. Technically, the
# vendor/device fields are undefined for CardBus (even though most cards are
# using common silicon with pci models).
there are some strange machines that seem to need this.
o delete bogus comment.
o don't use the the bios for read/writing config space. They interact badly
with SMP and being called from ISR. This brings -current in line with
-stable.
# make the latter #ifdef on USE_PCI_BIOS_FOR_READ_WRITE in case we
# need to go back in a hurry.
checks from the MAC tree: allow policies to perform access control
for the ability of a process to send and receive data via a socket.
At some point, we might also pass in additional address information
if an explicit address is requested on send.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
seperate entry points for each occasion:
mac_check_vnode_mmap() Check at initial mapping
mac_check_vnode_mprotect() Check at mapping protection change
mac_check_vnode_mmap_downgrade() Determine if a mapping downgrade
should take place following
subject relabel.
Implement mmap() and mprotect() entry points for labeled vnode
policies. These entry points are currently not hooked up to the
VM system in the base tree. These changes improve the consistency
of the access control interface and offer more flexibility regarding
limiting access to vnode mmaping.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
flags so that we can call malloc with M_NOWAIT if necessary, avoiding
potential sleeps while holding mutexes in the TCP syncache code.
Similar to the existing support for mbuf label allocation: if we can't
allocate all the necessary label store in each policy, we back out
the label allocation and fail the socket creation. Sync from MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
point that instruments the creation of hard links. Policy implementations
to follow.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
to mbuf label initialization, that functionality was never merged to
the main tree. Go ahead and merge that functionality now. Note that
this requires policy modules to accept the case where the label
element may be destroyed even if init has not succeeded on it (in
the event that policy failed the init). This will shortly also
apply to sockets.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
order used in mac_policy.h and elsewhere. Sort order is basically
"by operation category", then "alphabetically by object". Sync to
MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
externalization, and cred label life cycle events to entirely above
devfs and vnode events. Sync from MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
entry points to better match the entry point ordering in mac_policy.h.
Big diff, no functional change; merge from the MAC tree.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- If a policy isn't registered when a policy module unloads, silently
succeed.
- Hold the policy list lock across more of the validity tests to avoid
races.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
NB: But it will enable it in all kernels not having options "NO_GEOM"
Put the GEOM related options into the intended order.
Add "options NO_GEOM" to all kernel configs apart from NOTES.
In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.
There are currently three known issues which may force people to
need the NO_GEOM option:
boot0cfg/fdisk:
Tries to update the MBR while it is being used to control
slices. GEOM does not allow this as a direct operation.
SCSI floppy drives:
Appearantly the scsi-da driver return "EBUSY" if no media
is inserted. This is wrong, it should return ENXIO.
PC98:
It is unclear if GEOM correctly recognizes all variants of
PC98 disklabels. (Help Wanted! I have neither docs nor HW)
These issues are all being worked.
Sponsored by: DARPA & NAI Labs.
- Change mpo_init_foo(obj, label) and mpo_destroy_foo(obj, label) policy
entry points to mpo_init_foo_label(label) and
mpo_destroy_foo_label(label). This will permit the use of the same
entry points for holding temporary type-specific label during
internalization and externalization, as well as for caching purposes.
- Because of this, break out mpo_{init,destroy}_socket() and
mpo_{init,destroy}_mount() into seperate entry points for socket
main/peer labels and mount main/fs labels.
- Since the prototype for label initialization is the same across almost
all entry points, implement these entry points using common
implementations for Biba, MLS, and Test, reducing the number of
almost identical looking functions.
This simplifies policy implementation, as well as preparing us for the
merge of the new flexible userland API for managing labels on objects.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
treat it as an invalid partition.
This fixes a bug where ``dumpon <device>'' will configure the dump
device at a random offset on the disk if <device> isn't a valid
partition.
Reviewed by: phk
(1) Use namei() and devfs to discover devices rather than a hard-coded
MAKEDEV implementation. Once rootfs is in place, this will allow
Vinum to be used for the root file system partition.
(2) Pass FREAD to device opens so that GEOM will return sector size
rather than an error on attempts to read label data.
(3) Avoid clobbering return values from close_drive() and masking this
failure, resulting in a later divide by zero due to not having
updated the Vinum-cached sector size.
(4) Ignore failures from DIOCWLABEL as that appears not to be required
in the GEOM environment.
We've done testing in simple Vinum environments, but those with more
complex environments might want to give this a spin in DP2 and make
sure everything is up to speed.
Fixes in collaboration with: iedowse
Reviewed by: grog
before freeing so that WITNESS doesn't dereference mutex data pointers
and page fault. It's now possible to unload vinum.ko with a GENERIC
kernel on 5.0-CURRENT without panic.
Debugged/fixed with the aid of: jake, grog
This allocate the best IRQ to boot-disable devices (have IRQ 0).
Allocated IRQ will be used for PCI interrupt routing when ACPI is
enabled.
Note that verbose messaging enabled for the time being so that
people can easily notice the strange behavior if it happened.
a consistent interface to h/w and s/w crypto algorithms for use by the
kernel and (for h/w at least) by user-mode apps. Access for user-level
code is through a /dev/crypto device that'll eventually be used by openssl
to (potentially) accelerate many applications. Coming soon is an IPsec
that makes use of this service to accelerate ESP, AH, and IPCOMP protocols.
Included here is the "core" crypto support, /dev/crypto driver, various
crypto algorithms that are not already present in the KAME crypto area,
and support routines used by crypto device drivers.
Obtained from: openbsd
This will be removed when new versions of syscalls sigreturn()
and sigaction() are added (mini is working on this but is in
the middle of a move).
This should fix the problem of cvsupd dying.
SCSI disks are too square pegs for the round holes in both of these.
And since atapi-cd has clearly shown that there are better acccess
models for CD media than trying to pretend to be a classical disk,
we stop the masquerade rather than patch up the costume.
But do implement the DIOCGMEDIASIZE and DIOCGSECTORSIZE so it will
be possible to manually attach to GEOM, should some the need arise.
Ideally, this driver should do media-detection and call make_dev()
when a CD is inserted and destroy_dev() when it is removed, this
would allow our future devd(8) to automount etc etc but coding that
takes SCSI-clue beyond anything I posses.
Tested on: sparc64
This reduces the size of GENERIC's text space by 73999 bytes (about 2%).
The bloat is from approximately 3437 strings longer than 31 characters
being padded to a 32-byte boundary.
around limitations in the ia64 kernel stack handling code. Basically
preallocate a bunch of threads (and hence kstacks) while contigmalloc()
still works, and never free them back to the general memory pool. After
the system has been running for a while, contigmalloc() eventually fails
at a critical momemt and panics the system.
is partly based on the Alpha system which duplicates the clock to
each cpu, instead of doing a clock roundrobin like on i386. This means
we get hz * ncpu clocks per second and so we have to seperate clock
sampling from actual 'do the work' clock processing. The BSP runs the
complete processing, the rest just sample state etc.
Using the on-cpu interval timer is not ideal as it will drift. There
is more to be done here, we should use an external clock source.
it. It's also only used in vm/vm_swap.c, but that is also the only source
file that #include's <sys/dmap.h>. sys/dmap.h could probably be embedded
entirely in vm_swap.c since that is the only consumer of it.
totally bogus but will hide the occurances of access of 0xbc(NULL) which
people have run into lately. This is not a proper fix, just a bandaid, until
the cause of this happening is tracked down and fixed.
Reviewed by: rwatson
dereference the struct sigio pointer without any locking. Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.
Reviewed by: rwatson
o Unusual order of #ifndef _FOO_H_, followed by license.
o Missing tab in struct sched_param between type and member name.
o Space used, instead of tab, after #define.
o Reversed comment for #endif.
o Irregular comment block.
o Space used, instead of tab, to seperate return value type from
function name.
o Unordered function prototypes.
name instead. (e.g., SLOCK instead of SMTX, TD_ON_LOCK() instead of
TD_ON_MUTEX()) Eventually a turnstile abstraction will be added that
will be shared with mutexes and other types of locks. SLOCK/TDI_LOCK will
be used internally by the turnstile code and will not be specific to
mutexes. Making the change now ensures that turnstiles can be dropped
in at a later date without affecting the ABI of userland applications.
These are still unknown name but these are working as well
as the other ServerWorks chipset.
Description strings should be corrected when the chipsets
are known.
MFC after: 1 week