patch. lf can't be dereferenced after the unload attempt, in case it
was freed. Instead, decrement first and back it out if the unload failed.
This should be relatively immune to races caused by the user since the
userref count will be zero for the duration of the actual unloading and
will stop further kldunload attempts.
Submitted by: Ustimenko Semen <semen@iclub.nsu.ru>
Make TIB handling use buffer size to conform with ANS Forth.
Add ANS MEMORY-ALLOC word set.
See the PRs for extensive details.
PR: kern/9412 kern/9442 kern/9514
Submitted by: PRs from Daniel Sobral <dcs@newsguy.com>
help.common
interp.c
Rename the 'source' command to 'include' in order to avoid conflict
with the ANS Forth command of the same name. (kern/9473)
interp_forth.c:
Changes from kern/9412 (EXCEPTION word), kern/9442 (TIB buffer
sizing) and an improved version of kern/9460 (set
version numbers).
load_aout.c:
Trim some obsolete #if 0'ed cruft.
pnp.c:
Tidy the pnpscan output, turn off the module scanning until we
sort out how to do it right.
PR: kern/9412 kern/9442 kern/9460 kern/9473
Submitted by: PRs from Daniel Sobral <dcs@newsguy.com>
of the CRC as the multicast hash table bit, not the lower six bits. Plus
we have to flip on all bits in the table for multicast mode.
Pointed out by: Kazushi SUGYO <k-sugyou@nwsl.mesh.ad.jp>
turns out to not be useful to unwind the dependencies and continue in
the face of a fatal error.
Also changed the log() to a printf() in softdep_error() so that it will
be output in the case of a impending panic.
Submitted by: Kirk McKusick <mckusick@mckusick.com>
the buffer as still being dirty. This isn't a perfect solution, but
throwing away the buffer contents will often result in filesystem
corruption and this solution will at least correctly deal with transient
errors.
Submitted by: Kirk McKusick <mckusick@mckusick.com>
swap blocks are now in PAGE_SIZE'd increments instead of DEV_BSIZE'd
increments. We still convert to DEV_BSIZE'd increments for the
backing store I/O, but everything else is in PAGE_SIZE increments.
vm_pager.h
Added argument to getpbuf() and relpbuf() to allow each subsystem to
specify a different hard limit on the number of simultanious physical
bufferes that said subsystem may allocate. Without this feature, one
subsystem ( e.g. the vfs clustering code ) could hog *ALL* the pbufs,
causing a deadlock in the pager in a low memory situation.
Same for trypbuf().
Removed call to vm_object_collapse(), which can block. This was being
called without the pageout code holding any sort of reference on the
vm_object or vm_page_t structures being manipulated. Since this code
can block, it was possible for other kernel code to shred the state
the pageout code was assuming remained intact.
Fixed potential blocking condition in vm_pageout_page_free() ( which
could cause a deadlock in a low-memory situation ).
Currently there is a hack in-place to deal with clean filesystem meta-data
polluting the inactive page queue. John doesn't like the hack, and neither
do I.
Revamped and commented a portion of the pageout loop.
Added protection against potential memory deadlocks with OBJT_VNODE
when using VOP_ISLOCKED(). The problem is that vp->v_data can be NULL
which causes VOP_ISLOCKED() to return a less informed answer.
remove vm_pager_sync() -- none of the pagers use it any more ( the old
swapper used to. The new one does not ).
reducing the size of vm_page_t.
SWAPBLK_NONE and SWAPBLK_MASK are defined here. These actually are
more generalized then their names imply, but their placement is somewhat
of a legacy issue from a prior test version of this code that put
the swapblk in the vm_page_t structure. That test code was eventually
thrown away. The legacy remains.
Added vm_page_flash() inline. Similar to vm_page_wakeup() except that
it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ).
Used by a number of routines to wakeup waiters.
Collapsed some of the code in inline calls to make other inline calls.
GCC will optimize this well and it reduces duplication.
vm_page_free() and vm_page_free_zero() inlines added to convert to
the proper vm_page_free_toq() call.
vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has
been removed ). This implements a much more optimizable page-waiting
function.
pointers per entry ). The table has been changed to a singly linked
list of vm_page_t pointers. The table has been doubled in size, but
the entries only take half the space so a net-zero change in memory use.
The hash function has been changed, hopefully for the better. The
combination of the larger hash table size of changed function should
keep the chain length down to a reasonable number (0-3, average 1).
vm_object->page_hint has been removed. This 'optimization' was not
only never needed, but costs as much as a hash chain link to implement.
While having page_hint in vm_object might result in better locality
of reference, the cost is not worth the space in vm_object or the
extra instructions in my view.
vm_page_alloc*() functions have been inlined and call a generalized
non-inlined vm_page_alloc_toq() which combines the standard alloc
and zero-page alloc functions together, reducing code size and the L1
cache footprint. Some reordering has been done... not much. The
delinking code should be faster ( because unlinking a doubly-linked list
requires four memory ops and unlinking a singly linked list only requires
two ), and we get a hash consistancy check for free.
vm_page_rename() now automatically sets the page's dirty bits.
vm_page_alloc() does not try to manually inline freeing a cache page.
Instead, it now properly calls vm_page_free(m) ... vm_page_free() is
really too complex to manually inline.
vm_await(), supporting asleep(), has been added.
of most of the swap-pager-specific fields, the removal of the id,
and the removal of paging_offset.
A new inline, vm_object_pip_wakeupn() has been added to subtract an
arbitrary number n from the paging_in_progress count and then wakeup
waiters as necessary. n may be 0, resulting in a 'flash'.
object->paging_offset has been removed - it was used to optimize a
single OBJT_SWAP collapse case yet introduced massive confusion throughout
vm_object.c. The optimization was inconsequential except for the
claim that it didn't have to allocate any memory. The optimization
has been removed.
madvise() has been fixed. The old madvise() could be made to operate
on shared objects which is a big no-no. The new one is much more careful
in what it modifies. MADV_FREE was totally broken and has now been fixed.
vm_page_rename() now automatically dirties a page, so explicit dirtying
of the page prior to calling vm_page_rename() has been removed.
about conversions of objects to OBJT_SWAP, it is done automatically
now.
Replaced manually inserted code with inline calls for busy waiting on
pages, which also incidently fixes a potential PG_BUSY race due to
the code not running at splvm().
vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
instead to properly handle any waiters.
Added comments, added support for M_ASLEEP. Generally treat M_ flags
as flags instead of constants to compare against.
and the swap_pager has been completely replaced.
The new swap pager uses the new blist radix-tree based bitmap allocator
for low level swap allocation and deallocation. The new allocator
is effectively O(5) while the old one was O(N), and the new allocator
allocates all required memory at init time rather then at allocate
memory on the fly at run time.
Swap metadata is allocated in clusters and stored in a hash table,
eliminating linearly allocated structures.
Many, many features have been rewritten or added. Swap space is now
reallocated on the fly providing a poor-mans auto defragmentation of
swap space. Swap space that is no longer needed is freed on a timely
basis so no garbage collection is necessary.
Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
thing with it, so NFS-based paging now has around 10x the performance
as it did before ( previously NFS enforced synchronous I/O for paging ).
B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS.
Also, a number of cases where manually inserted code has been removed
and replaced with an inline function call giving us better functional
isolation in the source.
descriptor-passing messages was calling sorflush() without checking
to see if the descriptor was actually a socket. This can cause a
crash by exiting programs that use the mechanism under certain
circumstances.
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
Change from lkm to kld
Add field plexsdno to sd struct
Add flag VF_NEWBORN to drive, sd, plex and volume structs, indicating
that the object has just been created.
Add object types for raw (unattached) plexes and subdisks
Remove definitions of VOLNO, PLEXNO and SDNO (now functions Volno,
Plexno and Sdno)
Move revive parameters from struct plex to struct sd.
struct plex:
maintain a count of the number of inaccessible subdisks.
remove defective and unmapped regions.
Debug flags: make an enum (previously #define)
Set default revive block size to 64kB (was 32 kB)
Previously, accidentally starting the wrong version could corrupt
the RAID5 configuration.
Add functions Volno, Plexno and Sdno to replace the old defines
VOLNO, PLEXNO and SDNO.
Change from lkm to kld
Serious rewrite. No longer call set_<foo>_state to set the state
based only on other objects; instead, add functions
update_<foo>_state, which determine what the state should be by
themselves. This allows the set_<foo>_state functions to shrink
enough to be almost intelligible.
Remove flags setstate_recurse and setstate_recursing.
Remove plex defective regions and unmapped regions, which were
maintained but not used.
Change code to allow daemon to perform operations formerly kludged
into an interrupt context. Remove the DIRTYCONFIG kludge.
Change from lkm to kld
Remove #ifdefs for FreeBSD 2.c
vinumstrategy:
Support anonymous (`raw') subdisks and plexes.
Change code to allow daemon to perform operations formerly kludged
into an interrupt context. Remove the DIRTYCONFIG kludge.
No longer set B_ORDERED for reviving subdisks. I suspect this
wouldn't work correctly, and it should be done in a different manner
in vinumrevive.c
sdio: set subdisk state correctly on error
start to remove code that doesn't make any sense any more.
Remove #ifdefs for FreeBSD 2.c
Change from lkm to kld
correct type of `flags' in calls to set_drive_state.
set_drive_parms: handle anonymous drives correctly (remove them)
drive VOP functions: use the PID of the original opener to fool the
lock manager.
open_drive: be quiet about failures (they're normal when scanning the
partitions).
close_drive: lock drive before closing.
remove_drive: lock drive before deallocating.
read_drive_label: set drive up when all is OK
check_drive:
Complete rewrite. Offload most of the code to the new
vinum_scandisk
format_config:
use snprintf and %qd options to make much less emetic.
Remove old supporting functions.
vinum_scandisk:
Moved here from vinum.c
Almost complete rewrite, incorporating much of what was check_drive.
We still don't have a general way to find the drives on a system, so
get the user to supply the names via the `read' command. For each
device, try each possible compatibility slice name (there's a danger
of finding both /dev/da1h and /dev/da0s1h otherwise). Sort the
partitions found in reverse order of last update time and read them
in, setting the `update' parameter to parse_config and descendents.
save_config: rename to daemon_save_config, since the function is now
called by the daemon. Create a new function save_config which queues
the request with the daemon.
daemon_save_config: some mods to allow for the unfamiliar
environment.
Change from lkm to kld
Remove BROKEN_GDB kludge (it's not needed with klds)
Add code for interfacing with daemon
Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.
Remove code for 2.x support.
Change messages to omit obvious words like 'plex' and 'subdisk.
give_plex_to_volume: invalidate subdisks being given to a plex which
is part of a volume with other plexes.
give_sd_to_plex: keep track of plex size in all cases
lock drives before closing them, to keep the daemon from getting
confused.
config_drive: handle partition type errors more gracefully
config_subdisk: set subdisk state correctly
find_drive, find_drive_by_dev, find_subdisk, find_plex, find_volume:
set VF_NEWBORN flag when a new object is created
config_drive:
Handle partition_status returns more cleverly.
Replace the device name in some cases where it got overwritten.
config_subdisk:
add parameter `update'. If the object already exists, exit without
any changes.
Set state correctly.
config_plex, config_volume:
add parameter `update'. If the object already exists, exit without
any changes.
parse_config:
move read function to vinum_scandisk.
add parameter `update' to pass to config_<object>.
remove_<object>_entry:
print a message when the object is removed.
update_plex_config:
Start defusing this function, which will go away some time.
Remove calls to update_volume_config.
Make size 64 bits
Change from lkm to kld
Remove BROKEN_GDB kludge (it's not needed with klds)
Add code for interfacing with daemon
Modify manner of determining when module is idle
Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.
Remove code for 2.x support.
Move vinum_scandisk to vinumio.c
Remove myproc kludge
Keep track of open volumes by flag, not by pid (the pids caused some
problems with the lock manager).
free_vinum:
Remove unmapped and defective regions from plexes.
Wait for daemon to stop before returning
vinumopen:
Don't refuse an open if the volume is already open.
CAM's xpt_init() to get in first. I hope this will fix the build again,
sorry guys. :-(
XXX configure_start() and configure_end() seem to be a bit excessive.. both
here and in the i386 code.
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
lock, and add some macros and function parameters to make sure that
the information get to the point where it can be put in the lock
structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
pnp system in freebsd, I'm not sure how useful this will be, but my
1542CP seems to work well in plug and play mode and does seem to
probe correctly at all the oddball addresses/irq/drqs that I tried.
[[
I was unable to get /kernel.conf or /kernel.config to read in, so
I wasn't able to verify that this method of userconfig works. that's
one thing that makes pnp so hard to use in the current scheme.
Pointers to the right new way of doing this accepted.
]]
o Add some kludges to maybe bring support for 1540A/1542A into the
driver. Since I have no 154xA cards, and the only person I know
that has them hasn't given me feedback, I'm making this commit
blind.
o Honor unit numbers that are in the config file now. This allows one
to hard wire the unit numbers (and have high unit numbers for plug
and pray devices, which can't seem to be hardwired) and have the
cards not migrate from aha1 -> aha0 should aha0 go on the fritz. I
didn't verify that hard wired scsi busses would work, but did verify
that hard wired aha addresses did work to a limited extent. Both
aha0 and aha1 must be hardwired, or when the card that was in aha0
goes away, the probe for aha0 might pick up the card that otherwise
would have been aha1.
its original form. (Originally, it only applied to the CFP 2107.)
Hopefully we can come to some conclusion about which Conner drives are
broken for tagged queueing.
The previous code just ignored the invalid map register, but this gave
surprising results because of the way pci_map_port() associated the map
register offset supplied with a map entry in the map array.
- Bring down the splash screen when a vty is opened for the first
time.
- Make sure the splash screen/screen saver is stopped before
switching vtys.
- Read and save initial values in the BIOS data area early.
VESA BIOS may change BIOS data values when switching modes.
- Fix missing '&' operator.
- Move ISA specific part of driver initialization to syscons_isa.c.
atkbd
- kbdtables.h is now in /sys/dev/kbd.
all
- Adjust for forthcoming alpha port. Submitted by: dfr
Fix NFS file corruption problem introduced in 1.188. The valid range
was not being set properly, causing a later reference to the buffer
to clear the B_CACHE bit.
the CFP2107, but it appears (not surprisingly) that the 1 gig and 4 gig
versions of that drive have the same problem with tagged queueing.
Also, fix the problem reported in PR kern/9482. The XPT_DEV_MATCH case in
xptioctl() wasn't putting a proper path in the CCB before it called
xpt_action(). When CAMDEBUG is defined, and CAM_DEBUG_TRACE debugging is
turned on, the CAM_DEBUG statement at the beginning of xpt_action would end
up deferencing a NULL path pointer. That of course caused a panic.
My solution is to just stick the xpt peripheral's path in the CCB.
PR: kern/9482
Reviewed by: gibbs
however is only marginally useful until the new-style bus (pci and isa)
stuff comes onboard to give us a better shot at actually pci and isa
drivers loadable (or preloadable anyway).
Iprobe is an alpha-only system profiling suite which I'm porting from
Linux/alpha to FreeBSD.
Iprobe works by using the hardware profiling support built into
alpha cpus. In a nutshell, what Iprobe does is to setup the alpha
performance counters to sample the pc at a fairly high rate & dumps
those pc samples out to user space. Then some code runs to map the
sampled PCs to functions. You get a bit more than that (like the PSL
word, so you can tell if you're in the kernel or userland, what the
ipl is, etc).
o Add the EB64PLUS systype into the kernel configuration files
and add it to the GENERIC kernel
o Correct mcclock_isa.c's dependence on cia, it should depend on isa.
This will allow avanti and eb64+ kernels to be built without the cia
chipset support code.
The new file pci_eb64plus_intr.s deals with the interrupt hardware
on the EB64PLUS and was obtained from NetBSD with the NetBSD
copyright intact
The apecs chipset support code was altered to allow routing interrupts
through pci if we're not running on an avanti. Avanti's route all
interrupts through isa.
Tested by: Wilko Bulte <wilko@yedi.iaf.nl>
Partially reviewed by: dfr
needs. This removes the dependancy on Perl for the generation of the
loader, allowing the world to be built on a perl-free system.
Submitted by: Joe Abley <jabley@clear.co.nz>
state.
Note: this requires a recompilation of netstat (but netstat has been
broken since rev 1.52 of ip_mroute.c anyway)
Obtained from: Significantly based on Steve McCanne's
<mccanne@cs.berkeley.edu> work for BSD/OS
arplookup() to try again. This gets rid of at least one user's
"arpresolve: can't allocate llinfo" errors, and arplookup() gives
better error messages to help track down the problem if there really
is a problem with the routing table.
one in the kernel source, and that one is already used for modules.
I don't _think_ this will hurt releases, aout-to-elf, etc, but it is
possible. In all the cases I've looked at, config(8) has been
generated straight after a make world, so if /usr/sbin/config exists and
is the right version for the kernel, then we can pretty much count on
/usr/bin/gensetdefs being there too.
XXX It probably makes sense to have a flag for bsd.kern.mk to avoid these
rules.
XXX IO_NDELAY seems to be the main reason for it, when used in a cdevsw
read or write "flag" context. Perhaps a redundant declaration
somewhere like sys/conf.h might help remove the need for vnode.h in
these device drivers in the first place.
reorganization in rev 1.16 of i386/include/types.h which changed
stdlib.h's use of <machine/types.h>. The problem was the -I. was causing
machine/types.h to come from the current kernel source, while stdlib.h was
coming from /usr/include. /usr/include/stdlib.h is as old as the last
'make world', the machine/types.h was as new as the current source.
- Have the VFS lkm support use vfs_register() etc rather than having it's
own version.
- Have the syscall lkm support use syscall_register() etc rather than
having it's own verison.
- Convert the lkm driver to a module.
suggestions from Greg Lehey some time ago. In the face of multiple
potential file formats, try and give a more sensible error than just
ENOEXEC.
XXX a good case can be made that the loading process is wrong - the linker
should locate the file first (using the search paths etc), then run the
loaders to see if they recognize it. While the present system allows for
the possibility of different search paths for different formats, we do not
use it and it just makes things more complicated than they need to be.
0 success
EAGAIN try again later
other don't call this screen saver again
- Test flags consistently to examine the status of the screen saver.
scrn_blanked: the screen saver is running
scp->status & SAVER_RUNNING: the saver is running in this vty
- Correctlyu preserve status flag bits in set/restore_scrn_saver_mdoe().
to look up cookies properly, at least for standard controllers.
Cookies are used so that we don't have to pass around lots of args.
All of the dmainit functions use the unit number so it is essential
that we pass them a cookie with the correct unit number.
This may break working configurations if there are bugs in the
dmainit functions like the ones I just fixed for VIA chipsets.
Broken in: rev 1.4 of ide_pci.c and rev.1.139 of wd.c.
Prefetch/postwrite was enabled for the wrong controller. (VIA
is bitwise big endian and we confused ourself by shifting left
instead of right.)
Extracted from: last set of patches from the author
(john hood <cgull@smoke.marlboro.vt.us>) on 7 Feb 1998
Instead of initializing UDMA mode, we turned it off and made sure that
it stays off by turning on the "UDMA enable by SET FEATURES" disable.
The damage was limited by bugs in cookie lookup, and suitable
initialization by some BIOSes. The cookie list has slaves before
masters, and the unit number is ignored when cookies are looked up,
so cookie lookup always finds cookies for slaves and the bug only
clobbers slaves, so the bug was harmless for common configurations
with no slaves or only non-UDMA slaves. UDMA initialization for
masters actually worked if the BIOS turns on the UDMA mode bit and
turns off the "UDMA enable by SET FEATURES" disable.
idiot about testing SA_QUIRK_2FM in samount. Fixed.
Removed the NORRLS quirk (to save quirk space) and left
the behaviour of being quiet about failed reserve/release
(failed due Illegal Request) the same.
Added a SF_QUIET_IR for prevent/allow for the same purposes.
* A function device_printf() to make pretty-printing driver messages easier.
* A function device_get_children() to query the children of a device.
* Generic implementations of BUS_ALLOC_RESOURCE and BUS_RELEASE_RESOURCE.
* Change bus_generic_print_child() so that it is actually useful.
- In wb_rxeof(), if the received packet is less than MINCLSIZE bytes,
copy it to an mbuf chain so as to be more frugal in our use of mbuf
clusters.
- The Winbond chip, like the ASIX, wants the 'TX interrupt request'
bit set in the _first_ fragment of a transmitted frame, not the
last. (At least the Winbond manual states this unambiguously; too
bad I wasn't paying attention when I read it the first time.)
- Turn off the transmit threshold mechanism (initialize the threshold
to 0). This effectively puts the chip in 'store and forward' mode
which seems to cut down on transmit errors a little. It may also
reduce transmit performace a bit, but I'm willing to do that if it
means better reliability.
- Normally, the driver allocates an mbuf cluster for each receive
descriptor. This is because we have to be prepared to accomodate up to
1500 bytes (a cluster buffer can hold up to 2K). However, using up a
whole cluster buffer for a tiny packet is a bit of a waste. Also,
it seems to me that sometimes mbufs will linger in the kernel for
a while after being passed out of the driver, which means we might
drain the mbuf cluster pool. The cluster pool is smaller than the
mbuf pool in general, so we do the following: if the packet is less
that MINCLSIZE bytes, then we copy it into a small mbuf chain and
leave the mbuf cluster in place for another go-round. This saves
mbuf clusters in some cases while still allowing them to be used
for heavy traffic exchanges with lots of full-sized frames.
- The transmit descriptor has a bit in the control word which allows
the driver to request that a 'TX OK' interrupt be generated when
a frame has been completed. Sometimes, a frame can be fragmented
across several descriptors. The manual for the real DEC 21140A says
that if this happens, the 'TX interrupt request' bit is only valid
in the descriptor of the last fragment. With the ASIX chip, it seems
the 'TX interrupt request' bit is only valid in the descriptor of
the _first_ fragment. Actually, the manual contains conflicting
information, but I think it's supposed to be the first fragment.
To play it safe, set the bit in both the first and last fragment to
be sure that we get a TX OK interrupt. Without this fix, the driver
can sometimes be late in releasing mbufs from the transmit queue
after transmission.
(<blank@fox.uni-trier.de>) about quirks being set as
arithmetic values, not as bitfields. Add HP, Kennedy
and M4 1/2" reel quirk entries.
Do a lot of gratuitous source changing.
Audit all functions that build ccbs for the tape driver
and decide whether each one can be retried or not.
Still to do is some more state management post errors.
IDE hardare. The attempted fix in rev.1.182 was a no-op except for
adding dozens of style bugs. The undocumented options ALI_V and
DISABLE_PCI_IDE go away as a side effect. ALI_V was a no-op because
rev.1.182 was a no-op. DISABLE_PCI_IDE didn't actually disable
PCI IDE. It disabled the buggy code in wdprobe() at a cost of
completely breaking support for Promise controllers.
Broken in: rev.1.139