swap blocks are now in PAGE_SIZE'd increments instead of DEV_BSIZE'd
increments. We still convert to DEV_BSIZE'd increments for the
backing store I/O, but everything else is in PAGE_SIZE increments.
vm_pager.h
Added argument to getpbuf() and relpbuf() to allow each subsystem to
specify a different hard limit on the number of simultanious physical
bufferes that said subsystem may allocate. Without this feature, one
subsystem ( e.g. the vfs clustering code ) could hog *ALL* the pbufs,
causing a deadlock in the pager in a low memory situation.
Same for trypbuf().
Removed call to vm_object_collapse(), which can block. This was being
called without the pageout code holding any sort of reference on the
vm_object or vm_page_t structures being manipulated. Since this code
can block, it was possible for other kernel code to shred the state
the pageout code was assuming remained intact.
Fixed potential blocking condition in vm_pageout_page_free() ( which
could cause a deadlock in a low-memory situation ).
Currently there is a hack in-place to deal with clean filesystem meta-data
polluting the inactive page queue. John doesn't like the hack, and neither
do I.
Revamped and commented a portion of the pageout loop.
Added protection against potential memory deadlocks with OBJT_VNODE
when using VOP_ISLOCKED(). The problem is that vp->v_data can be NULL
which causes VOP_ISLOCKED() to return a less informed answer.
remove vm_pager_sync() -- none of the pagers use it any more ( the old
swapper used to. The new one does not ).
reducing the size of vm_page_t.
SWAPBLK_NONE and SWAPBLK_MASK are defined here. These actually are
more generalized then their names imply, but their placement is somewhat
of a legacy issue from a prior test version of this code that put
the swapblk in the vm_page_t structure. That test code was eventually
thrown away. The legacy remains.
Added vm_page_flash() inline. Similar to vm_page_wakeup() except that
it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ).
Used by a number of routines to wakeup waiters.
Collapsed some of the code in inline calls to make other inline calls.
GCC will optimize this well and it reduces duplication.
vm_page_free() and vm_page_free_zero() inlines added to convert to
the proper vm_page_free_toq() call.
vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has
been removed ). This implements a much more optimizable page-waiting
function.
pointers per entry ). The table has been changed to a singly linked
list of vm_page_t pointers. The table has been doubled in size, but
the entries only take half the space so a net-zero change in memory use.
The hash function has been changed, hopefully for the better. The
combination of the larger hash table size of changed function should
keep the chain length down to a reasonable number (0-3, average 1).
vm_object->page_hint has been removed. This 'optimization' was not
only never needed, but costs as much as a hash chain link to implement.
While having page_hint in vm_object might result in better locality
of reference, the cost is not worth the space in vm_object or the
extra instructions in my view.
vm_page_alloc*() functions have been inlined and call a generalized
non-inlined vm_page_alloc_toq() which combines the standard alloc
and zero-page alloc functions together, reducing code size and the L1
cache footprint. Some reordering has been done... not much. The
delinking code should be faster ( because unlinking a doubly-linked list
requires four memory ops and unlinking a singly linked list only requires
two ), and we get a hash consistancy check for free.
vm_page_rename() now automatically sets the page's dirty bits.
vm_page_alloc() does not try to manually inline freeing a cache page.
Instead, it now properly calls vm_page_free(m) ... vm_page_free() is
really too complex to manually inline.
vm_await(), supporting asleep(), has been added.
of most of the swap-pager-specific fields, the removal of the id,
and the removal of paging_offset.
A new inline, vm_object_pip_wakeupn() has been added to subtract an
arbitrary number n from the paging_in_progress count and then wakeup
waiters as necessary. n may be 0, resulting in a 'flash'.
object->paging_offset has been removed - it was used to optimize a
single OBJT_SWAP collapse case yet introduced massive confusion throughout
vm_object.c. The optimization was inconsequential except for the
claim that it didn't have to allocate any memory. The optimization
has been removed.
madvise() has been fixed. The old madvise() could be made to operate
on shared objects which is a big no-no. The new one is much more careful
in what it modifies. MADV_FREE was totally broken and has now been fixed.
vm_page_rename() now automatically dirties a page, so explicit dirtying
of the page prior to calling vm_page_rename() has been removed.
about conversions of objects to OBJT_SWAP, it is done automatically
now.
Replaced manually inserted code with inline calls for busy waiting on
pages, which also incidently fixes a potential PG_BUSY race due to
the code not running at splvm().
vm_objects no longer have a paging_offset field ( see vm/vm_object.c )
instead to properly handle any waiters.
Added comments, added support for M_ASLEEP. Generally treat M_ flags
as flags instead of constants to compare against.
and the swap_pager has been completely replaced.
The new swap pager uses the new blist radix-tree based bitmap allocator
for low level swap allocation and deallocation. The new allocator
is effectively O(5) while the old one was O(N), and the new allocator
allocates all required memory at init time rather then at allocate
memory on the fly at run time.
Swap metadata is allocated in clusters and stored in a hash table,
eliminating linearly allocated structures.
Many, many features have been rewritten or added. Swap space is now
reallocated on the fly providing a poor-mans auto defragmentation of
swap space. Swap space that is no longer needed is freed on a timely
basis so no garbage collection is necessary.
Swap I/O is marked B_ASYNC and NFS has been fixed to do the right
thing with it, so NFS-based paging now has around 10x the performance
as it did before ( previously NFS enforced synchronous I/O for paging ).
B_DELWRI and B_CACHE flags, fixing a bug that showed up with NFS.
Also, a number of cases where manually inserted code has been removed
and replaced with an inline function call giving us better functional
isolation in the source.
descriptor-passing messages was calling sorflush() without checking
to see if the descriptor was actually a socket. This can cause a
crash by exiting programs that use the mechanism under certain
circumstances.
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
Change from lkm to kld
Add field plexsdno to sd struct
Add flag VF_NEWBORN to drive, sd, plex and volume structs, indicating
that the object has just been created.
Add object types for raw (unattached) plexes and subdisks
Remove definitions of VOLNO, PLEXNO and SDNO (now functions Volno,
Plexno and Sdno)
Move revive parameters from struct plex to struct sd.
struct plex:
maintain a count of the number of inaccessible subdisks.
remove defective and unmapped regions.
Debug flags: make an enum (previously #define)
Set default revive block size to 64kB (was 32 kB)
Previously, accidentally starting the wrong version could corrupt
the RAID5 configuration.
Add functions Volno, Plexno and Sdno to replace the old defines
VOLNO, PLEXNO and SDNO.
Change from lkm to kld
Serious rewrite. No longer call set_<foo>_state to set the state
based only on other objects; instead, add functions
update_<foo>_state, which determine what the state should be by
themselves. This allows the set_<foo>_state functions to shrink
enough to be almost intelligible.
Remove flags setstate_recurse and setstate_recursing.
Remove plex defective regions and unmapped regions, which were
maintained but not used.
Change code to allow daemon to perform operations formerly kludged
into an interrupt context. Remove the DIRTYCONFIG kludge.
Change from lkm to kld
Remove #ifdefs for FreeBSD 2.c
vinumstrategy:
Support anonymous (`raw') subdisks and plexes.
Change code to allow daemon to perform operations formerly kludged
into an interrupt context. Remove the DIRTYCONFIG kludge.
No longer set B_ORDERED for reviving subdisks. I suspect this
wouldn't work correctly, and it should be done in a different manner
in vinumrevive.c
sdio: set subdisk state correctly on error
start to remove code that doesn't make any sense any more.
Remove #ifdefs for FreeBSD 2.c
Change from lkm to kld
correct type of `flags' in calls to set_drive_state.
set_drive_parms: handle anonymous drives correctly (remove them)
drive VOP functions: use the PID of the original opener to fool the
lock manager.
open_drive: be quiet about failures (they're normal when scanning the
partitions).
close_drive: lock drive before closing.
remove_drive: lock drive before deallocating.
read_drive_label: set drive up when all is OK
check_drive:
Complete rewrite. Offload most of the code to the new
vinum_scandisk
format_config:
use snprintf and %qd options to make much less emetic.
Remove old supporting functions.
vinum_scandisk:
Moved here from vinum.c
Almost complete rewrite, incorporating much of what was check_drive.
We still don't have a general way to find the drives on a system, so
get the user to supply the names via the `read' command. For each
device, try each possible compatibility slice name (there's a danger
of finding both /dev/da1h and /dev/da0s1h otherwise). Sort the
partitions found in reverse order of last update time and read them
in, setting the `update' parameter to parse_config and descendents.
save_config: rename to daemon_save_config, since the function is now
called by the daemon. Create a new function save_config which queues
the request with the daemon.
daemon_save_config: some mods to allow for the unfamiliar
environment.
Change from lkm to kld
Remove BROKEN_GDB kludge (it's not needed with klds)
Add code for interfacing with daemon
Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.
Remove code for 2.x support.
Change messages to omit obvious words like 'plex' and 'subdisk.
give_plex_to_volume: invalidate subdisks being given to a plex which
is part of a volume with other plexes.
give_sd_to_plex: keep track of plex size in all cases
lock drives before closing them, to keep the daemon from getting
confused.
config_drive: handle partition type errors more gracefully
config_subdisk: set subdisk state correctly
find_drive, find_drive_by_dev, find_subdisk, find_plex, find_volume:
set VF_NEWBORN flag when a new object is created
config_drive:
Handle partition_status returns more cleverly.
Replace the device name in some cases where it got overwritten.
config_subdisk:
add parameter `update'. If the object already exists, exit without
any changes.
Set state correctly.
config_plex, config_volume:
add parameter `update'. If the object already exists, exit without
any changes.
parse_config:
move read function to vinum_scandisk.
add parameter `update' to pass to config_<object>.
remove_<object>_entry:
print a message when the object is removed.
update_plex_config:
Start defusing this function, which will go away some time.
Remove calls to update_volume_config.
Make size 64 bits
Change from lkm to kld
Remove BROKEN_GDB kludge (it's not needed with klds)
Add code for interfacing with daemon
Modify manner of determining when module is idle
Modify device minor number encoding, use selector functions which also
permit anonymous plexes and subdisks.
Remove code for 2.x support.
Move vinum_scandisk to vinumio.c
Remove myproc kludge
Keep track of open volumes by flag, not by pid (the pids caused some
problems with the lock manager).
free_vinum:
Remove unmapped and defective regions from plexes.
Wait for daemon to stop before returning
vinumopen:
Don't refuse an open if the volume is already open.
CAM's xpt_init() to get in first. I hope this will fix the build again,
sorry guys. :-(
XXX configure_start() and configure_end() seem to be a bit excessive.. both
here and in the i386 code.
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's
fix for this problem.
lock, and add some macros and function parameters to make sure that
the information get to the point where it can be put in the lock
structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
pnp system in freebsd, I'm not sure how useful this will be, but my
1542CP seems to work well in plug and play mode and does seem to
probe correctly at all the oddball addresses/irq/drqs that I tried.
[[
I was unable to get /kernel.conf or /kernel.config to read in, so
I wasn't able to verify that this method of userconfig works. that's
one thing that makes pnp so hard to use in the current scheme.
Pointers to the right new way of doing this accepted.
]]
o Add some kludges to maybe bring support for 1540A/1542A into the
driver. Since I have no 154xA cards, and the only person I know
that has them hasn't given me feedback, I'm making this commit
blind.
o Honor unit numbers that are in the config file now. This allows one
to hard wire the unit numbers (and have high unit numbers for plug
and pray devices, which can't seem to be hardwired) and have the
cards not migrate from aha1 -> aha0 should aha0 go on the fritz. I
didn't verify that hard wired scsi busses would work, but did verify
that hard wired aha addresses did work to a limited extent. Both
aha0 and aha1 must be hardwired, or when the card that was in aha0
goes away, the probe for aha0 might pick up the card that otherwise
would have been aha1.
its original form. (Originally, it only applied to the CFP 2107.)
Hopefully we can come to some conclusion about which Conner drives are
broken for tagged queueing.
The previous code just ignored the invalid map register, but this gave
surprising results because of the way pci_map_port() associated the map
register offset supplied with a map entry in the map array.
- Bring down the splash screen when a vty is opened for the first
time.
- Make sure the splash screen/screen saver is stopped before
switching vtys.
- Read and save initial values in the BIOS data area early.
VESA BIOS may change BIOS data values when switching modes.
- Fix missing '&' operator.
- Move ISA specific part of driver initialization to syscons_isa.c.
atkbd
- kbdtables.h is now in /sys/dev/kbd.
all
- Adjust for forthcoming alpha port. Submitted by: dfr
Fix NFS file corruption problem introduced in 1.188. The valid range
was not being set properly, causing a later reference to the buffer
to clear the B_CACHE bit.
the CFP2107, but it appears (not surprisingly) that the 1 gig and 4 gig
versions of that drive have the same problem with tagged queueing.
Also, fix the problem reported in PR kern/9482. The XPT_DEV_MATCH case in
xptioctl() wasn't putting a proper path in the CCB before it called
xpt_action(). When CAMDEBUG is defined, and CAM_DEBUG_TRACE debugging is
turned on, the CAM_DEBUG statement at the beginning of xpt_action would end
up deferencing a NULL path pointer. That of course caused a panic.
My solution is to just stick the xpt peripheral's path in the CCB.
PR: kern/9482
Reviewed by: gibbs
however is only marginally useful until the new-style bus (pci and isa)
stuff comes onboard to give us a better shot at actually pci and isa
drivers loadable (or preloadable anyway).
Iprobe is an alpha-only system profiling suite which I'm porting from
Linux/alpha to FreeBSD.
Iprobe works by using the hardware profiling support built into
alpha cpus. In a nutshell, what Iprobe does is to setup the alpha
performance counters to sample the pc at a fairly high rate & dumps
those pc samples out to user space. Then some code runs to map the
sampled PCs to functions. You get a bit more than that (like the PSL
word, so you can tell if you're in the kernel or userland, what the
ipl is, etc).
o Add the EB64PLUS systype into the kernel configuration files
and add it to the GENERIC kernel
o Correct mcclock_isa.c's dependence on cia, it should depend on isa.
This will allow avanti and eb64+ kernels to be built without the cia
chipset support code.
The new file pci_eb64plus_intr.s deals with the interrupt hardware
on the EB64PLUS and was obtained from NetBSD with the NetBSD
copyright intact
The apecs chipset support code was altered to allow routing interrupts
through pci if we're not running on an avanti. Avanti's route all
interrupts through isa.
Tested by: Wilko Bulte <wilko@yedi.iaf.nl>
Partially reviewed by: dfr
needs. This removes the dependancy on Perl for the generation of the
loader, allowing the world to be built on a perl-free system.
Submitted by: Joe Abley <jabley@clear.co.nz>