Drastically quieten down the verbose load progress messages. They were
more useful for debugging than anything, but are beyond a joke when loading
a few dozen modules.
Simplify the ELF extended symbol table load format. Just take the main
symbol table and the string table that corresponds. This is what we will
be getting local symbols from. (needed for the alpha stack tracebacks).
Use the (optional) full symbol tables in lookups. This means we have to
further distinguish between symbols that can come from the dynamic linking
table and the complete table.
The alpha boot code now needs to be adapted as ddb/db_elf.c cannot use
the simpler format.
I have not implemented loading the extended symbol tables from the syscall
interface yet, just for preloaded modules.
I am not sure about the symbol resolution. I *think* it's possible that
a local symbol can be found in preference to a global, depending on the
search sequence and dependency tree.
Formerly, the heuristic involving the interpreter path took
precedence.
Also, print a better error message if the brand is missing or not
recognized. If there is no brand at all, give the user a hint that
"brandelf" needs to be run.
Implement preloading in a fairly MI way, assuming the information is
prepared.
DDB interface helpers.. Provide some support for db_kld.c so that we
don't have to export too much detail.
Debugging and cosmetic nits left in from development..
The other half of the containing file hack so modules can associate
themselves with their "file".
but I can't think of another (relatively) easy way of getting the info
since the boot-time initialization is not done immediately after "loading".
XXX module_register() gained an extra arg. This might break the alpha
compile, if so, just add a zero to get the old behavior.
should probably be moved to i386/i386/link_machdep.c (and the same for the
alpha).
Implement "deleting" a preloaded module by destroying its tags. This is a
hack. We cannot reuse the data, it's been destroyed by relocation,
statically initialized variables have been modified, etc. Note that to
reclaim the load space is going to be more machine-dependent work.
Implement a relocate hook for machdep.c to call so that the physical
addresses get converted to the equivalent KVM addresses.
- separate unload for preloaded linker objects.
- Don't build a kernel object if running as an a.out kernel.
- extract the real kernel name rather than hardwiring "kernel" for kldstat.
(sysctl kern.bootfile gets the full name via bootinfo)
- use real addresses on the kernel "module" rather than fictitious ones.
- preloaded module support
- search module path for file modules.
- symbols are checked to see if they are in the right containing file
before using their indexes into string tables. This is to help ddb
since it only supplies a pointer to an opaque symbol and there is no
telling which file/object/module/whatever it came from.
- symbol_values checks that the symbol indeed belongs to the
correct symbol and string table pair before looking it up. (since there
could be many pairs, and KLD/DDB need to find out).
- different ops for files versus preload modules - the unload mechanism
is different. (a preloaded module has to be deleted on unload since
the in-core image is tainted by relocation and variables used)
- Do not build an a.out kernel module if we're running on an elf
kernel. :-) Note that it should theoretically be possible to
mix a.out and elf KLD modules providing -mno-underscores was used
to compile it, or some other symbol conversion takes place.
- Support preload modules (even though /boot/loader doesn't yet)
- Search the module path when loading files.
check off SYSINIT entries as they are run, and when more arrive, we re-sort
and restart (skipping the already-run entries).
This can *only* be done after KMEM (and malloc) is up and running - this is
fine because KLD is the only consumer of this and it's done after that.
The nice thing about this is that the SYSINIT's within preloaded KLD modules
are executed in their natural order. It should be possible to register
devices for the probes which follow, etc. (soon.. several key things
prevent this, such as use of linker sets for things like pci devices).
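
A minimal sketch of the check-off-and-restart idea described above, not the
kernel's actual code. The struct layout and the names init_entry,
run_pending() and bump() are hypothetical; the real SYSINIT structures are
more involved. Each entry carries a "done" mark, and every pass re-sorts the
table and skips entries that have already run:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one SYSINIT entry. */
struct init_entry {
	unsigned order;         /* lower values run first */
	void (*func)(void *);   /* initialization routine */
	void *arg;              /* argument passed to func */
	int done;               /* set once the entry has run */
};

/* qsort() comparator: ascending by order. */
static int
cmp_order(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	return ((x->order > y->order) - (x->order < y->order));
}

/*
 * Re-sort the table and run every entry not yet checked off.
 * Returns the number of entries run in this pass.
 */
static size_t
run_pending(struct init_entry *tab, size_t n)
{
	size_t i, ran = 0;

	if (n == 0)
		return (0);
	assert(tab != NULL);
	qsort(tab, n, sizeof(*tab), cmp_order);
	for (i = 0; i < n; i++) {
		if (tab[i].done)
			continue;   /* already run: skip it */
		tab[i].func(tab[i].arg);
		tab[i].done = 1;
		ran++;
	}
	return (ran);
}

/* Example routine: counts how many times it was called. */
static void
bump(void *arg)
{
	(*(int *)arg)++;
}
```

When new entries arrive (for example from a preloaded module), the caller
appends them to the table and calls run_pending() again; only the new entries
run, in sorted order among themselves.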
help track down bugs in the devstat implementation in various drivers.
(i.e., any situation where the driver does not call the devstat routines
once and only once for each transaction initiation and completion)
Prompted by: msmith
NFS_ROOT will produce a kernel that cannot mount a UFS /.
Vfs type numbers must be distinct from VFS_GENERIC (and VFS_VFSCONF, but
that has the same value and should go away).
The problem happens because NFS is the first vfs (in sys/conf order) so it
gets type number 0 and conflicts harmfully with VFS_GENERIC which is also 0.
The conflict is apparently harmless in the usual case when another vfs
gets type number 0, because nfs is the only vfs that has sysctls.
Initial fix by: Dima <dima@tejblum.dnttm.rssi.ru>
Reason why it worked by: bde
Reviewed by: Luoqi Chen <luoqi@watermarkgroup.com>
Fixed problem where write()s can get lost due to buffers flagged B_DELWRI
being improperly released in brelse().
The last consumer of this code (the old SCSI system) has left us and
the CAM code does its own bouncing. The isa dma system has been
doing its own bouncing for a while too.
Reviewed by: core
Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Two minor changes are also included,
1. Remove gratuitous checks for error return from vn_lock with LK_RETRY set,
vn_lock should always succeed in these cases.
2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount
a little more unstable. It also keeps us in sync with other BSDs.
Suggested by: Bruce Evans <bde@zeta.org.au>
generation was causing unaligned access faults on the Alpha.
I have incremented the devstat version number, since this is an interface
change. You'll need to recompile libdevstat, systat, iostat, vmstat and
rpc.rstatd along with your kernel.
Partially Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
minus the NULL pointer dereference in rev. 1.33. Also simplify
things somewhat by eliminating one traversal of the VM map entries.
Finally, eliminate calls to vm_map_{un,}lock_read() which aren't
needed here. I originally took them from procfs_map.c, but here
we know we are dealing only with the map of the current process.
segments (except memory-mapped devices) in the ELF core file. This
is really nice. You get access to the data areas of all shared
libraries, and even to files that are mapped read-write.
In the future, it might be good to add a new resource limit in the
spirit of RLIMIT_CORE. It would specify the maximum sized writable
segment to include in core dumps. Segments larger than that would
be omitted. This would be useful for programs that map very large
files read/write but that still would like to get usable core dumps.
splhigh() after any system dumps have completed. SHUTDOWN_POST_SYNC
isn't quite late enough for disk controllers.
Converted at_shutdown queues to use the queue(3) macros.
object format of the executable being dumped. This is the first
step toward producing ELF core dumps in the proper format. I will
commit the code to generate the ELF core dumps Real Soon Now. In
the meantime, ELF executables won't dump core at all. That is
probably no less useful than dumping a.out-style core dumps as they
have done until now.
Submitted by: Alex <garbanzo@hooked.net> (with very minor changes by me)
detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.
and use this when masking/unmasking interrupts.
Maintain a mapping from (ioapic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.
Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.
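
A small sketch of such a mapping, assuming a fixed-size two-dimensional
table. The names pin_to_irq, map_init(), map_assign() and map_lookup() and
the sizes NAPIC and NPINS are hypothetical and are not taken from the actual
code:

```c
#include <assert.h>

#define NAPIC	2	/* hypothetical number of IO APICs */
#define NPINS	24	/* hypothetical number of pins per IO APIC */

/* irq number for each (ioapic, pin) pair; -1 means "not assigned". */
static int pin_to_irq[NAPIC][NPINS];

/* Mark every (ioapic, pin) pair as unassigned. */
static void
map_init(void)
{
	int apic, pin;

	for (apic = 0; apic < NAPIC; apic++)
		for (pin = 0; pin < NPINS; pin++)
			pin_to_irq[apic][pin] = -1;
}

/* Return nonzero if the pair lies inside the table. */
static int
map_valid(int apic, int pin)
{
	return (apic >= 0 && apic < NAPIC && pin >= 0 && pin < NPINS);
}

/* Record that the given pin delivers irq; returns 0 on success. */
static int
map_assign(int apic, int pin, int irq)
{
	assert(irq >= 0);
	if (!map_valid(apic, pin))
		return (-1);
	pin_to_irq[apic][pin] = irq;
	return (0);
}

/* Look up the irq for a pin; returns -1 if unassigned or out of range. */
static int
map_lookup(int apic, int pin)
{
	return (map_valid(apic, pin) ? pin_to_irq[apic][pin] : -1);
}
```

With a table like this, pin 3 on the second ioapic can map to irq 19 without
disturbing pin 3 on the first, which the old "irq == pin, ioapic == 0"
assumption could not express.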
Don't let an AP enter _cpu_switch before all local apics are initialized.
type numbers in vfs attach order (modulo incomplete reuse of old
numbers after vfs LKMs are unloaded). This requires reinitializing
the sysctl tree (or at least the vfs subtree) for vfs's that support
sysctls (currently only nfs). sysctl_order() already handled
reinitialization reasonably except it checked for annulled self
references in the wrong place.
Fixed sysctls for vfs LKMs.
when nfs is an LKM. Declare it in a header file. Don't forget to use
it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.
NetBSD uses strcmp() to avoid this ugly global.
device drivers about sectors no longer in use.
Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.
This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.
Reviewed by: Kirk McKusick, bde
Sponsored by: M-Systems (www.m-sys.com)
- moved definition of MACHINE_ARCH from cpu.h to param.h as on the alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. hw.ispc98 is 1 in a PC98 kernel and 0 in an
IBM-PC kernel.
Discussed with: John Birrell <jb@FreeBSD.ORG>
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptible.
Reviewed by: Bruce Evans <bde@zeta.org.au>
for the Lite2 fix for always returning EIO in dead_read().
Cleaned up the cdevswitch initializers for all tty drivers.
Removed explicit calls to ttsetwater() from all (tty) drivers. ttsetwater()
is now called centrally for opens, not just for parameter changes.
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.
They shouldn't be used there either. They should have gone away
about 3 years ago when the statically initialized devswitches went
away, but su.c unfortunately still frobs the cdevswitch in the old
way.
since (hardware) ttys have too low a bandwidth to benefit significantly
from large buffers. Use twice the old limit for the new-default case
and 8 times the old limit for the driver-specifies-watermark case.
Nothing uses these cases yet.
Removed related debugging code.
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.
The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.
PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.
mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from possible
integral promotions, although this is too much work for
machine-dependent code.
vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.
in ddb) which I broke by changing %8[l]x to %8p. Hacked the central
printf routine to not add an "0x" prefix for %p formats if the field
width is nonzero. The tables are still horribly misformatted on
64-bit machines.
Use %p instead of %8p to print pointers when the field width isn't
important.
DOS partition type 15 (Extended DOS, LBA) as a container for
DOS logical volumes, so the appropriate slices (e.g. sd1s5)
are not initialized.
PR: 7549
PR: 4120
Reviewed by: phk
Submitted by: Jim Mattson <jmattson@sonic.net>
managed to avoid corruption of this variable by luck (the compiler used a
memory read-modify-write instruction which wasn't interruptible) but other
architectures cannot.
With this change, I am now able to 'make buildworld' on the alpha (sfx: the
crowd goes wild...)
and DSO_NOLABELS flags prevent searching for slices and labels
respectively. Current drivers don't set these flags. When
DSO_NOLABELS is set, the in-core label for the whole disk is cloned
to create an in-core label for each slice. This gives the correct
result (a good in-core label for the compatibility slice) if
DSO_ONESLICE is set or only one slice is found, but usually gives
broken labels otherwise, so DSO_ONESLICE should be set if DSO_NOLABELS
is set.
protection checks. Using the partition-relative blkno in some
parts broke the write protection for partitions at unusual
offsets (only for partitions at offset 1 on i386's).
best place to set it, and the wd and wfd strategy routines don't
set it (for failed transfers) because they expect dscheck() to
initialize everything necessary. dscheck() has always set B_ERROR,
but this is not quite sufficient, because b_resid is used by physio()
to decide how much of a B_ERROR'ed i/o was done.
formats and args to match. Fixed old printf format errors (all related;
most were hidden by calling printf indirectly).
This change somehow avoids compiler bugs for 64-bit longs on i386's,
although it increases the number of 64-bit calculations.
timeslice of the exiting process was counted for both the exiting
process and the next process to run if the next process runs
immediately.
Broken in: mostly in kern_clock.c rev.1.70 (1998/05/28)
Callers only need to initialize d_secperunit now, but should
initialize d_type (to reduce the IDE/SCSI confusion), d_typename
(put the disk model in it) and geometry info (if it isn't completely
fictitious). Callers will soon need to initialize d_secsize.
normally only defined in opt_devfs.h, so testing it before including
anything is normally a no-op. Undef'ing DEVFS before including
opt_devfs.h is similarly useless. OTOH, DEVFS support for sliced
but not SLICEd (despite defined(SLICE)) devices is either harmless
(if there are no such devices, then nothing in this file is used)
or necessary (otherwise). It even seems to work for sliced cd
devices.
nearly so many casts here. Casting a pointer that was an integer
back to an integer just to compare it with -1 is bad, and casting
it back just to compare it with NULL is just wrong.
Access the entry address as a uintfptr_t, not as a long, and not
necessarily as what modload(8) passes (it takes a u_long from the
exec header and passes a u_int).
Fixed bitrot in K&R support (suword() now takes a long word).
Didn't fix corresponding bitrot in store.9 and fetch.9.
The correct types for the store and fetch families are problematic.
The `word' functions are unfortunately named and need to be split
to handle ints/longs/object pointers/function pointers. Storing
argv[] as longs is quite broken when longs are longer than pointers,
but usually works because it clobbers variables that will soon be
reinitialized.
Hopefully caddr_t is large enough to hold function pointers.
Cast object pointers to uintptr_t before casting them to u_long.
Types are wronger than usual for the PT_READ_U case. ptrace() can
only return ints, but longs are accessed.
respectively. Most of the longs should probably have been
u_longs, but this change is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.
suitable for holding object pointers (ptrint_t -> uintptr_t).
Added corresponding signed type (intptr_t). Changed/added
corresponding non-C9x types for function pointers to match. Don't
use nonstandard types to implement these types, and don't comment
on them in <machine/types.h>.
(long)(u_long)(u_int)-4 = 0x00000000fffffffc on machines with 32-bit
ints and 64-bit longs.
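
The arithmetic above can be reproduced portably with fixed-width types. This
is an illustrative sketch only: uint32_t stands in for a 32-bit u_int and
int64_t/uint64_t for 64-bit long/u_long, and the function names are made up
for the example:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mimics (long)(u_long)(u_int)v with 32-bit ints and 64-bit longs:
 * going through the unsigned 32-bit type zero-extends the value.
 */
static int64_t
widen_zero_extended(int32_t v)
{
	return ((int64_t)(uint64_t)(uint32_t)v);
}

/* Mimics (long)v: a direct signed conversion sign-extends instead. */
static int64_t
widen_sign_extended(int32_t v)
{
	return ((int64_t)v);
}

/* Self-check of the two conversions for the value -4. */
static void
widen_selfcheck(void)
{
	assert(widen_zero_extended(-4) == INT64_C(0x00000000fffffffc));
	assert(widen_sign_extended(-4) == -4);
}
```

The two results differ only in the upper 32 bits, which is exactly the
difference that matters when a negative int is pushed through an unsigned
intermediate type.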
Restored %z format for printing signed hex. %+x shouldn't have been
used since it is an error in userland.
Prepared to nuke %n format by cloning it to %r. %n shouldn't have
been used because it means something completely different in
userland. Now %+r is equivalent to ddb's original %r, and %r is
equivalent to ddb's original %n.
Ignore '+' flag in combination with unsigned formats %{o,p,u,x}.
you can specify the corefile name by using:
sysctl -w kern.corefile="format"
where format is a pathname (relative or absolute -- default is "%N.core"),
with "%N" (process name), "%P" (process ID), and "%U" (user ID) formats.
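
As an illustration of how such a format string might be expanded, here is a
userland sketch. The function name expand_corename() and its signature are
hypothetical and do not come from the kernel. Treating an unrecognized "%x"
sequence as a literal "x" is an assumption of this example, not documented
behavior:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand "%N" (process name), "%P" (process ID) and "%U" (user ID) in
 * fmt into out.  Returns 0 on success, -1 if the result does not fit.
 */
static int
expand_corename(const char *fmt, const char *name, long pid, long uid,
    char *out, size_t outsz)
{
	size_t len, n = 0;

	assert(fmt != NULL && name != NULL && out != NULL);
	for (; *fmt != '\0'; fmt++) {
		char piece[32];
		const char *src = piece;

		if (fmt[0] == '%' && fmt[1] != '\0') {
			switch (*++fmt) {
			case 'N':
				src = name;
				break;
			case 'P':
				snprintf(piece, sizeof(piece), "%ld", pid);
				break;
			case 'U':
				snprintf(piece, sizeof(piece), "%ld", uid);
				break;
			default:	/* assumption: copy the character */
				piece[0] = *fmt;
				piece[1] = '\0';
				break;
			}
		} else {
			/* Ordinary character (or a trailing lone '%'). */
			piece[0] = *fmt;
			piece[1] = '\0';
		}
		len = strlen(src);
		if (n + len >= outsz)
			return (-1);	/* no room for data plus NUL */
		memcpy(out + n, src, len);
		n += len;
	}
	if (n >= outsz)
		return (-1);
	out[n] = '\0';
	return (0);
}
```

With the default format "%N.core", a process named "sh" would yield
"sh.core"; a format such as "/cores/%U/%N.%P" would place each user's dumps
in a separate directory.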
Reviewed by: Mike Smith, with strong requests by Julian :)
writes of size (100,208]+N*MCLBYTES.
The bug:
sosend() hands each mbuf off to the protocol output routine as soon as it
has copied it, in the hopes of increasing parallelism (see
http://www.kohala.com/~rstevens/vanj.88jul20.txt ). This works well for
TCP as long as the first mbuf handed off is at least the MSS. However,
when doing small writes (between MHLEN and MINCLSIZE), the transaction is
split into 2 small MBUF's and each is individually handed off to TCP.
TCP assumes that the first small mbuf is the whole transaction, so sends
a small packet. When the second small mbuf arrives, Nagle prevents TCP
from sending it so it must wait for a (potentially delayed) ACK. This
sends throughput down the toilet.
The workaround:
Set the "atomic" flag when we're doing small writes. The "atomic" flag
has two meanings:
1. Copy all of the data into a chain of mbufs before handing off to the
protocol.
2. Leave room for a datagram header in said mbuf chain.
TCP wants the first but doesn't want the second. However, the second
simply results in some memory wastage (but is why the workaround is a
hack and not a fix).
The real fix:
The real fix for this problem is to introduce something like a "requested
transfer size" variable in the socket->protocol interface. sosend()
would then accumulate an mbuf chain until it exceeded the "requested
transfer size". TCP could set it to the TCP MSS (note that the
current interface causes strange TCP behaviors when the MSS > MCLBYTES;
nobody notices because MCLBYTES > ethernet's MTU).
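
The workaround's decision can be sketched as a simple predicate. This is an
illustration only, not the code in sosend(): the names are hypothetical, and
the values 100 and 208 are assumed from the "(100,208]" range quoted above
rather than taken from any header. It also ignores the "+N*MCLBYTES" case:

```c
#include <stddef.h>

#define MHLEN_SKETCH		100	/* assumed value of MHLEN */
#define MINCLSIZE_SKETCH	208	/* assumed value of MINCLSIZE */

/*
 * Return nonzero if a write of resid bytes falls in the range that
 * would be split into two small mbufs, i.e. (MHLEN, MINCLSIZE].
 */
static int
small_write(size_t resid)
{
	return (resid > MHLEN_SKETCH && resid <= MINCLSIZE_SKETCH);
}

/*
 * Combine the protocol's own "atomic" requirement with the workaround:
 * small writes are copied into a complete mbuf chain before hand-off.
 */
static int
want_atomic(int proto_atomic, size_t resid)
{
	return (proto_atomic || small_write(resid));
}
```

A 150-byte write on a TCP socket would therefore be handed to the protocol
as one chain instead of two separate mbufs, at the cost of the unused space
reserved for a datagram header.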
Not sure of the result of it..
(may or may not affect anything) but it's fixed now.
(found by: comparing what cvsup sent back to me with what I tested..)
There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries. The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. Its major is in d_bmaj rather than
d_maj. Some cleanup is still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).
rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.
cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.
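
A rough sketch of the resulting arrangement, under simplifying assumptions.
Only d_maj and d_bmaj are named in the text above; the struct name, the
d_name field, the table sizes, the "no block major" marker and devsw_add()
are hypothetical, and the real struct also carries the entry points:

```c
#include <assert.h>
#include <stddef.h>

#define NMAJ_SKETCH	32	/* hypothetical table size */
#define NOMAJ_SKETCH	(-1)	/* hypothetical "no block major" marker */

/* Cut-down stand-in for the merged switch entry. */
struct cdevsw_sketch {
	const char *d_name;	/* hypothetical: driver name */
	int d_maj;		/* character major number */
	int d_bmaj;		/* block major number, or NOMAJ_SKETCH */
};

static struct cdevsw_sketch *cdevsw_tab[NMAJ_SKETCH];
static struct cdevsw_sketch *bdevsw_tab[NMAJ_SKETCH];

/*
 * Register one entry.  The block table receives a second pointer to
 * the same struct, indexed by d_bmaj.  Returns 0 on success.
 */
static int
devsw_add(struct cdevsw_sketch *sw)
{
	assert(sw != NULL);
	if (sw->d_maj < 0 || sw->d_maj >= NMAJ_SKETCH)
		return (-1);
	if (sw->d_bmaj != NOMAJ_SKETCH &&
	    (sw->d_bmaj < 0 || sw->d_bmaj >= NMAJ_SKETCH))
		return (-1);
	cdevsw_tab[sw->d_maj] = sw;
	if (sw->d_bmaj != NOMAJ_SKETCH)
		bdevsw_tab[sw->d_bmaj] = sw;
	return (0);
}
```

A disk driver would appear in both tables through the same struct, while a
driver without block device support (such as a tape in this scheme) would
appear only in the character table.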
Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>