repeatedly truncate the same file. Each time the file is truncated,
a buffer is grabbed to store the indirect block numbers that need
to be freed. Those blocks cannot be freed until the inode claiming
them is written to disk. Thus, the number of buffers being held by
soft updates explodes and in extreme cases can run the kernel out
of buffers. The problem can be avoided by doing an fsync on the
file every debug.maxindirdep truncates (currently defaulted to 50).
The fsync causes the inode to be written so that the held buffers
can be freed. The check for excessive buffers is checked as part
of the existing hook for excessive dependencies (softdep_slowdown)
in the truncate code.
Reported by: David Schultz <dschultz@uclink.Berkeley.EDU>
Sponsored by: DARPA & NAI Labs.
MFC after: 3 weeks
so it's value is not sign extended when assigned to the uintmax_t variable
used internally by printf. For example, if bit 31 is set in the cpuid
feature word, then %b would print out the initial value as a 16 character
hexadecimal value. Now it only prints out an 8 character value.
Reviewed by: bde
bus_dmamap_load_mbuf() returned EFBIG.
o Fix mbuf leaks in an error (rare) code path.
o Reuse the TX descriptor if xl_encap() failed instead of
just picking the next one.
o Better error messages.
as separate 16-bit entities. Some of the ring control blocks are
in NIC memory, so they must be referenced using 32-bit accesses.
Smaller accesses have been observed to fail under some conditions.
This caused the rings to be set up wrong, leading to writes by the
card outside of the intended bounds of the rings. This problem was
diagnosed by Michael Barthelow. Don Bowman submitted a patch which
fixed the problem using a slightly different approach.
Reference ring control blocks in NIC memory using a pointer to
volatile.
Parenthesize the BGE_HOSTADDR macro definition properly.
MFC after: 3 days
from the hardware descriptors to avoid the overhead of having a DMA
map for each of them. Bump the number of hardware descriptors to 128,
and use half as many software descriptors for now.
Some minor cleanups.
- remove DPRINTF(), there is a CTR*() for any of them, and KTR is
far more useful to debug this driver.
- some cleanups; remove some unused code and definitions.
map. Use this new feature to implement iommu_dvmamap_load_mbuf() and
iommu_dvmamap_load_uio() functions in terms of a new helper function,
iommu_dvmamap_load_buffer(). Reimplement the iommu_dvmamap_load()
to use it, too.
This requires some changes to the map format; in addition to that,
remove unused or redundant members.
Add SBus and Psycho wrappers for the new functions, and make them
available through the respective DMA tags.
o create a separate tag for each object allocated with bus_dmamem_alloc so
the tag's maxsize is setup appropriately; this reduces memory allocation
for the queue descriptors from 16M to what it should be and also fixes
memory allocation for public key operands
o release bus dma resources on detach so module usage doesn't leak
o remove public key op disable now that bus dma memory allocation is fixed
o collect attach error handling in one place
Sponsored by: Vernier Networks
_nexus_dmamap_load_buffer()
- implement nexus_dmamap_load() in terms of _nexus_dmamap_load_buffer().
Note that this is untested, as this code is not currently used (but
might be later for UPA devices).
- move BUS_DMAMAP_NSEGS to bus_private.h
- disable the ecache flushing in nexus_dmamap_sync(); it should not be
needed, although the docs are not entirely clear on that.
some trick is necessary to prevent further BSD geoms from attaching to
that. Our old trick was to make sure we don't attach to a geom from
the "BSD" class, but this doesn't work if an intermediary geom obscures
this fact. Instead, calculate the MD5 checksum of the label we target
and ask if anybody below us loves that label. If they do we don't.
Coded by: gordon.
- move some constants into iommureg.h
- correct some comments
- use KASSERT() in one place instead of rolling our own
- take a sanity check out of #ifdef DIAGNOSTIC
- fix a syntax error in normally #ifdef'ed out debug code
- tweak the announce message a bit
- remove '\n's from a few panic() calls
- don't use the DVMA base adress the firmware reports; instead, figure
it out from the appropriate register on Sabres and let the IOMMU code
choose it on Psychos. This also makes the IOMMU TSB size freely
selectable.
2.) pass the requesting child device (instead of the bus one) up when
handling interrupt resources
3.) remeber to mark the resource list entry as unused in
sbus_release_resource().
Reported by: scottl (3)
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.
Submitted by: iedowse
Otherwise, the scsi devices that it is trying to issue commands to may
have gone away. This is what caused shutdown to hang on ia64 systems
with mpt scsi controllers. The bus system has torn down the device tree
and reset the mpt controller etc, and suddenly along comes dashutdown
and wants to issue a few more scsi commands.... <HANG!>
This shouldn't work on i386 either, but it seems to work solely due
to luck.
Add support for GCC's --test-coverage --profile-arcs options.
Add code to call the functions listed in the .ctors section, these are
used to string the per .o file counter blocks into a linked list.
Add empty __bb_fork_func() to cope with GCC magic gandling of exec*()
named functions.
To add support for other platforms should be trivial, but involves
determining the exact data-types gcc uses on that platform.
Minimum Acceptable Value: _POSIX_LOGIN_NAME_MAX.
The comments at the bottom of this file claim sysconf(3) provides this value,
but it seems sysconf(3) hasn't implemented this yet.
and declare them extern in interrupt.c. This eliminates the need
for ia64_add_sapic(), which is called from sapic.c.
While here, reformat ia64_enable() in interrupt.c to improve
indentation and add a sysctl (machdep.apic) to dump the I/O APIC
entries currently programmed into all I/O APICs. The latter can
help analyze interrupt problems.
Note that the sysctl is not intended as a userland (software)
interface. It may be changed in the future to include counters
so that vmstat -i can make use of it. It may also be removed...
copies of the reload. Note that we use the precomputed itm_reload value
so that we can avoid a division in the kernel. The ia64 cpu does not
have integer divide, so this would have been done by a floating point
operation.
CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already
in use as the AP wakeup vector on the HP rx2600.
This needs to be made more dynamic. The likelyhood of vector 254
being in use is pretty small, but we already have code to assign
vectors to IPIs (see sal.c) and it's preobably better to have a
centralized "vector manager" that hands out vectors based on
some imput (like priority).
called. Otherwise (depending on a non-deterministic sort), the timecounter
code can be initialized before the clock rate has been set (on ia64) and it
assumes hz = 100, rather than the real value of 1024. I'm not sure how much
gets upset by this.
Glanced at by: phk
handleclock itself is trivial.
While here, replace (itc_frequency+hz/2)/hz with itm_reload for
consistency. There's now a single place where we determine the
ITM reload value.
header with M_MOVE_PKTHDR one should not reference the packet header in the
original packet; in this case the code was assuming that m_adj would alter
m_pkthdr.len which stopped happening because M_MOVE_PKTHDR removes the
M_PKTHDR bit from m_flags
Submitted by: Bill Fenner <fenner@research.att.com>
interrupt block). We use the previously hardcoded address as a
default only, but will otherwise use whatever ACPI tells us.
The address can be found in the MADT table header or in the
LAPIC override table entry.
space most of the time, but handles machines with lots of I/O
(S)APICs. We cannot make this more dynamic without breaking the
interface with vmstat. Hence, we need to fix the interface first.
name of unused entries from "intr XXX" to "#XXX". This makes it
easier to debug interrupt problems, because vmstat can be hacked
more easily to dump all interrupt entries that are in use and not
those that have had interrupts.
devices aren't necessarily mapped within 4GB. I/O port addresses
are offsets into the memory mapped I/O port space, which is not
larger than 16MB. No need to convert those to 64 bit types.
Previously all filesystems which relied on specfs to do devices
would have private overrides for vop_std*, so the vop_no* overrides
here had no effect. I overlooked the transitive nature of the vop
vectors when I removed the vop_std* in those filesystems.
Removing the override here restores device node locking to it's
previous modus operandi.
Spotted by: bde
then call do_setopt_accept_filter(so, NULL) which will free the filter
instead of duplicating the code in do_setopt_accept_filter().
Pointed out by: Hiten Pandya <hiten@angelica.unixdaemons.com>
- 'spec' and 'ver' are attributes of a unit rather than a node.
- Report Phy and Link info separatelly.
- Reorder intialization step in fwohci_reset().
- Fix some bogosity with mixing unit numbers and channels, which would only
work for one instance of the device.
- Use a simpler scheme for input and output queueing.
- Use db_alt_break.
o Make the URL of the handbook match reality
o Improve some comments (either wording or formatting)
o Sync with i386: comment-out DDB, INVARIANTS, INVARIANT_SUPPORT
o Add some more SCSI/RAID controllers:
ahd, mpt, asr, ciss, dpt, iir, mly, ida
o Remove support for the parallel port
o Add NICs: em, bge
o Remove NICs: ste, tl, tx, vr, wb
o Enable USB support again, except of the UHCI host controller.
UHCI still hangs the BigSur (=HP i2000) machines, and makes
them useless. The OHCI controller works fine. Note that newer
ia64 boxes based on the Intel host controllers (UHCI or EHCI)
still won't have USB support. We really need to import the
EHCI host controller from NetBSD...
to sort out disk-io from file-io in the vm/buffer/filesystem space.
The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.
Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.
Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.
Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.
Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.
If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org
included in the kernel. Include imgact_elf.c in conf/files, instead of
both imgact_elf32.c and imgact_elf64.c, which will use the default word
size for an architecture as defined in machine/elf.h. Architectures that
wish to build an additional image activator for an alternate word size can
include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which
allows it to be dependent on MD options instead of solely on architecture.
Glanced at by: peter
void backtrace(void);
function which will print a backtrace if DDB is in the kernel and an
explanation if not.
This is useful for recording backtraces in non-fatal circumstances and
does not require pollution with DDB #includes in the files where it
is used.
It would of course be nice to have a non-DDB dependent version too,
but since the meat of a backtrace is MD it is probably not worth it.
o Update copyright.
o Add a tunable to allow the ti12xx to initialize the pci clock. Some rare
cards need it.
o rename 67xx -> 6729 since there's really only one chip in this family.
o from pcic_pci_pd6729_init call the routing code
o Fix some comments out routing on the TI-1130 class (1030, 1130 and 1131)
MFC: After RE@ says it is ok.
- Separate fc->dev (i.e. fwohci0) and fc->bdev (i.e. firewire0).
- Remove unused firewirebusreg.h.
- Reduce size of descriptor block for asynchronous transmit and
check the number of descriptor when copying from mbuf.
- Skip mbuf whose length is zero. NFS seems passing such mbuf and
some chips generates unrecoverable error.
the children of a sysctl node, so that the arguments to the SYSCTL_NODE
macro can themselves be macros. It would be nice to use __CONCAT throughout
this file, but the macros are so large that it quickly becomes unweildly,
(4 nested __CONCATs).
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.
Found by: src/tools/tools/vop_table
On architectures with a non-executable stack, eg sparc64, this is used by
libgcc to determine at runtime if its necessary to enable execute permissions
on a region of the stack which will be used to execute code, allowing the
call to mprotect to be avoided if the kernel is configured to map the stack
executable.
- Restore pci config registers after resume.
- Reinitialize and start rx buffers after resume.
- Don't reallocate memory in fwohci_db_init() if the dbch is
already initialized.
- Fix typo.
- Some clean up.
cryptodev or kldunload cryptodev module); crypto statistcs; remove
unused alloctype field from crypto op to offset addition of the
performance time stamp
Supported by: Vernier Networks
o Allow callers of m_extadd() to allocate their own reference
m_ext.ref_cnt pointer, rather than having the mbuf system allocate it
with a malloc() in the critical path. This speeds m_extadd() up, and
also simplifies locking (malloc() may need Giant).
A driver or subsystem wishing to take use its own ref counter must
initialize m_ext.ref_cnt to point to its ref counter prior to
calling m_extadd(), and it must use EXT_EXTREF as its external type.
Eg:
m->m_ext.ref_cnt = my_ref_cnt_ptr;
m_extadd(.....,EXT_EXTREF);
Reviewed by: bosko
positions for the status bits of port a and port b are different. To
avoid needing to know which channel the interrupt handler is working on,
shift the status bits for port a into the port b bit positions, and always
check the port b status bits. This fixes using port b, which I neglected
to test before.
- Remember to update the channel's tty structure from the passed in termios
in the param routine.
- Minor style.
code, make the emulator use it.
Rename unsupported_msg() to unimplemented_syscall(). Rename some arguments
for clarity
Fixup grammar.
Requested by: bde
Metricom Ricochet GS modem. Add them here.
# A new umodem appears to be needed to make the sanyo phone work, but that's
# more extensive and will come after coordination.
With a 1 byte transmit fifo, 3 byte receive fifo, and wierd multiplexed I/O
designed for a Z80 cpu, this chip redefines suckage.
Based on the openbsd and netbsd drivers. Only really works as a console,
modem support is not complete since I can't test it.
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
take pointers to filedesc structures instead of threads. This makes
it more clear that they do not do any voodoo with the thread/proc
or anything other than the filedesc passed in or returned.
Remove some XXX KSE's as this resolves the issue.
is now synchronized by a mutex, whereas access to user maps is still
synchronized by a lockmgr()-based lock. Why? No single type of lock,
including sx locks, meets the requirements of both types of vm map.
Sometimes we sleep while holding the lock on a user map. Thus, a
a mutex isn't appropriate. On the other hand, both lockmgr()-based
and sx locks release Giant when a thread/process blocks during
contention for a lock. This could lead to a race condition in a legacy
driver (that relies on Giant for synchronization) if it attempts to
kmem_malloc() and fails to immediately obtain the lock. Fortunately,
we never sleep while holding a system map lock.
calling getmicrouptime (but maintain the struct timeval-based calling
convention for compatibility)
o eliminate the use of timersub in ratecheck
Note that flood ping tests indicate ppsratecheck is inaccurate (but on the
conservative side) with this revised implementation. If more accuracy is
needed we'll have to introduce an alternate interface or increase the
overhead.
Reviewed by: silby, dillon, bde
implement ALT_BREAK_TO_DEBUGGER. The caller provides a pointer to a state
variable to allow different state to be maintained for separate instances of
a device.
- Use struct vm_map * instead of vm_map_t in db_break.h to avoid its users
needing to include vm headers.
Requested by: njl
for the raidctl device.
Select a more conservative default for the permissions for /dev/raidctl
since the operations are performed using ioctl() not read() and write().
Submitted by: kris
Reviewed by: scottl
Use BUS_SPACE_MAXSIZE_32BIT for the parent dma tags, and
(NSEGS - 1) * PAGE_SIZE for the data buffer tags. FreeBSD/sparc64 is
more strict about checking these values that other arches.
It is not complete (the LILO root= specification isn't passed to our
loader for instance), it has not been touched in over 2 years. Linux has
moved on to GRUB, so this is OBE now. If someone creeps up to work on it,
it could become a port.
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.
These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.
Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.
Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.
Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of
order / reverse-time-indexed packet should be acknowledged as specified
in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994)
simply acknowledged the segment unconditionally, which is incorrect, and
was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN
in addition to (tlen != 0), which may or may not be correct, but the
worst that ought to happen should be a retry by the sender.
do this avoid m_getcl so we can copy the packet header to a clean mbuf
before adding the cluster
o move an assert to the right place
Supported by: Vernier Networks
- Add a mtx_destroy() to vm_object_collapse(). (This allows a bzero()
to migrate from _vm_object_allocate() to vm_object_zinit(), where it
will be performed less often.)
__acl_get_link(), __acl_set_link(), acl_delete_link(), and
__acl_aclcheck_link(), with almost identical implementations to
the existing __acl_*_file() variants on these calls. Update
copyright.
Obtained from: TrustedBSD Project
__acl_get_link() Retrieve an ACL by name without following
symbolic links.
__acl_set_link() Set an ACL by name without following
symbolic links.
__acl_delete_link() Delete an ACL by name without following
symbolic links.
__acl_aclcheck_link() Check an ACL against a file by name without
following symbolic links.
These calls are similar in spirit to lstat(), lchown(), lchmod(), etc,
and will be used under similar circumstances.
Obtained from: TrustedBSD Project
work. The interface was gleaned from the Linux driver. Currently only
one RX & one TX buffer are used. Firmware support is not tested so for the
MPI-350 so it is disabled. Signal cache and monitor mode are not supported
yet. Signal cache is not supported since in encapsulation mode ethernet
frames are returned by the chip. LAN monitor mode support will be added
shortly. Thanks to Warner for the MPI-350 card he sent me.
Add support for RSSI map from PR kern/32880 which was incomplete. Enhanced
with the ability to select the cache mode of raw, dbm or per-cent.
Clean up Signal/Noise/Quality structures and units with help from
Marco Molteni.
Change flash to use a malloc'ed buffer when needed.
PR: kern/32880
Submitted by: Douglas S. J. De Couto decouto@pdos.lcs.mit.edu,
Marco Molteni
MFC: 3 weeks
call is in progress on the vnode. When vput() or vrele() sees a
1->0 reference count transition, it now return without any further
action if this flag is set. This flag is necessary to avoid recursion
into VOP_INACTIVE if the filesystem inactive routine causes the
reference count to increase and then drop back to zero. It is also
used to guarantee that an unlocked vnode will not be recycled while
blocked in VOP_INACTIVE().
There are at least two cases where the recursion can occur: one is
that the softupdates code called by ufs_inactive() via ffs_truncate()
can call vput() on the vnode. This has been reported by many people
as "lockmgr: draining against myself" panics. The other case is
that nfs_inactive() can call vget() and then vrele() on the vnode
to clean up a sillyrename file.
Reviewed by: mckusick (an older version of the patch)
MUTEX_PROFILING is in opt_global.h, so this does not introduce a risk of
variant structure sizes unless foreign kernel modules are used.
This saved 16 bytes per vnode and 16 bytes per vm object for a total of
4MB on a 2GB machine.
Idea from: alc
to treat desiredvnodes much more like a limit than as a vague concept.
On a 2GB RAM machine where desired vnodes is 130k, we run out of
kmem_map space when we hit about 190k vnodes.
If we wake up the vnode washer in getnewvnode(), sleep until it is done,
so that it has a chance to offer us a washed vnode. If we don't sleep
here we'll just race ahead and allocate yet a vnode which will never
get freed.
In the vnodewasher, instead of doing 10 vnodes per mountpoint per
rotation, do 10% of the vnodes distributed evenly across the
mountpoints.
here. It manifests itself by sendmail hanging in "fifoow" during
boot on a diskless machine with sendmail disabled.
Giving the sleep a 1sec timout breaks the deadlock, but does not solve
the underlying problem.
XXX comment applied.
- Restore %g6 and %g7 for kernel traps if we are returning to prom code.
This allows complex traps (ones that call into C code) to be handled from
the prom.
files which might be included together.
Things like debuggers and lint-like programs get their knickers in
a twist (rightly so one might add) when they find different locations
for the same named struct depending on which .h file were included
first.
This is a stellar example of Very Bad Thinking on the part of the
standards dudes who wrote that both sys/uio.h and sys/socket.h
should define struct iovec the same way.
Fix this by putting struct iovec into its own miniature sys/_iovec.h
file and #include that from sys/socket.h and sys/uio.h.
Sensible people could just put iovec into sys/_types.h but there
is probably some standard or other which will be violated if we
did something that horrible.
comes along and flushes a file which has been mmap()'d SHARED/RW, with
dirty pages, it was flushing the underlying VM object asynchronously,
resulting in thousands of 8K writes. With this change the VM Object flushing
code will cluster dirty pages in 64K blocks.
Note that until the low memory deadlock issue is reviewed, it is not safe
to allow the pageout daemon to use this feature. Forced pageouts still
use fs block size'd ops for the moment.
MFC after: 3 days
than hard-coded uids and gids.
Switch the device to a group of wheel instead of operator.
Narrow down the permissions on the device to require root privilege
to manipulate the system power state. It may be that we can broaden
access to the device after review of the access control in ACPI.
Submitted by: kris
Reviewed by: takawata
(show thread {address})
Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.
All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().
kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.
Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.
be used for zones that allocate objects of less 1 page. The biggest advantage
of this is that all of a sudden the majority of kernel malloc-ed data doesn't
need kva allocated for it. Besides microbenchmarks I haven't seen a measurable
performance improvement from doing this.
with make_dev(). Use OPERATOR instead of implicit WHEEL to match
other storage devices. Use a mode of 0640 to be consistent
with other storage devices.
Submitted by: kris
Reviewed by: scottl