callout_drain() logic. We no longer need a separate non-spin mutex to
do sleep/wakeup with, instead we can now just use the one spin mutex to
manage all the callout functionality.
mutex.
- Don't use callout_drain() to stop the toffhandle callout while holding the
fdc mutex (this could deadlock) in functions called from softclock
(callouts aren't allowed to do voluntary sleeps). Instead, use
callout_stop(). Note that since we hold the associated mutex and are now
using callout_init_mtx(), callout_stop() is just as effective as
callout_drain(). (Though callout_drain() is still needed in detach to
make sure softclock isn't contesting on our mutex before we destroy the
mutex.)
- Remove unused callout 'tohandle' from softc.
MFC after: 1 week
the last reference is dropped. I forgot that vnodes can stick around
for a very long time until processes discover that they are dead. This
means that a vnode reference is not sufficient to keep the mount
referenced and even more code will be required to ref mount points.
Discovered by: kris
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.
Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week
interrupt handlers rather than BUS_SETUP_INTR() and BUS_TEARDOWN_INTR().
Uses of the BUS_*() versions in the implementation of foo_intr methods
in bus drivers were not changed. Mostly this just means that some
drivers might start printing diagnostic messages like [FAST] when
appropriate as well as honoring mpsafenet=0.
- Fix two more of the ppbus drivers' identify routines to function
correctly in the mythical case of a machine with more than one ppbus.
associated with the passed in pfs_node. If it does return a pointer, it
keeps the process locked. This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again. This will become more important with future changes to fix races
between procfs/ptrace and exit(2). Also, removed a duplicate pfs_visible()
call in pfs_getextattr().
Reviewed by: des
MFC after: 1 week
modules prior to looking up the directory which we will cover to avoid
this problem in mount.
- We must hold the coveredvp locked before we can busy the mountpoint to
prevent a lock order reversal with the vfs_busy() in lookup which holds
the directory lock prior to doing a vfs_busy(). The directory lock is
required to safely clear the v_mountedhere field on the directory.
MFC After: 1 week
prevent the mount point from going away while we're waiting on the lock.
The ref does not need to persist once we have the lock because the
lock prevents the mount point from being unmounted.
MFC After: 1 week
the VFS_STATFS call to prevent the mount from disappearing while we're
stating.
- Convert these routines to use MPSAFE namei semantics.
MFC After: 1 week
vop_lock_post do not trigger.
- Rearrange null_inactive to null_hashrem earlier so there is no chance
of finding the null node on the hash list after the locks have been
switched.
- We should never have a NULL lowervp in null_reclaim() so there is
no need to handle this situation. panic instead.
MFC After: 1 week
- Simplify the logic dealing with recycled vnodes in null_hashget() and
null_hashins(). Since we hold the lower node locked in both cases
the null node can not be undergoing recycling unless reclaim somehow
called null_nodeget(). The logic that was in place was not safe and
was essentially dead code.
MFC After: 1 week
object that requires Giant in vm_object_deallocate(). This is somewhat
hairy in that if we can't obtain Giant directly, we have to drop the
object lock, then lock Giant, then relock the object lock and verify that
we still need Giant. If we don't (because the object changed to OBJT_DEAD
for example), then we drop Giant before continuing.
Reviewed by: alc
Tested by: kris
a calcru() wrapper that passes a local rusage_ext on the stack that is
a snapshot to do the calculations on. Now we can pass p->p_crux to
calcru1() in calccru() again which fixes the issues with runtime going
backwards messages when dead processes are harvested by init.
Reviewed by: phk
Tested by: Stefan Ehmann shoesoft at gmx dot net
processing the interrupt events. If we clear them afterwards we
can completely miss some events as the NIC can change the source
flags while we're in the handler. In order to not get another
interrupt while we're in ifp->if_input() with the driver lock
dropped we now turn off NIC interrupts while in the interrupt
handler. Previously this was meant to be achieved by clearing the
interrupt source flags after processing the interrupt events but
didn't really work as clearing these flags doesn't actually
acknowledge and re-enable the events.
This fixes the device timeouts seen with the VMware LANCE.
- Relax the watchdog timer somewhat; don't enable it until the last
packet is enqueued and if there is a TX interrupt but there are
still outstanding ones reload the timer.
Reported and tested by: Morten Rodal <morten@rodal.no>
MFC after: 3 days
if ksocket is connected to an interface-type node somewhere later
in the graph (e.g., ng_eiface or ng_iface), the csum_data may be
applied to a wrong packet (if we encapsulate Ethernet or IP).
MFC after: 3 days
The ed driver on pc98 was broken by if_edvar.h rev1.31.
Reported by: Kaho Toshikazu (kaho at elam kais kyoto-u ac jp)
Tested by: Eiji Kato (ekato at a1 mbn or jp)
MFC after: 3 days
More info regarding these nics can be found at http://www.myri.com.
Please note that the files
sys/dev/myri10ge/{mcp_gen_header.h,myri10ge_mcp.h} are internally
shared between all our drivers (solaris, macosx, windows, linux, etc).
I'd like to keep these files unchanged, so I can just import newer
versions of them when the firmware API/ABI changes. This means I'm
stuck with some of the crazy-long #define names, and possibly
non-style(9) characteristics of these files.
Many thanks to mlaier for doing firmware(9) just as I
needed it, and to scottl for his helpful review.
Reviewed by: scottl, glebius
Sponsored by: Myricom Inc.
when option DEBUG_LOCKS is used. Trap frames are determined by checking
whether the caller was one of the tl0_*() or tl1_*() asm functions via
a newly added pair of dummy symbols in exception.S which mark the begin
and end of these functions. The tl_trap_* pair marks those in the special
.trap section and the tl_text_* in the regular .text section. Because
of their performance penalty db_search_symbol()/db_symbol_values() and
linker_ddb_search_symbol()/linker_ddb_symbol_values() aren't used here
for determining the caller, with db_search_symbol()/db_symbol_values()
additionally not being reentrant.
- For consistency, change db_backtrace() to also use the new markers for
determining the tl0_*() and tl1_*() asm functions instead of bcmp()'ing
the symbol name.
- Use FBSDID in db_trace.c.
PR: 93226
Based on a patch by: Antoine Brodin <antoine.brodin@laposte.net>
Ok'ed by: jhb
Without CODA_COMPAT_5, it would be equivalent to the plain coda
module. Therefore just add -DCODA_COMPAT_5 to CFLAGS instead of
fiddling with opt_coda.h. This is particularly important when
the module is built along with the kernel and CODA_COMPAT_5 isn't
in the kernel conf file (and so not in opt_coda.h either).
MFC after: 3 days
nfs_diskless.c if NFS_ROOT is in effect, e.g., present
in the kernel config file. Otherwise the built module
won't load due to an undefined reference to nfs_setup_diskless.
MFC after: 3 days
can't be changed from userland. Make them read-only and provide
descriptions.
kern.ipc.max_datalen must never be less than one byte. Enforce this
with a panic in net_init_domain().
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
forcing DMA alignment to default buffer size.
- Make sure DMA pointer properly aligned to avoid being truncated by caller
which causing severe underruns and random popping (especially in 32bit
playback / recording).
- Add AC97 inverted external amplifier quirk for Maxselect x710s
- http://maxselect.ru/
MFC after: 1 week
current directory to allow user rules to create the firmware (e.g. from a
uuencoded blob). make's version of if is evaluated too early to catch this.
Found-by: gallatin
the SMI/TCO address space. Switch the bus space I/O to the
one specific for either the SMI or TCO space. Re-calibrate
the tick. Add some more device id's, 82801FBR submitted by des.
This makes it work on the platforms I've tested with.
Go ahead by: des
should fix strange link state behaviour reported for bcm5721 & bcm5704c
2) Clear bge_link flag in bge_stop()
3) Force link state check after bge_ifmedia_upd(). Otherwise we can miss link
event if PHY changes it's state fast enough.
Tested by: phk (bcm5704c)
Approved by: glebius (mentor)
MFC after: 1 week
jumbo mbuf clusters. To make the variable size clear they are named
MJUMPAGESIZE.
Having jumbo clusters with the native PAGE_SIZE is more useful than
a fixed 4k size according the device driver writers using this API.
The 9k and 16k jumbo mbuf clusters remain unchanged.
Requested by: glebius, gallatin
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
Acknowledgement should definitly go to JMicron Technology for providing full
docs on the metadata format as the only vendor so far, big thanks from here.
threshold. Inflight doesn't make sense on a LAN as it has
trouble figuring out the maximal bandwidth because of the coarse
tick granularity.
The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold
in milliseconds below which inflight will disengage. It defaults
to 10ms.
Tested by: Joao Barros <joao.barros-at-gmail.com>,
Rich Murphey <rich-at-whiteoaklabs.com>
Sponsored by: TCP/IP Optimization Fundraise 2005
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.
Reviewed by: jhb
MFC after: 1 week
and it has not plenty of free pages it tries to free pages in the cache queue.
Unfortunately freeing a cached page requires the locking of the object that
owns the page. However in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails the cache page can not be freed and is activated to move it out of the way
so that we may try to free other cache pages.
If all pages in the cache belong to objects that are currently locked the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:
1) vm_page_alloc always failed allocation when it tried freeing pages from
the cache queue and failed to do so. However if there are more than
cnt.v_interrupt_free_min pages on the free list it should return pages
when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
resource exhaustion deadlocks.
2) Threads than need to allocate pages spend a lot of time cleaning up the
page queue without really getting anything done while the pagedaemon
needs to work overtime to refill the cache.
This change fixes the first problem. (1)
Reviewed by: tegge@
- In em_attach() remove em_check_for_link(). Not needed here, since
already done in em_hardware_init().
- In em_attach() replace the printing block with call to
em_update_link_status().
- Remove modification of sc->link_state from em_hardware_init() and
from em_media_status(). This makes em_update_link_status() a
single point of change. Call em_update_link_status() where needed.
to getting rid u_int for uint and so on).
b) Turn back on 64 bit DAC support. Cheeze it a bit in that we have two
DMA callback functions- one when we have bus_addr_t > 4 bits in width and
the other which should be normal. Even Cheezier in that we turn off setting
up DMA maps to be BUS_SPACE_MAXADDR if we're in ISP_TARGET_MODE. More work
on this in a week or so.
c) Tested under amd64 and 1MB DFLTPHYS, sparc64, i386 (PAE, but insufficient
memory to really test > 4GB). LINT check under amd64.
MFC after: 1 month
with methods: instead of honoring non-zero values expect drivers
to write their own values on return from ieee80211_ifattach
o add a define for the default h/w bmiss count
MFC after: 2 weeks
pages, not a count of bytes. The sysctl handler for hw.realmem already
uses ctob() to convert realmem from pages to bytes. Thus, on archs that
were storing a byte count in the realmem variable, hw.realmem was inflated.
Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha)
MFC after: 3 days
it so that ip_id etc. don't get overwritten. This fixes forwarding
of fragmented IP packets through a dummynet pipe -- fragments came
out with modified and different(!) ip_id's, making it impossible to
reassemble a datagram at the receiver side.
Submitted by: Alexander Karptsov (reworked by me)
MFC after: 3 days
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
save the MCA state of the AP. Saving the MCA state of the AP requires
us to allocate memory, which uses sleep locks.
Now that we correct the spinlock nesting of the AP without having
schedlock, avoid calling spinlock_exit(). Instead call critical_exit()
and manually clear the MD spinlock count.
MFC after: 3 days
to preserve currect behaviour). When set to 0, components are not
disconnected - graid3 will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'graid3 list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.
Prodded by: ru
MFC after: 3 days
to preserve currect behaviour). When set to 0, components are not
disconnected - gmirror will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'gmirror list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.
Prodded by: ru
MFC after: 3 days
An example entries for loader.conf to make it possible:
geli_da0_keyfile0_load="YES"
geli_da0_keyfile0_type="da0:geli_keyfile0"
geli_da0_keyfile0_name="/boot/keys/da0.key0"
geli_da0_keyfile1_load="YES"
geli_da0_keyfile1_type="da0:geli_keyfile1"
geli_da0_keyfile1_name="/boot/keys/da0.key1"
geli_da0_keyfile2_load="YES"
geli_da0_keyfile2_type="da0:geli_keyfile2"
geli_da0_keyfile2_name="/boot/keys/da0.key2"
geli_da1s3a_keyfile0_load="YES"
geli_da1s3a_keyfile0_type="da1s3a:geli_keyfile0"
geli_da1s3a_keyfile0_name="/boot/keys/da1s3a.key"
Thanks for jhb and kan who showed me the right direction.
MFC after: 3 days
Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.
Don't reach across CPUs in calcru().
Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).
Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.
Use 27MHz counter on i386/Geode.
Use TSC on amd64 & i386 if present.
Use tick counter on sparc64
automatically both SATA and SAS drives. The async SAS event handling we catch
but ignore at present (so automagic attach/detach isn't hooked up yet).
Do 64 bit PCI support- we can now work on systems with > 4GB of memory.
Do large transfer support- we now can support up to reported chain depth, or
the length of our request area. We simply allocate additional request elements
when we would run out of room for chain lists.
Tested on Ultra320, FC and SAS controllers on AMD64 and i386 platforms.
There were no RAID cards available for me to regression test.
The error recovery for this driver still is pretty bad.
to work with ipmitools. It works with other tools that have an OpenIPMI
driver interface. The port will need to get updated to used this.
I have not implemented the IPMB mode yet so ioctl's for that don't
really do much otherwise it should work like the OpenIPMI version.
The ipmi.h definitions was derived from the ipmitool header file.
The bus attachments are done for smbios and pci/smbios. Differences
in bus probe order for modules/static are delt with. ACPI attachment
should be done.
This drivers registers with the watchdod(4) interface
Work to do:
- BT interface
- IPMB mode
This has been tested on Dell PE2850, PE2650 & PE850 with i386 & amd64
kernel.
I will link this into the build on next week.
Tom Rhodes, helped me with the man page.
Sponsored by: IronPort Systems Inc.
Inspired from: ipmitool & Linux
o add dfs+radar hooks; DFS is presently disabled in the hal
o channel and mode handling changes
o various api changes
o be more aggressive about iq calibration settling so ap mode
operation is better immediately after startup
o rfkill/rfsilent sysctl support
o tpc ack/cts sysctl support
MFC after: 2 weeks
o new chip support
o new platforms: powerpc-be-elf, sparc64-be-elf, and alpha-elf
(alpha is untested, others are known to work)
o many fixes and improvements
MFC after: 2 weeks
zero in this case.) A kernel driver has IFF_DRV_RUNNING
at its full disposal while IFF_UP may be toggled only by
humans or their daemonic deputies from the userland.
MFC after: 3 days
o assume all data frames have been classified so there's no need
to check if QoS is being used, just fetch the wme priority from
the mbuf
o fix double counting of noack frames
o fix nearby comment
MFC after: 2 weeks
o pull nexttbtt forward in adhoc mode too
o resync beacon timers on joining a bss or ibss as the tstamp we
collected while scanning is almost certainly out of date
Note we may need to refine the ibss mode check in ath_recv_mgmt.
Reviewed by: avatar, dyoung
Obtained from: atheros
MFC after: 2 weeks
frame and if we get a beacon miss interrupt ignore it if we've received
a frame within the beacon miss interval. This should never trigger
and the handling at the net80211 layer should likewise deal with this
but it doesn't hurt and can suppress extranous probe request frames.
Note that we can legtimately get a bmiss when under heavy load.
MFC after: 2 weeks
- Restart watchdog if we *did* processed any descriptors. [1]
- Log the watchdog event if the link is *up*. [2]
PR: kern/92948 [1]
Submitted by: Mihail Balikov <mihail.balikov interbgc.com> [1]
PR: kern/92895 [2]
Submitted by: Vladimir Ivanov <wawa yandex-team.ru> [2]
it. The former code used to hang older Intel CPUs by trying to get
non-existent TLB info 2^32 times.
Reduce code duplication around the calls to CPUID 0x02 by using
do-while loops.
PR: i386/92977
Tested by: cy