file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquisition.
Avoid dealing with buffers in bdwrite() that are from the other side of
the snaplock divisor in the lock order than the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.
Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by: Peter Holm
X-MFC after: 3 weeks (if ever: it changes ABI)
- Define our own maybe_preempt() as sched_preempt(). We want to be able
to preempt idlethread in all cases.
- Define our idlethread to require preemption to exit.
- Get the cpu estimation tick from sched_tick() so we don't have to worry
about errors from a sampling interval that differs from the time
domain. This was the source of sched_priority prints/panics and
inaccurate pctcpu display in top.
for clock.h, so changing the i386 clock.h broke it. MFi386 (not tested):
Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.
Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.
This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
the mount options list with vfs_deleteopt(). At this point, the export
information is saved in mp->mnt_export, so we can delete
the "export" mount option from mp->mnt_optnew and mp->mnt_opt.
This fixes read-write/read-only update mounts (mount -u -o rw, mount -u -o ro)
of NFS exported directories.
For some reason, I could only reproduce the problem with a configuration
supplied by Andre:
- "options QUOTA" enabled in kernel config
- "/ -maproot=root 10.0.1.105" in /etc/exports
Reported by: kris, Andre Guibert de Bruet <andy siliconlandmark com>,
Andrzej Tobola <ato iem pw edu pl>
Tested by: Andre Guibert de Bruet
addresses shall access invalid descriptor DMA addresses on PCIe
hardware and then panicked the system.
To fix it set descriptor DMA addresses before enabling Tx and Rx
such that hardware can see valid descriptor DMA addresses. Also
set RL_EARLY_TX_THRESH before starting Tx and Rx.
Reported by: steve.tell AT crashmail DOT de
Tested by: steve.tell AT crashmail DOT de
Obtained from: NetBSD
MFC after: 1 week
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than the default of using the first N slots (or indices) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
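
The { 1, 4 } remapping described above can be illustrated with a small
user-space model. This is only a sketch: struct msix_model, model_alloc()
and model_remap() are hypothetical names invented for the illustration and
are not the kernel API, and the IRQ numbers are made up. It shows only the
bookkeeping rule that rids are 1-based and always equal the message index.

```c
#include <assert.h>
#include <stddef.h>

#define MSIX_SLOTS 8	/* hypothetical device with 8 message slots */

/* Toy model: irq_by_rid[rid] is the IRQ bound to that slot, 0 if unused. */
struct msix_model {
	unsigned irq_by_rid[MSIX_SLOTS + 1];	/* rid 0 is reserved for INTx */
};

/* Model of a successful allocation: bind 'count' IRQs to slots 1..count. */
static void
model_alloc(struct msix_model *m, const unsigned *irqs, size_t count)
{
	assert(count <= MSIX_SLOTS);
	for (size_t rid = 0; rid <= MSIX_SLOTS; rid++)
		m->irq_by_rid[rid] = 0;
	for (size_t i = 0; i < count; i++)
		m->irq_by_rid[i + 1] = irqs[i];
}

/*
 * Model of the remap step: the i-th allocated IRQ moves to slot indices[i].
 * Returns 0 on success, -1 if an index is out of range or used twice.
 */
static int
model_remap(struct msix_model *m, const unsigned *indices, size_t count)
{
	unsigned irqs[MSIX_SLOTS];
	unsigned seen = 0;

	if (count > MSIX_SLOTS)
		return (-1);
	for (size_t i = 0; i < count; i++) {
		if (indices[i] < 1 || indices[i] > MSIX_SLOTS)
			return (-1);
		if (seen & (1u << indices[i]))
			return (-1);
		seen |= 1u << indices[i];
		irqs[i] = m->irq_by_rid[i + 1];
	}
	for (size_t rid = 0; rid <= MSIX_SLOTS; rid++)
		m->irq_by_rid[rid] = 0;
	for (size_t i = 0; i < count; i++)
		m->irq_by_rid[indices[i]] = irqs[i];
	return (0);
}
```

In this model, allocating two IRQs and remapping with { 1, 4 } leaves the
first IRQ at rid 1, the second at rid 4, and rid 2 empty.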
Tested by: scottl
control data but no payload data is passed.
Change m_uiotombuf() to return at least one empty mbuf if the requested
length was zero. Add comment to sosend_dgram and sosend_generic().
Diagnosed by: jhb
Regression test by: rwatson
Pointy hat to: andre
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.
A simplified scenario:
	root fs                      var fs
	/        A                   /     (/var)      D
	/var     B                   /log  (/var/log)  E
	vfs lock C                   vfs lock          F
Within each file system, the lock order is clear: C->A->B and F->D->E
When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:
L1: C->A->B
|
+->F->D->E
L2: F->D->E
|
+->C->A->B
The lookup() process for namei("/var") mixes those two lock orders:
VOP_LOOKUP() obtains B while A is held
vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
violates L2)
vput() releases lock on B
VOP_UNLOCK() releases lock on A
VFS_ROOT() obtains lock on D while shared lock on F is held
vfs_unbusy() releases shared lock on F
vn_lock() obtains lock on A while D is held (violates L1, follows L2)
dounmount() follows L1 (B is locked while F is drained).
Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).
With unmount, you can get 4 processes in a deadlock:
p1: holds D, want A (in lookup())
p2: holds shared lock on F, want D (in VFS_ROOT())
p3: holds B, want drain lock on F (in dounmount())
p4: holds A, want B (in VOP_LOOKUP())
You can have more than one instance of p2.
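
The four-process deadlock can be checked mechanically by following the
"holds"/"wants" pairs listed for p1..p4. The sketch below is a user-space
illustration only; struct waiter, holder_of() and in_wait_cycle() are
hypothetical names, and the model simplifies the shared lock on F to a
single holder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One blocked process: the lock it holds and the lock it waits for. */
struct waiter {
	const char *name;
	char holds;
	char wants;
};

/* Index of the process holding 'lock', or -1 if nobody in the set holds it. */
static int
holder_of(const struct waiter *w, size_t n, char lock)
{
	for (size_t i = 0; i < n; i++)
		if (w[i].holds == lock)
			return ((int)i);
	return (-1);
}

/*
 * Follow the "wants" edges starting from process 'start'.  Returns true
 * if the chain of waits leads back to 'start', i.e. the process is part
 * of a wait cycle and can never make progress.
 */
static bool
in_wait_cycle(const struct waiter *w, size_t n, size_t start)
{
	size_t cur = start;

	assert(start < n);
	for (size_t step = 0; step < n; step++) {
		int next = holder_of(w, n, w[cur].wants);

		if (next < 0)
			return (false);	/* wanted lock is free */
		if ((size_t)next == start)
			return (true);
		cur = (size_t)next;
	}
	return (false);
}
```

Fed with p1 (holds D, wants A), p2 (holds F, wants D), p3 (holds B,
wants F) and p4 (holds A, wants B), every process is found to be in the
cycle. Removing p3 (no unmount activity) breaks the cycle in this model.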
The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.
- Tor Egge
To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.
Idea by: ups
Reviewed by: tegge, ups, jeff, rwatson (mac interaction)
Tested by: Peter Holm
MFC after: 2 weeks
sparc64 GENERIC and the sound device drivers known working on sparc64
to use bus_get_dma_tag() to obtain the parent DMA tag so we can get rid
of the sparc64_root_dma_tag kludge eventually. Except for ath(4), sk(4),
stge(4) and ti(4) these changes are runtime tested (unless I booted up
the wrong kernels again...).
a power saving mode otherwise.
- If the thread is already bound in sched_bind() unbind it before
re-binding it to a new cpu. I don't like these semantics but they are
expected by some code in the tree. Patch by jkoshy.
Don't expose em->shared to the outside world before it's properly
initialized. Might not affect anything but it's at least a better
coding style.
Don't expose em via p->p_emuldata until it's properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are working on a local copy.
In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].
Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)
Change 2 comments (in the new code) to comply with style(9).
Suggested by: jhb
work when we start requiring this.
- Don't specify an alignment when creating our own parent DMA tag;
the supported DMA engines require no alignment constraint (f.e. the
LANCE child does though) and it's not inherited by the child DMA
tags anyway (which probably is a bug though).
- Fix whitespace nits.
These are shared-memory variants based on Am79C90-compatible chips
that apart from the missing DMA engine are similar to the 'ledma'
variant including using a (pseudo-)bus/device for the buffer that
the actual LANCE device hangs off from. The performance of these is
close to that of the 'ledma' one, as expected at a few times the
CPU load though.
1) Do not do quota accounting for the actual quota data files
or for file system snapshot files ("system" files). This
prevents a deadlock described in PR kern/30958 if the kernel
ever has to grow the quota file. Snapshot files were already
exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
write the quota information to the data file at a truncated
value for a uint32_t id value. The incorrect cast caused quota
files in this case to be around 4GB in size, with the correct cast
they can now be 131GB in size. Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
for them. This prevents the quota files from becoming 131GB in
size and causing quotacheck to run forever at bootup. This could
also cause the kernel to try and expand the quota file, which might
deadlock due to the issue in #1. kern/30958 and kern/38156
(and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
int, like it used to be before the common routine was split up
into 2 different routines to increase / decrease the i-node in-use
count. Prevents an underflow on the i-node count. Related
to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
full and the write was denied due to that fact. PR kern/89247.
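
The 4GB and 131GB figures in item 2 are consistent with a record offset
being computed in 32 bits versus 64 bits. The sketch below is arithmetic
only and does not reproduce the actual kernel code: the 32-byte record
size is an assumption chosen because it reproduces the quoted sizes, and
the dq_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DQ_RECORD_SIZE 32u	/* assumed record size, not stated in the text */

/* Offset computed entirely in 32 bits: wraps around for large ids. */
static uint32_t
dq_offset_truncated(uint32_t id)
{
	return ((uint32_t)(id * DQ_RECORD_SIZE));
}

/* Offset computed after widening the id to 64 bits. */
static uint64_t
dq_offset_wide(uint32_t id)
{
	return ((uint64_t)id * DQ_RECORD_SIZE);
}

/* Worked numbers for the largest possible 32-bit id. */
static void
dq_selfcheck(void)
{
	/* Truncated offsets can never reach past 4GB (2^32 bytes). */
	assert(dq_offset_truncated(UINT32_MAX) == 0xFFFFFFE0u);
	/* Wide offsets end at 2^37 bytes = 131072 MB, the "131GB" case. */
	assert(dq_offset_wide(UINT32_MAX) + DQ_RECORD_SIZE == (1ULL << 37));
}
```

Under that assumption, an id that looks like a negative number when viewed
as signed (item 3) lands near the top of the 2^37 byte range, which is why
such ids blow the file up to its maximum size.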
Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).
#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
unlike documented may not take effect without an initialization. So
don't invoke (*sc_mediachange) directly in lance_mediachange() but
go through lance_init_locked(). It's suboptimal to impose this for
all chips but given that besides the affected PCI bus front-end the
only other front-end which supports media selection is and likely
ever will be the 'ledma' front-end I see not enough reason to break
the in-driver API for this (though one could argue both ways here).
the ipi settings. If NEEDRESCHED is set and an ipi is later delivered
it will clear it rather than cause extra context switches. However, if
we miss setting it we can have terrible latency.
- In sched_bind() correctly implement bind. Also be slightly more
tolerant of code which calls bind multiple times. However, we don't
change binding if another call is made with a different cpu. This
does not presently work with hwpmc which I believe should be changed.
front of isp_init so we can read NVRAM even if we're role ISP_NONE.
Prepare for reintroduction of channels (for FC) for N-Port
Virtualization.
Fix a botch in handle assignment that caused us to nuke one device
when a new one arrives and end up with two devices with the same
identity in the virtual target mapping table.
ifmedia_init() invocation. IFM_IMASK only makes sense here when all of
the maximum of 32 PHYs on each one MII bus support disjoint sets of media,
which generally isn't the case (though it would be nice if we had a way
to let NIC drivers indicate that for the few card models where the PHY
configuration is known/fixed and IFM_IMASK actually makes sense).
- Add and use a miibus_print_child() for the bus_print_child method which
additionally prints the PHY number (which actually is the PHY address)
so one can figure out the media instance <-> PHY number mapping from the
PHY driver attach output. This is intended to be useful in situations
where the addresses of the PHYs on the bus are known (f.e. of internal/
integrated PHYs) so one can feed the appropriate media instance number
to ifconfig(8) (with the upcoming change for ifconfig(8)).
This is more or less inspired by the NetBSD mii_print().
multiple PHYs. In case some PHYs currently driven by ukphy(4) exhibit
problems when isolating due to incomplete implementations or silicon bugs
we'll need to add specific drivers for these. Looking at NetBSD and
OpenBSD I don't expect problems here though (quite the contrary; we still
seem to set MIIF_NOISOLATE without good reason in a bunch of PHY drivers).
- Fix a style(9) whitespace nit.
capability rather than hardcoded offsets for a particular card. While
I'm here, expand the constants some.
- Change the ahd(4) driver to use pci_find_extcap() to locate the PCI-X
capability to keep up with the first change.
Reviewed by: scottl, gibbs (earlier version)
- Switch back to direct modification of remote CPU run queues. This added
a lot of complexity with questionable gain. It's easy enough to
reimplement if it's shown to help on huge machines.
- Re-implement the old tdq_transfer() call as tdq_pickidle(). Change
sched_add() so we have selectable cpu choosers and simplify the logic
a bit here.
- Implement tdq_pickpri() as the new default cpu chooser. This algorithm
is similar to Solaris in that it tries to always run the threads with
the best priorities. It is actually slightly more complex than
Solaris's algorithm because we also tend to favor the local cpu over
other cpus which has a boost in latency but also potentially enables
cache sharing between the waking thread and the woken thread.
- Add a bunch of tunables that can be used to measure effects of different
load balancing strategies. Most of these will go away once the
algorithm is more definite.
- Add a new mechanism to steal threads from busy cpus when we idle. This
is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The
threshold is the required length of a tdq's run queue before another
cpu will be able to steal runnable threads. This prevents most queue
imbalances that contribute to the long latencies.
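
The busy-steal rule in the last item can be sketched as a small user-space
model. Everything here is hypothetical: struct cpu_queue, struct
steal_policy and pick_victim() are invented for the illustration, and
picking the most loaded eligible queue is an assumption, not something the
description above specifies. Only the threshold test itself comes from the
text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-cpu run queue: only its length matters here. */
struct cpu_queue {
	int load;		/* number of runnable threads queued */
};

/* Stand-ins for kern.sched.steal_busy and kern.sched.busy_thresh. */
struct steal_policy {
	bool steal_busy;
	int busy_thresh;
};

/*
 * Called by an idle cpu 'self'.  Returns the index of a cpu whose run
 * queue is at least busy_thresh long, or -1 if stealing is disabled or
 * no queue is long enough.  Ties and ordering are an assumption: the
 * longest eligible queue wins.
 */
static int
pick_victim(const struct cpu_queue *q, size_t ncpu, size_t self,
    const struct steal_policy *p)
{
	int victim = -1;

	assert(self < ncpu);
	if (!p->steal_busy)
		return (-1);
	for (size_t i = 0; i < ncpu; i++) {
		if (i == self || q[i].load < p->busy_thresh)
			continue;
		if (victim < 0 || q[i].load > q[victim].load)
			victim = (int)i;
	}
	return (victim);
}
```

With queue lengths { 0, 1, 3, 2 } and a threshold of 2, the idle cpu 0
would steal from cpu 2 in this model. Raising the threshold to 4 leaves
every queue alone.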
headers in .S directly rather than getting to their macros through
genassym.c/assym.s so there are fewer headers genassym.c has to be
kept in sync with.
While at it fix some style(9) bugs (indentation, prototype format,
sort headers, etc) and remove trailing whitespace.
that can be used to check whether receive data is ready, i.e. whether
the subsequent call of uart_poll() should return a char, and unlike
uart_poll() doesn't actually receive data.
- Remove the device-specific implementations of uart_poll() and implement
uart_poll() in terms of uart_getc() and the newly added uart_rxready()
in order to minimize code duplication.
- In sunkbd(4) take advantage of uart_rxready() and use it to implement
the polled mode part of sunkbd_check() so we don't need to buffer a
potentially read char in the softc.
- Fix some mis-indentation in sunkbd_read_char().
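
The relationship between the three functions can be sketched with a toy
device. This is a user-space illustration under stated assumptions:
struct toy_uart and the toy_* functions are hypothetical stand-ins, the
FIFO replaces real hardware registers, and returning -1 for "no character"
is an assumed convention for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device: a small receive FIFO. */
struct toy_uart {
	unsigned char fifo[16];
	size_t head;		/* next byte to read */
	size_t tail;		/* next free slot; head == tail means empty */
};

/* Check for pending receive data without consuming anything. */
static bool
toy_rxready(const struct toy_uart *u)
{
	return (u->head != u->tail);
}

/* Consume one byte.  The caller must know that data is pending. */
static int
toy_getc(struct toy_uart *u)
{
	int c;

	assert(toy_rxready(u));
	c = u->fifo[u->head];
	u->head = (u->head + 1) % sizeof(u->fifo);
	return (c);
}

/* Device-independent poll built only from the two primitives above. */
static int
toy_poll(struct toy_uart *u)
{
	if (!toy_rxready(u))
		return (-1);
	return (toy_getc(u));
}
```

Because toy_rxready() does not consume data, a caller can test for input
first and read later, which is what removes the need to buffer a
potentially read character elsewhere.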
Discussed with: marcel
may also reflect a Fireplane/Safari or JBus bus (or a virtual bus which
in turn reflects a JBus bus or something like that...).
- In both the sparc64 and sun4v bus_machdep.c use __FBSDID.
- Spell SBus the official way in comments.
- Replace hardcoded function names (all of which were actually outdated)
in panic and status strings with __func__.
- Fix whitespace nits.
hooks get their per hook rcvdata methods, and all functions are organized
corresponding to protocol stack model.
Submitted by: Alexander Motin <mav alkar.net>
Reviewed by: archie, julian
and friends along with all hacks required to implement them. None of
the drivers currently built (as part of GENERIC, LINT or modules) on
sparc64 or sun4v and none of those we might want to use there in
future uses them, AFAICT there actually never was a driver hooked up
to the sparc64 or sun4v build that correctly used these functions
(and it looks like that due to a bug read{b,w,l}()/write{b,w,l}() and
the other functions working on a memory handle never actually worked on
sun4v). All they ever were good for on sparc64 and sun4v was erroneously
dragging in dependencies on isa(4) in drivers like f.e. dpt(4), si(4)
and syscons(4) in source files that supposedly were bus-neutral and
hiding issues with drivers like f.e. ng_bt3c(4) that used these
functions with busses other than isa(4) and therefore couldn't work on
these platforms.
the newly added DEV_EISA. This is done so that these back-ends can
be compiled on platforms not providing in{b,w,l}()/out{b,w,l}() and
friends (but may wish to use them together with bus front-ends other
than the EISA one).
- Finally all splxx() are removed
- Count error fixed in mapping array which might
cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using
a pre-formed connection and then completing the
work so accept won't happen on a non-formed
association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.
Approved by: gnn
to open() [1].
Improve locking for accessing session control structures [2].
Try to document (most likely harmless) races in the code [3].
Based on submission by: Intron (intron at intron ac) [1]
Reviewed by: jhb [2]
Discussed with: netchild, rwatson, jhb [3]
total size of all input reports is < 6.
PR: usb/106435
Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Approved by: emax (mentor)
MFC after: 3 days
PCI bus' one as the default one, and explicitly use the other one for
non-PCI devices.
This is needed because the PCI bus can only address 64MB of RAM, while some
IXP425 boards have 128MB or more, and most of the PCI drivers do not bother
providing the parent dma tag.
- Add a default parent dma tag, similar to what has been done for sparc64.
- Before invalidating the dcache in POSTREAD, save the bits which are in the
same cachelines as our buffers, but not part of them, and restore them after
the invalidation.
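
The save/invalidate/restore sequence from the last item can be modelled in
user space. This is a sketch only: struct toy_mem, the 32-byte line size
and the toy_* functions are assumptions for the illustration, and
"invalidation" is modelled as reloading whole lines of the CPU's view from
RAM, which discards any newer neighbouring bytes that were cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE	32	/* assumed cache line size */
#define MEMSZ	128

struct toy_mem {
	unsigned char ram[MEMSZ];	/* backing store, written by DMA */
	unsigned char view[MEMSZ];	/* what the CPU sees via its cache */
};

/* Invalidate every line overlapping [off, off + len). */
static void
toy_invalidate(struct toy_mem *m, size_t off, size_t len)
{
	size_t start = off - (off % LINE);
	size_t end = ((off + len + LINE - 1) / LINE) * LINE;

	assert(len > 0 && end <= MEMSZ);
	memcpy(m->view + start, m->ram + start, end - start);
}

/* POSTREAD: keep the bytes that share a line with the buffer. */
static void
toy_postread(struct toy_mem *m, size_t off, size_t len)
{
	unsigned char head[LINE], tail[LINE];
	size_t start = off - (off % LINE);
	size_t end = ((off + len + LINE - 1) / LINE) * LINE;
	size_t head_len = off - start;		/* bytes before the buffer */
	size_t tail_len = end - (off + len);	/* bytes after the buffer */

	assert(len > 0 && end <= MEMSZ);
	memcpy(head, m->view + start, head_len);
	memcpy(tail, m->view + off + len, tail_len);
	toy_invalidate(m, off, len);
	memcpy(m->view + start, head, head_len);
	memcpy(m->view + off + len, tail, tail_len);
}
```

For a 16-byte buffer at offset 40, the model saves bytes 32..39 and
56..63, reloads the line, and puts those neighbours back, so only the
buffer itself picks up the DMA data.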
from just before extending a file. This has the desired effect
of keeping the write speed constant. And yes, that helps a lot
copying large files always at full speed now, and I have seen
improvements using benchmarks/bonnie.
Stolen from: NetBSD
Reviewed by: bde
bus hanging off from the Fireplane/Safari bus in some USIII machines.
This is part 3/4 of allowing creator(4) to work in these machines.
The little info needed on how to configure the bridge and to work
around the incorrect values contained in the `interrupts' properties
of its children were obtained from OpenSolaris.
The separate bus front-end was inherited from the OpenBSD creator(4),
which at that time had a mainbus(4) (for USI/II machines, which use
an UPA interconnection bus as the nexus) and an upa(4) (for USIII
machines, which use a subordinate/slave UPA bus hanging off from the
Fireplane/Safari interconnection bus) front-end. With FreeBSD and
newbus there is/will be no need to have two separate bus front-ends
for these busses, so we can easily collapse the shared front-end
and the back-end into a single source file (note that the FreeBSD
creator_upa.c was a misnomer anyway; based on what it actually attached
to that should have been creator_nexus.c), actually OpenBSD meanwhile
also has moved to a shared front-end and a single source file. Due
to the low-level console support creator.c also wasn't free from bus
related things before.
While at it, also split sys/sparc64/creator/creator.h into a
sys/dev/fb/creatorreg.h that only contains register macros and move
the structures to the top of sys/dev/fb/creator.c as suggested by
style(9) so creator(4) is no longer scattered over two directories.
- Use OF_decode_addr()/sparc64_fake_bustag() to obtain the bus tags and
handles for the low-level console support instead of hardcoding
support for AFB/FFB hanging off from nexus(4) only. This is part 2/4
of allowing creator(4) to work in USIII machines (which have a UPA
bus hanging off from the Fireplane/Safari bus reflected by the nexus),
which already makes it work as the low-level console there.
- Allocate resources in the bus attach routine regardless of whether
creator(4) is used for the low-level console and thus the required
bus tags and handles have been already obtained or not so the resources
are marked as taken in the respective RMAN.
- For both obtaining the bus tags and handles for the low-level console
support as well as allocating the corresponding resources in the
regular bus attach routine don't bother to get all for the maximum of
24 register banks but only (for) the two tag/handle pairs required for
providing the video interface for syscons(4) support. If we can't
allocate the rest of them just limit the memory range accessible via
creator_fb_mmap() accordingly.
- Sanity check the memory range spanned by the first and last resources
and the resources in between as far as possible, as the XFree86/Xorg
sunffb(4) expects to be able to access the whole region, even though
the backing resources are actually non-continuous. Limit and check
the memory range accessible via creator_fb_mmap() accordingly.
- Reduce the size of buffers for OFW properties to what they actually
need to hold.
- Rename some tables to creator_<foo> for consistency.
- Also for the sizes in the creator_fb_mmap() mapping table entries use
macros for consistency, add macros for the remaining register banks
for completeness.
nexus (which might or might not reflect an UPA interconnection bus;
accordingly UPA_BUS_SPACE should be renamed to NEXUS_BUS_SPACE at a
later point) and subordinate/slave UPA busses. This is part 1/4 of
allowing creator(4) to work in USIII machines (which have a UPA bus
hanging off from the Fireplane/Safari bus reflected by the nexus).
operation as it ran out of free descriptors or if there are too many
segments in the first place, call bus_dmamap_unload() in order to
unload the already loaded segments.
For trying to map the defragmented mbuf (chain) in re_encap() this
introduces re_dma_map_desc() setting arg.rl_maxsegs to 0 as a new
failure mode. Previously we just ignored this case, corrupting our
view of the TX ring.
o In re_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 4 free TX
descriptors. Further down the road re_encap() will bail if there
aren't at least 4 free TX descriptors, causing re_start() to
abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 4 with a macro RL_TX_DESC_THLD
throughout this driver.
- Don't cancel the watchdog timeout as soon as there's at least one
free TX descriptor but instead only if all descriptors have been
handled. It's perfectly normal, especially in the DEVICE_POLLING
case, that re_txeof() is called when only a part of the enqueued
TX descriptors have been handled, causing the watchdog to be
disarmed prematurely.
o In re_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
re_start() to not trying to prepend the mbuf (chain) if re_encap()
has freed it.
- Remove dupe initialization of members of struct rl_dmaload_arg to
values that didn't change since trying to process the fragmented
mbuf chain.
While at it remove an unused member from struct rl_dmaload_arg.
o In re_start() remove an abandoned, banal comment. The corresponding
code was moved to re_attach() some time ago.
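
The two re_txeof() rules can be summarized in a small user-space model.
This is a sketch, not the driver code: struct toy_txring, toy_txeof() and
the ring size of 64 are assumptions, and TX_DESC_THLD merely mirrors the
RL_TX_DESC_THLD value of 4 mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_DESC_CNT	64	/* assumed ring size */
#define TX_DESC_THLD	4	/* mirrors RL_TX_DESC_THLD from the text */

struct toy_txring {
	int free_descs;		/* descriptors available for new packets */
	bool oactive;		/* stand-in for IFF_DRV_OACTIVE */
	int watchdog;		/* timeout in ticks, 0 means disarmed */
};

/* Called after 'reclaimed' TX descriptors have completed. */
static void
toy_txeof(struct toy_txring *r, int reclaimed)
{
	assert(reclaimed >= 0);
	assert(r->free_descs + reclaimed <= TX_DESC_CNT);
	r->free_descs += reclaimed;

	/* Only advertise room once the encap path would actually accept. */
	if (r->free_descs >= TX_DESC_THLD)
		r->oactive = false;

	/* Only disarm the watchdog once nothing is left in flight. */
	if (r->free_descs == TX_DESC_CNT)
		r->watchdog = 0;
}
```

Starting from a full ring, reclaiming 3 descriptors changes neither flag
in this model. A fourth clears the busy flag, and the watchdog stays armed
until all 64 descriptors are free again.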
With these changes re(4) now survives one day (until stopped) of
hammering out packets here.
Reviewed by: yongari
MFC after: 2 weeks
- Retire the PCI_SUB*_1 constants and don't try to read a subvendor ID out
of them. There isn't a standard subvendor ID field for PCI-PCI bridges.
Instead, the dword at offset 0x34 is actually mostly reserved except for
the LSB which is the capabilities pointer.
- Add support for the PCI-PCI bridge subvendor ID capability (13) and use
it to set the subvendor ID for PCI-PCI bridges.
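
Locating the subvendor ID capability amounts to walking the capability
list that starts at the pointer in the low byte at offset 0x34. The sketch
below operates on a plain byte array standing in for config space; it is
not the kernel's implementation, find_cap() and bridge_subvendor() are
hypothetical names, and the placement of the subvendor ID at offset 4
within the capability is my reading of the bridge capability layout.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE		256
#define CFG_STATUS		0x06
#define CFG_STATUS_CAPS		0x10	/* capability list present */
#define CFG_CAP_PTR		0x34
#define CAP_ID_SUBVENDOR	0x0d	/* capability 13 */

/* Returns the config space offset of capability 'cap_id', or 0. */
static uint8_t
find_cap(const uint8_t cfg[CFG_SIZE], uint8_t cap_id)
{
	uint8_t ptr;
	int guard = 48;		/* bound the walk against looping lists */

	if ((cfg[CFG_STATUS] & CFG_STATUS_CAPS) == 0)
		return (0);
	ptr = cfg[CFG_CAP_PTR] & 0xfc;	/* low two bits are reserved */
	while (ptr >= 0x40 && guard-- > 0) {
		if (cfg[ptr] == cap_id)
			return (ptr);
		ptr = cfg[ptr + 1] & 0xfc;	/* next pointer */
	}
	return (0);
}

/* Returns the bridge's subvendor ID, or 0 if the capability is absent. */
static uint16_t
bridge_subvendor(const uint8_t cfg[CFG_SIZE])
{
	uint8_t off = find_cap(cfg, CAP_ID_SUBVENDOR);

	if (off == 0 || off > CFG_SIZE - 8)
		return (0);
	return ((uint16_t)(cfg[off + 4] | (cfg[off + 5] << 8)));
}
```

A bridge without the capability simply yields 0 here, rather than a value
read from the mostly reserved dword at 0x34.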
MFC after: 1 month
functions. The idea is taken from OpenBSD.
- Set/clear jumbo frame configurations for bge(4).
- Re-add BCM5750 PHY workaround for bce(4), which was mistakenly removed
from the previous commit.
- Move some PHY bug detections from brgphy.c to if_bge.c.
- Do not penalize working PHYs.
- Re-arrange bge_flags roughly by their categories.
- Fix minor style(9) nits.
PR: kern/107257
Obtained from: OpenBSD
Tested by: Mike Hibler <mike at flux dot utah dot edu>
The code is modelled after cd9660, including support for simple read-ahead
courtesy of clustered read.
Fix udf_strategy to DTRT.
This change fixes sendfile(2) not to send out garbage.
Reviewed by: scottl
MFC after: 1 month
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
unsigned long and really is a uint32_t)
- Got rid of the use of MHeaders until they
are truly needed by lower layers.
- Fixed an initialization problem in the readq structure
(ordering was off).
- Found yet another collision bug when the random number
generator returns two numbers on one side (during a collision)
that are the same. Also added some tracking of cookies
that will go away when we know that we have the last collision
bug gone.
- Fixed an init bug for book_size_scale, that was causing
Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
Early FR but due to above bug also affected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
shutting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was an
ordering of locks problem in the way we did timers. It
now conforms to the timeout(9) manual (except for the
_drain part, we had to do this a different way due
to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET
depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read error code issue where we were NOT getting the proper
error when the connection was reset.
Approved by: gnn
Add a new function hashinit_flags() which allows NOT-waiting
for memory (or waiting). The old hashinit() function now
calls hashinit_flags(..., HASH_WAITOK);
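
The wrapper relationship can be sketched in user space as follows. The
toy_* names and flag values are hypothetical and the signatures are not
the kernel's. Since userland cannot sleep for memory, TOY_HASH_WAITOK is
modelled by aborting on failure, while TOY_HASH_NOWAIT returns NULL. The
sizing rule (largest power of two not exceeding the requested element
count, with size - 1 returned as the mask) is how I understand the
existing hashinit() to behave.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_HASH_WAITOK	0x1	/* hypothetical flag values */
#define TOY_HASH_NOWAIT	0x2

struct toy_bucket {
	void *first;
};

/* Allocate a power-of-two sized table and report its index mask. */
static struct toy_bucket *
toy_hashinit_flags(int elements, unsigned long *mask, int flags)
{
	struct toy_bucket *tbl;
	unsigned long size;

	assert(elements > 0);
	/* Exactly one of the two flags must be given. */
	assert(flags == TOY_HASH_WAITOK || flags == TOY_HASH_NOWAIT);

	for (size = 1; size <= (unsigned long)elements; size <<= 1)
		continue;
	size >>= 1;

	tbl = calloc(size, sizeof(*tbl));
	if (tbl == NULL) {
		if (flags == TOY_HASH_NOWAIT)
			return (NULL);	/* caller must handle failure */
		abort();		/* stand-in for "wait for memory" */
	}
	*mask = size - 1;
	return (tbl);
}

/* The old entry point keeps its behaviour by passing the waiting flag. */
static struct toy_bucket *
toy_hashinit(int elements, unsigned long *mask)
{
	return (toy_hashinit_flags(elements, mask, TOY_HASH_WAITOK));
}
```

Callers that may not sleep use the flags variant and check for NULL.
Existing callers of the old function are unaffected.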
o eliminate assumptions that half/quarter rate channels only exist in 11a
o handle frequency mapping between hal and net80211; hal gives us freq's
in the range 2422..2437 that we remap
MFC after: 1 month
o add channel flag to enable freq <-> ieee channel # mapping (can
go away in the future when ieee number is precomputed)
o add mapping between 900mhz freq's and channel #'s that gives a
unique channel # for each half/quarter/full width channel
o remove assumptions that half/quarter rate channels only happen in 11a
o remove assumptions that all 11g channels are full width
o ensure ic_curchan is reset on mode change so changing the channel
list (e.g. on countrycode change) doesn't leave curchan set to an
invalid channel
There is still an issue with switching rate sets; to be fixed separately.
MFC after: 1 month
only support external PHYs (besides not connectable internal ones
which respond at the usual addresses, but which don't hurt if we
let them show up) and don't wedge when isolating PHYs. Actually,
this change special cases limiting PHYs to Am79C97{3,5,8}, for
which this driver doesn't implement switching between the internal
and external PHYs, yet, and Am79C971, where isolating the external
PHY (at least in case it's a DP83840A) wedges the chip. Together
with sys/dev/mii/acphy.c rev. 1.21 this adds support for the
100baseFX port of AT-2700 series adaptors, which use two AC101,
one for the copper and one for the fibre port (there might be
variants which only use one PHY though).
- Fix a bug in the previous revision that prevented the address of
the used (external) PHY from actually being recorded.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
MFC after: 1 week
bridge if it doesn't pass MSI messages up correctly. We set the flag
in pcib_attach() if the device ID is disabled via a PCI quirk.
- Disable MSI for devices behind the AMD 8131 HT-PCIX bridge. Linux has
the same quirk.
Tested by: no one despite repeated calls for testers
laptops.
Tested by: [1] Lion G. <liontanker@hotmail.com>
[2] Pietro Cerutti <pietro.cerutti@gmail.com>
Specialized mixer initialization for STAC9221, much like STAC9220.
Tested by: Devon H. O'Dell
- Set MIIF_NOLOOP as loopback doesn't work with this PHY. The MIIF_NOLOOP
flag currently triggers nothing but hopefully will be respected by
mii_phy_setmedia() later on.
- Use MII_ANEGTICKS instead of 5.
- Remove an unused macro.
- Fix some whitespace nits.
MFC after: 1 week
- In exphy_service() for the MII_TICK case don't bother to check whether
the currently selected media is of type IFM_AUTO as auto-negotiation
doesn't need to be kicked anyway.
- Remove #if 0'ed inapplicable code.
- Fix some whitespace nits.
MFC after: 1 week
and thus the FX_DIS pin indicates fibre media. This is part 1/2 of
adding support for the 100baseFX interface/port of AT-2700 series
adaptors.
Idea from: NetBSD
MFC after: 1 week
indices when manually adding media. Some of these I missed while
recently converting drivers to take advantage of said functions;
others were longstanding bugs.
- General style(9) cleanup -- white space, braces, line wraps, etc.
- Annotate a lack of synchronization of the global route cache if the input
routine is invoked with parallelism.
- Remove unused debugging code.
routing:
- style(9) cleanup -- white space, braces, etc.
- Make include guards consistent with our more general naming
convention.
- Rearrange and complete forward structure declarations in at_extern.h,
remove testing of guards of various other include files to protect
function declarations.
This leaves an ifdef _KERNEL in at_var.h, but from inspection it seems
likely that this file is still not actually safe for inclusion in user
space. However, since it's not included from within src/, this does
not appear to be an issue (ifconfig, etc., have migrated to the generic
cross-protocol ioctls for address operations).
etc, changes.
Remove a small amount of #if !defined(__FreeBSD__) code.
Add missing include guard for _NETATALK_AARP_H_.
Remove unneeded (and conflicting) extern prototype for aarptfree().
call, its semantics were unintentionally changed. It went from
returning the time state to returning 0 or -1. Since 0 means time
normal, and non-zero effectively only shows up around leap seconds,
this went unnoticed until now. At least unnoticed until someone was
trying to run a binary they didn't have source for and it was
misbehaving...
Submitted by: Judah Levine
MFC After: 2 weeks
members right. However, it also said it was aligned(1), which meant
that gcc generated really bad code. Mark this as aligned(4). This
makes things a little faster on arm (a couple percent), but also saves
about 30k on the size of the kernel for arm.
I talked about doing this with bde, but didn't check with him before
the commit, so I'm hesitant to say 'reviewed by: bde'.
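
The effect of the two attributes can be seen in a small sketch. The
struct below is a hypothetical header layout, not the structure this
commit touched, and the attribute syntax assumes gcc or a compatible
compiler. With packed alone the compiler must assume byte alignment and
may emit byte-wise loads on strict-alignment machines such as arm;
adding aligned(4) keeps the member layout but restores word alignment
for the structure as a whole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: packed only, so the compiler assumes align 1. */
struct sk_hdr_packed {
	uint8_t		vhl;
	uint8_t		tos;
	uint16_t	len;
	uint32_t	src;
} __attribute__((packed));

/* Same members, but the structure is promised to start on 4 bytes. */
struct sk_hdr_aligned4 {
	uint8_t		vhl;
	uint8_t		tos;
	uint16_t	len;
	uint32_t	src;
} __attribute__((packed, aligned(4)));

static size_t
sk_align_packed(void)
{
	return (_Alignof(struct sk_hdr_packed));
}

static size_t
sk_align_aligned4(void)
{
	return (_Alignof(struct sk_hdr_aligned4));
}
```

Both variants have the same size here; only the alignment the compiler
may rely on differs.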
- Use printf() and device_printf() instead of log() in ichsmb(4).
- Create the mutex sooner during ichsmb(4) attach.
- Attach the interrupt handler later during ichsmb(4) attach to avoid
races.
- Don't try to set PCIM_CMD_PORTEN in ichsmb(4) attach as the PCI bus
driver does this already.
- Add locking to alpm(4), amdpm(4), amdsmb(4), intsmb(4), nfsmb(4), and
viapm(4).
- Axe ALPM_SMBIO_BASE_ADDR, it's not really safe to write arbitrary values
into BARs, and the PCI bus layer will allocate resources now if needed.
- Merge intpm(4) and intsmb(4) into just intsmb(4). Previously, intpm(4)
attached to the PCI device and created an intsmb(4) child. Now,
intsmb(4) just attaches to PCI directly.
- Change several intsmb functions to take a softc instead of a device_t
to make things simpler.
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.
MFC after: 2 weeks
preemptions when adjusting the priority of a thread that is on a run
queue. This was only observed when FULL_PREEMPTION was enabled.
Reported by: kris
Diagnosed by: ups
MFC after: 1 week
that piggybacks on bce_tick() callout.
- Instead of unconditionally resetting the controller, try to
skip the reset in case we got a pause frame, like em(4) did.
- Lock bce_tick() using callout_init_mtx().
Discussed with/Reviewed by: glebius, scottl, davidch
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.
Discussed with: ups
Tested on: i386, amd64
allocations were made using improper flags in interrupt context.
Replace with a simple WITNESS warning call. This restores the
invariant that M_WAITOK allocations will always succeed or die
horribly trying, which is relied on by many UMA consumers.
MFC after: 3 weeks
Discussed with: jhb
NOTES though, as ofw_syscons(4) doesn't properly interface with
syscons(4) regarding loading the font specified with SC_DFLT_FONT,
causing a kernel with both options SC_OFWFB and SC_NO_MODE_CHANGE
to not link.
not needed if the proper ordering is done in attach and shutdown.
Remove usage of if_timer/watchdog and roll my own by piggybacking
off the tick() function.
Use the new usb system to allocate task queues instead of using
the system wide thread for taskqueues.
for usb. I hope that this will eventually be used for generic devices
that need full fledged blocking threads for event processing.
Create a taskqueue:
void usb_ether_task_init(device_t, int, struct usb_taskqueue *);
Enqueue a task:
void usb_ether_task_enqueue(struct usb_taskqueue *, struct task *);
Wait for all tasks queued to complete:
void usb_ether_task_drain(struct usb_taskqueue *, struct task *);
Destroy the taskqueue:
void usb_ether_task_destroy(struct usb_taskqueue *);
University of Washington copyrights, which include the
advertising clause. Move $NetBSD$ into standard location for
FreeBSD source files, and normalize formatting.
MFC after: 3 days
the UCB license now excludes the advertising clause. I'm not
interested in it either, so move my copyright. This leaves
only a CGD copyright with the advertising clause.
MFC after: 3 days
the state machine clocks to INIT, node references are not reclaimed.
Add a new routine ieee80211_drain_ifq that does this and use it
instead of IF_DRAIN.
Submitted by: Sepherosa Ziehau
Obtained from: DragonFly
MFC after: 1 month
- Sort by date in license blocks, oldest copyright first.
- All rights reserved after all copyrights, not just the first.
- Use (c) to be consistent with other entries.
MFC after: 3 days
o add IEEE80211_F_JOIN flag to ieee80211_fix_rate to indicate a station
is joining a BSS; this is used to control whether or not we over-write
the basic rate bit in the calculated rate set
o fix ieee80211_fix_rate to honor IEEE80211_F_DODEL when IEEE80211_F_DONEGO
is not specified (e.g. when joining an ibss network)
o on sta join always delete unusable rates from the negotiated rate set;
this was being done only for ibss networks but is also needed for 11g bss
with mixed stations
o on sta join delete unusable rates from the bss node's rate set, not the
scan table entry's rate set
o when calculating a rate set for new neighbors in an ibss calculate a
negotiated rate set so drivers are not presented with rates they should
not use
Submitted by: Sepherosa Ziehau (w/ modifications)
Obtained from: DragonFly
MFC after: 1 month
- Clear the PCI AFSR and status error bits as previous errors still
might be indicated.
- Set up the PCI control and diagnostic registers according to the
capabilities, workarounds, etc of/for specific revisions of the
supported bridges. This includes no longer setting Hummingbird-/
Sabre-specific bits in the PCI control register but preserving
what the firmware has initialized them to like OpenSolaris does.
Previously we were setting these bits according to the example in
the Sabre documentation, which I doubt is appropriate for all
Sabre based designs and especially not for Hummingbirds. This
also includes not enabling bus parking unless the firmware tells
us to.
- Set the PCI latency timer register as this isn't always done by
the firmware.
o Remove a redundant argument from psycho_set_intr() and in this
function check the return value of bus_setup_intr(). [2]
o Let psycho_setup_intr() return ENOMEM instead of 0 when it can't
allocate memory for the interrupt wrapper stub and EINVAL instead
of 0 if it can't find the interrupt vector in the interrupt map.
o Add a workaround for a bug of the Sabre-APB-combination where it
doesn't drain DMA write data for devices behind additional PCI-PCI
bridges underneath the APB PCI-PCI bridge. This workaround (do
things necessary in order to achieve a manual drain when coherency
is required) is currently implemented in psycho_setup_intr() and
psycho_intr_stub() (for easy MFC'ing) and therefore is only applied
for interrupt handlers. This should be moved to psycho(4)-specific
bus_dma_tag_create() and bus_dmamap_sync() methods, respectively,
once this driver is converted to make use of BUS_GET_DMA_TAG(), so
the workaround is also applied for polling(4) callbacks. [3]
o Fix some minor style issues.
Info from: OpenSolaris [1]
Info from: Linux, OpenBSD, OpenSolaris [3]
Suggested by: Coverity Prevent (CID 682) [2]
MFC after: 1 month
firmware (mainly 'pmu' and its 'lomp' dupe found in a couple of
later USII{e,i}-based machines) by checking whether a device with
the same triple of bus number, slot and function already has been
added. This is the simple yet effective approach introduced in
OpenBSD some time ago, but which has the flaw that it assumes
that the device and its dupe(s) found in the OFW device tree are
equal or at least the one encountered first is in some way the
more important one (this is the case with 'pmu' and 'lomp'; the
'pmu' node has a couple of properties and children while the 'lomp'
one misses most of these). If there's ever a device/dupe pair
where we don't encounter the more important node first, we'll
probably need to introduce a quirk list in order to add the
desired device but prevent its dupe(s) from being added.
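
The first-one-wins check described above can be sketched as follows.
This is an illustration only: the sk_* names, the fixed-size table and
the linear search are assumptions, not the bus driver's actual data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_DEVS	32	/* arbitrary bound for the sketch */

struct sk_bsf {
	int bus, slot, func;
};

struct sk_seen {
	struct sk_bsf	devs[SK_MAX_DEVS];
	size_t		n;
};

/*
 * Record a device node by its (bus, slot, function) triple.  Returns
 * true if the node was added and false if a node with the same triple
 * was already seen, in which case the later node is treated as a dupe.
 */
static bool
sk_add_device(struct sk_seen *seen, int bus, int slot, int func)
{
	size_t i;

	for (i = 0; i < seen->n; i++) {
		if (seen->devs[i].bus == bus &&
		    seen->devs[i].slot == slot &&
		    seen->devs[i].func == func)
			return (false);
	}
	assert(seen->n < SK_MAX_DEVS);
	seen->devs[seen->n++] = (struct sk_bsf){ bus, slot, func };
	return (true);
}
```

Because the first node with a given triple always wins, the result
depends on the order in which the device tree is walked, which is the
flaw the message points out.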
MFC after: 1 week
link state changes. Instead, build new speed/duplex/flow-control
settings from the values reported from PHY.
This should fix speed/duplex/flow-control mismatches between GMAC and
PHY which resulted in very poor Rx performance due to lots of
out-of-order packet delivery.
Reported by: Arno J. Klaassen <arno AT heho DOT snv DOT jussieu DOT fr>
Tested by: Arno J. Klaassen <arno AT heho DOT snv DOT jussieu DOT fr>
modern dual-core systems as well.
- Parse the _CST packages for each cpu and track all the states individually,
on a per-cpu basis.
- Revert to generic FADT/P_BLK based Cx control if the _CST package
is not present on all cpus. In that case, the new driver will
still support per-cpu Cx state handling. The driver will determine the
highest Cx level that can be supported by all the cpus and configure the
available Cx state based on that.
- Fixed the case where multiple cpus in the system share the same
registers for Cx state handling. To do that, added a new flag
parameter to the acpi_PkgGas and acpi_bus_alloc_gas functions that
enable the caller to add the RF_SHAREABLE flag. This flag could also be
useful to other callers (acpi_throttle?) in the tree but this change is
not yet made.
- For Core Duo cpus, both cores seem to be taken out of C3 state when
any one of the cores needs to transition out. This broke the short sleep
detection logic. It is now disabled if there is more than one cpu in
the system, as that fixed it in my case. This quirk may need to
be re-enabled later in a different way.
- Added support to control cx_lowest on a per-cpu basis. There is still
a generic cx_lowest to enable changing cx_lowest for all cpus with a single
sysctl and for ease of use. Sample output for the new sysctl:
dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 43.16% 56.83%
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.00% 45.65% 54.34%
hw.acpi.cpu.cx_lowest: C3
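
The fallback rule above (use the highest Cx level that every cpu can
support) amounts to taking a minimum over the per-cpu values. A minimal
sketch, with a hypothetical function name and a plain array standing in
for the per-cpu driver state:

```c
#include <assert.h>
#include <stddef.h>

/*
 * cx_count[i] is the deepest Cx level cpu i supports (1 for C1 only,
 * 3 for C1..C3, and so on).  The deepest level usable by all cpus is
 * the smallest of these values.
 */
static int
sk_common_cx(const int *cx_count, size_t ncpus)
{
	size_t i;
	int common;

	assert(ncpus > 0);
	common = cx_count[0];
	for (i = 1; i < ncpus; i++) {
		if (cx_count[i] < common)
			common = cx_count[i];
	}
	return (common);
}
```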
This work was done by Stephane E. Potvin with some simple reworking by
myself. Thank you.
Submitted by: Stephane E. Potvin <sepotvin / videotron.ca>
MFC after: 2 weeks
recording enabled some programs (audio/audacity from ports) can't
correctly enumerate all /dev/dsp devices.
Note: previous commit did not enable some debugging stuff, my eyes did
misread "#undef" as "#define".
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
Now (ok it's been a while...) that FreeBSD has RLIMIT_AS too, we can use
it in the linuxolator instead of ignoring it.
This fixes a LTP test.
Submitted by: rdivacky
No need to lock the prison for linux_use26, because setting the int
is atomic and the process cannot leave the jail.
Submitted by: kib
Reviewed by: jhb
Requested by: rdivacky
Don't lock em when just using em->shared->group_pid, because
the group_pid never changes.
Submitted by: rdivacky
Reviewed by: kib
Glanced at by: jhb
(due to an early reset or the like), remember to unlock the socket lock.
This will not occur in 7-CURRENT, but could in theory occur in 6-STABLE.
MFC after: 1 week
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.
This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0
PR: 105412
Tested by: Eugene Grosbein <eugen grosbein pp ru>
revision 1.98 is NOT merged, because FreeBSD does not support this
syntax.
revision 1.99 is NOT merged, "const poisoning" part is not applicable
to FreeBSD. There is no variable shadowing, GCC can't find
this one (but there are others)
revision 1.100 is NOT merged, because it was null patch (no changes)
revision 1.101 is NOT merged, there is no BIT() macro in FreeBSD
revision 1.102 is merged
revision 1.103 is partially merged. There is no ai.ifaceh in FreeBSD
revision 1.104 is NOT merged
revision 1.105 is merged
revision 1.106 is not merged, because of rev. 1.107
revision 1.107 is a backout of 1.106
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
---snip---
New features:
1. Optional multichannel recording (32 channels on Live!, 64 channels
on Audigy).
All channels are 16bit/48000Hz/mono, format is fixed.
Half of them are copied from sound output, another half can be
used to record any data from DSP. What should be recorded is
hardcoded in DSP code. In this version it records dummy data, but
can be used to record all DSP inputs, for example.
Because there is no support for more-than-stereo sound streams, the
multichannel stream is presented as one 32(64)*48000 Hz 16bit mono
stream.
Channel map:
SB Live! (4.0/5.1)
offset (words) substream
0x00 Front L
0x01 Front R
0x02 Digital Front L
0x03 Digital Front R
0x04 Digital Center
0x05 Digital Sub
0x06 Headphones L
0x07 Headphones R
0x08 Rear L
0x09 Rear R
0x0A ADC (multi-rate recording) L
0x0B ADC (multi-rate recording) R
0x0C unused
0x0D unused
0x0E unused
0x0F unused
0x10 Analog Center (Live! 5.1) / dummy (Live! 4.0)
0x11 Analog Sub (Live! 5.1) / dummy (Live! 4.0)
0x12..0x1F dummy
Audigy / Audigy 2 / Audigy 2 Value / Audigy 4
offset (words) substream
0x00 Digital Front L
0x01 Digital Front R
0x02 Digital Center
0x03 Digital Sub
0x04 Digital Side L (7.1 cards) / Headphones L (5.1 cards)
0x05 Digital Side R (7.1 cards) / Headphones R (5.1 cards)
0x06 Digital Rear L
0x07 Digital Rear R
0x08 Front L
0x09 Front R
0x0A Center
0x0B Sub
0x0C Side L
0x0D Side R
0x0E Rear L
0x0F Rear R
0x10 output to AC97 input L (muted)
0x11 output to AC97 input R (muted)
0x12 unused
0x13 unused
0x14 unused
0x15 unused
0x16 ADC (multi-rate recording) L
0x17 ADC (multi-rate recording) R
0x18 unused
0x19 unused
0x1A unused
0x1B unused
0x1C unused
0x1D unused
0x1E unused
0x1F unused
0x20..0x3F dummy
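
A consumer of this stream has to pick one substream back out of the
mono data. The sketch below assumes that the word offsets in the channel
map are positions within consecutive frames of 32 (Live!) or 64 (Audigy)
16-bit words; that reading of the map, and every sk_* name, is an
assumption for illustration and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy one substream out of a captured buffer.  buf holds nwords
 * 16-bit words, nch is the frame width (32 or 64) and offset is the
 * word offset from the channel map.  Returns the number of samples
 * written to out, at most outcap.
 */
static size_t
sk_extract_substream(const int16_t *buf, size_t nwords, size_t nch,
    size_t offset, int16_t *out, size_t outcap)
{
	size_t frames, i;

	assert(nch > 0 && offset < nch);
	frames = nwords / nch;		/* ignore a trailing partial frame */
	if (frames > outcap)
		frames = outcap;
	for (i = 0; i < frames; i++)
		out[i] = buf[i * nch + offset];
	return (frames);
}
```

Under that assumption, extracting offset 0x08 with a frame width of 32
would yield the "Rear L" substream on an SB Live! card at 48000 Hz.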
Fixes:
1. Do not assign negative values to variables used to index emu_cards
array. This array was never accessed when index is negative, but
Alexander (netchild@) told me that Coverity does not like it.
After this change emu_cards[0] should never be used to identify
valid sound card.
2. Fix off-by-one errors in interrupt manager. Add more checks there.
3. Fixes to sound buffering code now allow the driver to use large playback
buffers.
4. Fix memory allocation bug when multichannel recording is not
enabled.
5. Fix interrupt timeout when recording with low bitrate (8kHz).
Hardware:
1. Add one more known Audigy ZS card to list. Add two cards with
PCI IDs between old known cards and new one.
Other changes:
1. Do not use ALL CAPS in messages.
Incomplete code:
1. Automute S/PDIF when S/PDIF signal is lost.
Tested on i386 only, gcc 3.4.6 & gcc41/gcc42 (syntax only).
---snip---
This commit enables a little bit of debugging output when the driver is
loaded as a module. I did a cross-build test for amd64.
The code has some style issues, this will be addressed later.
The multichannel recording part is some work in progress to allow playing
around with it until the generic sound code is better able to handle
multichannel streams.
This is supposed to fix
CID: 171187
Found by: Coverity Prevent
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
Bring the linux mmap code more into line with how linux (2.4.x) behaves.
Tested by: Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC
In addition to what the i386 version does, always use PROT_EXEC in the
mapping like the previous version of the amd64 code did. We need to
examine this further to
decide what the right thing to do is. For now this fixes several problems in
the LTP test runs and should behave regarding PROT_EXEC like before.
of max() when computing the divisor in SCHED_TICK_PRI(). This prevents
cases where rounding down would allow the quotient to exceed
SCHED_PRI_RANGE.
- Garbage collect some unused flags and fields.
- Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply
duplicated this functionality.
- Re-enable the rebalancer by default and fix the sysctl so it can be
modified.
marked idle, thus breaking cpu load balancing.
- Change sched_interact_update() to fix cases where the stored history
has expanded significantly rather than handling them in the callers. This
fixes a case where sched_priority() could compute a bad value.
- Add a sysctl to disable the global load balancer for experimentation.
server.
Don't complain about a hard loop id of 0xffff; we get this in
point-to-point topologies with the 2300 and 2K Login firmware.
Up the timeout on register FC4 types commands.
- Rename confusing AGP_INTEL_I845_MCHCFG to AGP_INTEL_I845_AGPM.
- Move E7205 and E7505 from i8x5 to i8x0 family. It probably worked
because the actual offset is the same.
In fact, all three families have the bit at the exact same place. Only
differences are name and width of the registers, i.e., NBXCFG (0x50, dword),
RDCR (0x51, byte), AGPM (0x51, byte), MCHCFG (0x50, word) depending on
the family of the chipsets.
sysctl and socket teardown by adding a reference count to the UNIX domain
pcb object and fixing the sysctl that enumerates unpcbs to grab a
reference on each unpcb while it builds the list to copy out to userland.
- Close a race between UNIX domain pcb garbage collection (unp_gc()) and
file descriptor teardown (fdrop()) by adding a new garbage collection
flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers
in a UNIX domain socket looking for nested file descriptor references
and clears the flag when it is finished. fdrop() checks to see if the
flag is set on a file descriptor whose refcount just dropped to 0 and
waits for unp_gc() to clear the flag before completely destroying the
file descriptor.
MFC after: 1 week
Reviewed by: rwatson
Submitted by: ups
Hopefully makes the panics go away: mx1
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.
MFC after: 1 week
Reviewed by: alc
functions now more closely resemble similar functions in nullfs.
This also eliminates some errors.
Submitted by: daichi, Masanori OZAWA <ozawa ongs co jp>
setting ftick = ltick = ticks in schedinit().
- Update the priority when we are pulled off of the run queue and when we
are inserted onto the run queue so that it more accurately reflects our
present status. This is important for efficient priority propagation
functioning.
- Move the frequency test into sched_pctcpu_update() so we don't repeat it
each time we'd like to call it.
- Put some temporary work-around code in sched_priority() in case the tick
mechanism produces a bad priority. Eventually this should revert to an
assert again.
a spin mutex since it doesn't have an INTR_FAST interrupt handler.
Beyond that the driver is still under Giant anyway.
- Remove unneeded locking during attach across operations that can't be
called with locks held (such as bus_dma_tag_create()).
MFC after: 1 week
Not objected to by: scottl
the most recently chosen index. This significantly improves nice
behavior. This allows a lower priority thread to run some multiple of
times before the higher priority thread makes it to the front of
the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running
with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice
difference of 1 makes a 4% difference in cpu usage between two hogs.
- Track a separate insert and removal index. When the removal index is
empty it is updated to point at the current insert index.
- Don't remove and re-add a thread to the runq when it is being adjusted
down in priority.
- Pull some conditional code out of sched_tick(). It's looking a bit
large now.
- Remove the double queue mechanism for timeshare threads. It was slow
due to excess cache lines in play, caused suboptimal scheduling behavior
with niced and other non-interactive processes, complicated priority
lending, etc.
- Use a circular queue with a floating starting index for timeshare threads.
Enforces fairness by moving the insertion point closer to threads with
worse priorities over time.
- Give interactive timeshare threads real-time user-space priorities and
place them on the realtime/ithd queue.
- Select non-interactive timeshare thread priorities based on their cpu
utilization over the last 10 seconds combined with the nice value. This
gives us more sane priorities and behavior in a loaded system as
compared to the old method of using the interactivity score. The
interactive score quickly hit a ceiling if threads were non-interactive
and penalized new hog threads.
- Use one slice size for all threads. The slice is not currently
dynamically set to adjust scheduling behavior of different threads.
- Add some new sysctls for scheduling parameters.
Bug fixes/Clean up:
- Fix zeroing of td_sched after initialization in sched_fork_thread() caused
by recent ksegrp removal.
- Fix KSE interactivity issues related to frequent forking and exiting of
kse threads. We simply disable the penalty for thread creation and exit
for kse threads.
- Cleanup the cpu estimator by using tickincr here as well. Keep ticks and
ltick/ftick in the same frequency. Previously ticks were stathz and
others were hz.
- Lots of new and updated comments.
- Many many others.
Tested on: up x86/amd64, 8way amd64.
- runq_add_pri allows the caller to position the thread at any rqindex
regardless of priority.
- runq_choose_from() chooses the lowest priority thread starting from a given
index. The index is updated with the rqindex of the chosen thread. This
routine is used to pick the lowest priority relative to a given index.
- runq_remove_idx() updates the index if the run queue that held the removed
thread is now empty.
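
The three routines above can be modelled in userland as a small circular
run queue. This is a sketch under stated assumptions: eight queues
instead of the real number, per-queue counters instead of thread lists,
and sk_* names that are hypothetical. The wrap-around scan and the index
updates follow the descriptions above.

```c
#include <assert.h>

#define SK_NQ	8	/* number of queues in this sketch */

struct sk_runq {
	unsigned int	status;		/* bit i set: queue i is non-empty */
	int		count[SK_NQ];	/* threads queued at each index */
};

/* Queue a thread at an explicit index, regardless of its priority. */
static void
sk_runq_add_pri(struct sk_runq *rq, int idx)
{
	assert(idx >= 0 && idx < SK_NQ);
	rq->count[idx]++;
	rq->status |= 1u << idx;
}

/*
 * Scan forward from *idx, wrapping around, and return the first
 * non-empty queue.  *idx is updated to the chosen queue.  Returns -1
 * if every queue is empty.
 */
static int
sk_runq_choose_from(const struct sk_runq *rq, int *idx)
{
	int i, q;

	for (i = 0; i < SK_NQ; i++) {
		q = (*idx + i) % SK_NQ;
		if (rq->status & (1u << q)) {
			*idx = q;
			return (q);
		}
	}
	return (-1);
}

/* Remove one thread from queue q; step *idx past q if q became empty. */
static void
sk_runq_remove_idx(struct sk_runq *rq, int q, int *idx)
{
	assert(q >= 0 && q < SK_NQ && rq->count[q] > 0);
	if (--rq->count[q] == 0) {
		rq->status &= ~(1u << q);
		if (*idx == q)
			*idx = (q + 1) % SK_NQ;
	}
}
```

Starting the scan at a floating index rather than at zero is what lets
threads queued "behind" the index wait their turn instead of always
winning on raw priority.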
start working with third party usb modules, where sometimes it
is not easy to set the inclusion order so that there are no multiple
inclusions, yet you want to compile with high WARNS levels).
I am not sure if there is a standard for having a leading and/or trailing _
in the macro name, the usb code seems to use both.
There are still several unprotected headers here so it might be useful
to do the same thing on other files as well as the need arises.
MFC After: 3 days
check length of the pathname in the range 0<=n<=NFS_MAXPATHLEN,
not 0<n<=NFS_MAXPATHLEN. This fixes a minor interoperability problem
that the FreeBSD NFS server did not allow a symlink pointing to the empty
pathname.
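
The change in the range check can be written out as two predicates. The
function names are hypothetical and the limit of 1024 is an assumed
stand-in for NFS_MAXPATHLEN; only the lower bound differs.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NFS_MAXPATHLEN	1024	/* assumed value for the sketch */

/* Old check: 0 < n <= NFS_MAXPATHLEN, which rejects the empty path. */
static bool
sk_symlink_len_ok_old(long n)
{
	return (n > 0 && n <= SK_NFS_MAXPATHLEN);
}

/* New check: 0 <= n <= NFS_MAXPATHLEN, which accepts the empty path. */
static bool
sk_symlink_len_ok_new(long n)
{
	return (n >= 0 && n <= SK_NFS_MAXPATHLEN);
}
```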
MFC after: 1 week
mbuf. First moves toward being able to cope better with having layer 2 (or
other encapsulation data) before the IP header in the packet being examined.
More commits to come to round out this functionality. This commit should
have no practical effect but clears the way for what is coming.
Reviewed by: luigi, yar
MFC After: 2 weeks
been introduced to the MAC framework:
mpo_associate_nfsd_label
mpo_create_mbuf_from_firewall
mpo_check_system_nfsd
mpo_check_vnode_mmap_downgrade
mpo_check_vnode_mprotect
mpo_init_syncache_label
mpo_destroy_syncache_label
mpo_init_syncache_from_inpcb
mpo_create_mbuf_from_syncache
MFC after: 2 weeks [1]
[1] The syncache related entry points will NOT be MFCed as the changes in
the syncache subsystem are not present in RELENG_6 yet.
exclusive access if there is at least one thread waiting for it to
become available. This may significantly reduce overhead by reducing
the number of unnecessary wakeups issued whenever the framework becomes
idle.
Annotate that we still signal the CV more than necessary and should
fix this.
Obtained from: TrustedBSD Project
Reviewed by: csjp
Tested by: csjp
Redo the checking for 2.6 emulation. We now cache the value of
use26 and replace calls to linux_get_osrelease() + parsing with
a call to linux_use26(). Typical path is lockless now.
Pointed out by: kib
This allows shipping RELENG_7_0 with a default osrelease of 2.4.2 and the
possibility to enable 2.6.x emulation without the possible performance
impact of the previous version of the check.
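
The caching idea can be sketched in userland as follows: the release
string is parsed once, when it is set, and the hot path reads a plain
int. All sk_* names are hypothetical, and the real code also has to
deal with per-jail settings, which this sketch leaves out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char sk_osrelease[32] = "2.4.2";
static int sk_use26_cached = 0;

/* Parse "major.minor[...]" and report whether it is 2.6 or later. */
static int
sk_parse_use26(const char *rel)
{
	char *end;
	long major, minor;

	major = strtol(rel, &end, 10);
	minor = 0;
	if (*end == '.')
		minor = strtol(end + 1, NULL, 10);
	return (major > 2 || (major == 2 && minor >= 6));
}

/* Slow path: runs only when the setting changes. */
static int
sk_set_osrelease(const char *rel)
{
	if (strlen(rel) >= sizeof(sk_osrelease))
		return (-1);
	strcpy(sk_osrelease, rel);
	sk_use26_cached = sk_parse_use26(rel);
	return (0);
}

/* Fast path: a single int read, no string parsing and no lock. */
static int
sk_use26(void)
{
	return (sk_use26_cached);
}
```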
Submitted by: rdivacky
- Micro-optimize the addition of an 802.1q header to match the removal code.
- Consistently check for interfaces being up and running.
- Consistently use NULL instead of 0 with pointers.
With the second (and last) part of my previous Summer of Code work, we get:
-ipfw's in kernel nat
-redirect_* and LSNAT support
General information about nat syntax and some examples are available
in the ipfw (8) man page. The redirect and LSNAT syntax are identical
to natd, so please refer to natd (8) man page.
To enable in kernel nat in rc.conf, two options were added:
o firewall_nat_enable: equivalent to natd_enable
o firewall_nat_interface: equivalent to natd_interface
Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet
to continue being checked by the firewall ruleset after being
(de)aliased.
NOTA BENE: due to some problems with libalias architecture, in kernel
nat won't work with TSO enabled nic, thus you have to disable TSO via
ifconfig (ifconfig foo0 -tso).
Approved by: glebius (mentor)
access plus timers. This makes the code
more portable and able to change out the
mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only
the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to <b> aka
worrying about pkthdr's :-)
d) There was one last reorder problem, it seems: if a
restart occurs and we release and relock (at
the point where we set up our alias vtag), we could
end up getting the wrong TSN in place. The
code that fixed the TSNs just needed to be shifted
to BEFORE the release of the lock, along with the code that
sets the state (since this could also contribute).
Approved by: gnn
semantics.
- Stop testing bpf pointers for NULL. In some cases use
bpf_peers_present() and then call the function directly inside the
conditional block instead of the macro.
- For places where the entire conditional block is the macro, remove the
test and make the macro unconditional.
- Use BPF_MTAP() in if_pfsync on FreeBSD instead of an expanded version of
the old semantics.
Reviewed by: csjp (older version)
lookup early. This has some performance implications and should not be
enabled by default, but might help greatly in certain setups. After some
more testing this could be turned into a sysctl.
Tested by: avatar
LOR ids: 17, 24, 32, 46, 191 (conceptual)
MFC after: 6 weeks
MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors. The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronisms in
sys/i386/i386/support.s.
manipulation is visible to the subject process. Remove XXX comments
suggesting this.
Convert one XXX on a difference from Darwin into a note: it's not a
bug, it's a feature.
Obtained from: TrustedBSD Project
system calls on the amd64 architecture.
Some minor white space tweaks for consistency with other syscalls.master
files.
Obtained from: TrustedBSD Project
- Replace XXX with Note: in several cases where observations are made about
future functionality rather than problems or bugs.
- Remove an XXX comment about byte order and au_to_ip() -- IP headers must
be submitted in network byte order. Add a comment to this effect.
- Mention that we don't implement select/poll for /dev/audit.
Obtained from: TrustedBSD Project
kernel<->policy ABI version. Add a comment to the definition describing
it and listing known versions. Modify MAC_POLICY_SET() to reference the
current kernel version by name rather than by number.
Staticize mac_late, which is used only in mac_framework.c.
Obtained from: TrustedBSD Project
mac_framework.c Contains basic MAC Framework functions, policy
registration, sysinits, etc.
mac_syscalls.c Contains implementations of various MAC system calls,
including ENOSYS stubs when compiling without options
MAC.
Obtained from: TrustedBSD Project
consumes and implements, as well as the location of the framework and
policy modules.
Refactor MAC Framework versioning a bit so that the current ABI version can
be exported via a read-only sysctl.
Further update comments relating to locking/synchronization.
Update copyright to take into account these and other recent changes.
Obtained from: TrustedBSD Project
node would send every outgoing frame to the "compress" hook.
Packets received on the "compress" hook were expected to be
compressed and PROT_COMPD tag was put on them unconditionally.
After this commit an alternative compression mode can be set.
In this mode the node doesn't add the PROT_COMPD tag; the compressor
must add it itself. This is important for compressors that can
submit uncompressed frames.
Before this commit, if decompression was enabled, the ng_ppp(4)
node would send an incoming frame to the "decompress" hook
only if it had the PROT_COMPD proto tag on it.
After this commit an alternative decompression mode can be set.
In this mode the node sends all incoming packets to the
decompression hook. This is important for compressors that also
need to see uncompressed packets to keep their library in sync.
These new features will be used in new version of mpd4, and in new
compressor nodes.
Submitted by: Alexander Motin <mav alkar.net>
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs. These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them. The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers. I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).
o mark 11g mode support on finding 11g or pure 11g (OFDM-only)
channels; was requiring pure 11g which caused some contortions
in drivers that manually setup their channel lists
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%. This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).
Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register. We converted to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax. Let
gcc promote %al in the few cases where this is needed.
Nearby style fixes:
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
%al anywhere, so don't hard-code it in the constraints either.
Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
for the main output operand, although this is not required. The input
and output operands that use the "a" constraint are now decoupled, and
this makes things clearer except for the reason that the output register
is hard-coded. It is now just a hack to tell gcc that the input "a" has
been clobbered without increasing the number of operands.
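A minimal sketch of the operand layout described above, written for
illustration only. The name my_atomic_cmpset_int is hypothetical, and the
non-x86 fallback to a compiler builtin is an assumption added so the
example builds elsewhere. The result is an 8-bit operand constrained to
"a" (so it lives in %al), the expected value is a separate "a" input, and
%al is never named in the template itself:

```c
#include <assert.h>
#include <stddef.h>

static inline int
my_atomic_cmpset_int(volatile unsigned int *dst, unsigned int exp,
    unsigned int src)
{
	assert(dst != NULL);
#if defined(__i386__) || defined(__x86_64__)
	unsigned char res;

	__asm__ __volatile__(
	"	lock ; cmpxchgl %2,%1 ;	"
	"	sete	%0 ;		"
	: "=a" (res),			/* 0: result, 8 bits, so %al */
	  "=m" (*dst)			/* 1 */
	: "r" (src),			/* 2 */
	  "a" (exp),			/* 3: expected value, in %eax */
	  "m" (*dst)			/* 4 */
	: "memory", "cc");
	/* The compiler widens the 8-bit result only if a caller needs it. */
	return (res);
#else
	/* Assumption: fall back to a GCC/clang builtin on other targets. */
	return (__sync_bool_compare_and_swap(dst, exp, src));
#endif
}
```

The function returns non-zero when *dst matched exp and was replaced by
src, and zero otherwise.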
o change handling of regdomain-related mib knobs so they can be set
post-attach: regdomain, countrycode, outdoor, and xchanmode; the
hal will not permit changing the regdomain but we expose it for now
o on regdomain/countrycode change recalculate the channel list and
push it to the net80211 layer (NB: looks to need more tweaking)
o setup rate tables for half/quarter rate channels
o honor half/quarter rate channel configs when changing channels
o honor half/quarter rate channel configs when setting the slot time
o use hack/nonstandard channel numbering scheme for the public safety
band to avoid overlapping 2.4G channels on dual-band cards
o remove setup of ic_sup_rates; the net80211 layer can do this for us
and it simplifies handling of half/quarter rate channels
Tested only in Public Safety Band with cards that have RF5112.
in the Public Safety Band):
o add channel flags to identify half/quarter-rate operation
o add rate sets (need to check spec on 4Mb/s in 1/4 rate)
o add if_media definitions for new rates
o split net80211 channel setup out into ieee80211_chan_init
o fixup ieee80211_mhz2ieee and ieee80211_ieee2mhz to understand half/quarter
rate channels: note we temporarily use a nonstandard/hack numbering that
avoids overlap with 2.4G channels because we don't (yet) have enough
state to identify and/or map overlapping channel sets
o fixup ieee80211_ifmedia_init so it can be called post attach and will
recalculate the channel list and associated state; this enables changing
channel-related state like the regulatory domain after attach (will be
needed for 802.11d support too)
o add ieee80211_get_suprates to return a reference to the supported rate
set for a given channel
o add 3, 4.5, and 27 MB/s tx rates to rate <-> media conversion routines
o const-poison channel arg to ieee80211_chan2mode
bge_intr(). Some of them are used in bge_poll(). Simplify by only
initializing these for polling mode and not toggling them when switching
modes. This also fixes missing synchronization with the coalescing
engine in the toggling.
Add a pointer to the relevant PR for future reference. The whole comment
will be OK to remove as soon as the general solution is applied.
PR: kern/105943
pmap on i386
- check for change in executable status in pmap_enter
- pmap_qenter and pmap_qremove only need to invalidate the range if one
of the pages has been referenced
- remove pmap_kenter/pmap_kremove as they were only used by pmap_qenter
and pmap_qremove
- in pmap_copy don't copy wired bit to destination pmap
- mpte was unused in pmap_enter_object - remove
- pmap_enter_quick_locked is not called on the kernel_pmap, remove check
- move pmap_remove_write specific logic out of tte_clear_phys_bit
- in pmap_protect check for removal of execute bit
- panic in the presence of a wired page in pmap_remove_all
- pmap_zero_range can call hwblkclr if offset is zero and size is PAGE_SIZE
- tte_clear_virt_bit is only used by pmap_change_wiring - thus it can be
greatly simplified
- pmap_invalidate_page need only be called in tte_clear_phys_bit if there
is a match with flags
- lock the pmap in tte_clear_phys_bit so that clearing the page bits is
atomic with invalidating the page
- these changes result in 100s reduction in buildworld from a malloc backed
disk to a malloc backed disk - ~2.5%
mbuf is dropped, to preserve the invariant in the PR_ADDR case.
Add a regression test to detect this condition, but do not hook it
up to the build for now.
PR: kern/38495
Submitted by: James Juran
Reviewed by: sam, rwatson
Obtained from: NetBSD
MFC after: 2 weeks
The problem was that I was acquiring the driver sx lock and then waiting
for a taskqueue to drain, however the taskqueue itself would try to
acquire the lock as well leading to a deadlock.
To fix the problem roll my own exclusive lock that allows for lock
cancellation. This is a normal exclusive lock, however if someone
marks it as "dead" then all waiters who request an error return will
get back an error instead of continuing to wait for the lock.
In this particular case, the shutdown and detach functions kill the
lock while the async task thread tries to acquire the lock but will
abort if the lock returns an error.
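The idea of a cancellable exclusive lock can be sketched in userland as
follows. This is not the driver's code: the names (xlock, XL_ERRRET,
xlock_kill) are hypothetical, and sleeping is replaced by a yield loop so
the sketch stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

#define	XL_ERRRET	0x1	/* hypothetical: fail if the lock is dead */

struct xlock {
	atomic_int	xl_owned;	/* 1 while held */
	atomic_int	xl_dead;	/* set once by xlock_kill() */
};

static int
xlock_acquire(struct xlock *xl, int flags)
{
	int expected;

	for (;;) {
		/* Waiters that asked for an error return give up here. */
		if ((flags & XL_ERRRET) != 0 && atomic_load(&xl->xl_dead))
			return (ECANCELED);
		expected = 0;
		if (atomic_compare_exchange_weak(&xl->xl_owned, &expected, 1))
			return (0);
		sched_yield();	/* stand-in for sleeping on the lock */
	}
}

static void
xlock_release(struct xlock *xl)
{
	assert(atomic_load(&xl->xl_owned) == 1);
	atomic_store(&xl->xl_owned, 0);
}

static void
xlock_kill(struct xlock *xl)
{
	atomic_store(&xl->xl_dead, 1);
}
```

In the scenario above, detach or shutdown would take the lock and call
xlock_kill(), while the task thread would acquire with XL_ERRRET and
bail out on ECANCELED, so draining the taskqueue cannot deadlock.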
The other option was to drop the driver lock mid-detach and mid-shutdown;
mid-detach was OK, but mid-shutdown was not.
While I'm here, fix a bug in what appears to be the mii link status
word in the softc going out to lunch. Explicitly set the status
word to 1 after initializing the mii. This would result in an interface
that would never respond to "if_start" requests as the mii interface
would always look down.
nmi handler is used to stop other processors; the nmi handler calls trap(),
but trap() now accepts a pointer rather than a reference. This was
changed by kmacy@.
non-extattr functions from vfs_extattr.c, and extattr functions from
vfs_syscalls.c.
Change copyright/license on vfs_extattr.c to my copyright/license on
the extended attribute implementation (from extattr.h).
Clean up includes a bit.
Obtained from: TrustedBSD Project
Framework and security modules, to src/sys/security/mac/mac_policy.h,
completing the removal of kernel-only MAC Framework include files from
src/sys/sys. Update the MAC Framework and MAC policy modules. Delete
the old mac_policy.h.
Third party policy modules will need similar updating.
Obtained from: TrustedBSD Project
return an error since it returns a count of battery devices in the system.
Set it to 0 explicitly, since it is the only switch branch that doesn't set
it.
# I guess no one uses it.
It always called MH_ALIGN for small lengths being
prepended (less than MHLEN). This meant that if you did
a prepend on a non M_PKTHDR the system would panic with
the KASSERT in MH_ALIGN. Instead we are now aware of
this and do an MH_ALIGN or M_ALIGN as appropriate.
Reviewed by: andre
Approved by: gnn
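A simplified model of the choice described above, for illustration. The
structure, the function name and the two size constants are hypothetical
stand-ins; only the rule is taken from the text: use the packet-header
data size when M_PKTHDR is set and the plain data size otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define	M_PKTHDR	0x0002	/* flag value is illustrative */
#define	MLEN_MODEL	224	/* hypothetical data size, plain mbuf */
#define	MHLEN_MODEL	184	/* hypothetical data size, pkthdr mbuf */

struct mbuf_model {
	int	m_flags;
	size_t	m_off;		/* offset of the data within the data area */
};

/* Place len bytes at the end of the data area, long-aligned. */
static void
align_for_prepend(struct mbuf_model *m, size_t len)
{
	size_t space;

	/* Pick the area size that matches the kind of mbuf we were given. */
	space = (m->m_flags & M_PKTHDR) != 0 ? MHLEN_MODEL : MLEN_MODEL;
	assert(len <= space);
	m->m_off = (space - len) & ~(sizeof(long) - 1);
}
```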
subsystems will be a property of policy modules, which may require
access control check entry points to be invoked even when not actively
enforcing (i.e., to track information flow without providing
protection).
Obtained from: TrustedBSD Project
Suggested by: Christopher dot Vance at sparta dot com
than from the slab, but don't.
Document mac_mbuf_to_label(), mac_copy_mbuf_tag().
Clean up white space/wrapping for other comments.
Obtained from: TrustedBSD Project
Expand comments on System V IPC labeling methods, which could use improved
consistency with respect to other object types.
Obtained from: TrustedBSD Project
the ifnet itself. The stack copy has been made while holding the mutex
protecting ifnet labels, so copying from the ifnet copy could result in
an inconsistent version being copied out.
Reported by: Todd.Miller@sparta.com
Obtained from: TrustedBSD Project
MFC after: 3 weeks
- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.
Add rudimentary IPC_INFO/MSG_INFO command support for linux_msgctl()
to pacify Linux ipcs(1). While I am here, add more bound checks
for linux_msgsnd() and linux_msgrcv().
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and
it is done from its wrapper functions to support 32-bit emulations. After I
implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes
copyin()/copyout() function pointers from wrappers. Darwin passes size of
message type as an argument, which is actually similar to my first
implementation (P4 109706). We may revisit these implementations later.
would be able to work with aac(4).
This approach is used by some other drivers as well. However, we
need a more generic way to do this in order to avoid having to
special case headers in individual drivers for each platform.
Obtained from: Adaptec (version b11518)
Approved by: scottl
been handled instead of when at least one descriptor was just handled.
For bge, it is normal to get a txeof when only a small fraction of the
queued tx descriptors have been handled, so the bug broke the watchdog
in a usual case.
- moved the synchronizing bus read to after the bus write for the first
interrupt ack so that it actually synchronizes everything necessary.
We were acking not only the status update that triggered the interrupt
together with any status updates that occurred before we got around
to the bus write for the ack, but also any status updates that occur
after we do the bus write but before the write reaches the device.
The corresponding race for the second interrupt ack resulted in
sometimes returning from the interrupt handler with acked but
unserviced interrupt events. Such events then remain unserviced
until further events cause another interrupt or the watchdog times
out.
The race was often lost on my 5705, apparently since my 5705 has broken
event coalescing which causes a status update for almost every packet,
so another status update is quite likely to occur while the interrupt
handler is running. Watchdog timeouts weren't very noticeable,
apparently because bge_txeof() has one of the usual bugs resetting the
watchdog.
- don't disable device interrupts while bge_intr() is running. Doing this
just had the side effects of:
- entering a device mode in which different coalescing parameters apply.
Different coalescing parameters can be used to either inhibit or
enhance the chance of getting another status update while in the
interrupt handler. This feature is useless with the current
organization of the interrupt handler but might be useful with a
taskqueue handler.
- giving a race for ack+reenable/return. This cannot be handled
by simply rearranging the order of bus accesses like the race for
ack+keepenable/entry. It is necessary to sync the ack and then
check for new events.
- taking longer, especially with the extra code to avoid the race on
ack+reenable/return.
Reviewed by: ru, gleb, scottl
vnode v_flag. For cluster buffers this would result in dereferencing NULL
b_vp. To prevent the panic, cache relevant vnode flag before calling
bstrategy.
Reported by: Peter Holm, kris
Tested by: Peter Holm
Reviewed by: tegge
Pointy hat to: kib
running thread's id on each cpu. This allows us to add in-kernel adaptive
spinning for user-level mutexes. While spinning in user space is possible,
it can hardly be implemented efficiently, without wasting cpu cycles,
unless the kernel exports the correct thread running state. Exporting
thread running state is unlikely to be implemented soon, as the
interfaces have to be designed and stabilized. This implementation is
transparent to user space and can be disabled dynamically. With this
change, the performance of a mutex ping-pong program is improved massively
on SMP machines. Performance of the mysql super-smack select benchmark is
increased by about 7% on an Intel dual dual-core2 Xeon machine, which
indicates that on systems which have a bunch of cpus and low system-call
overhead (athlon64, opteron, and core-2 are known to be fast), the
adaptive spin does help performance.
Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets the upper limit of the spin cycle count.
Tested on: Athlon64 X2 3800+, Dual Xeon 5130
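How the two sysctls could combine into an effective spin count is
sketched below. The function name is hypothetical, and clamping the
default value to the maximum as well is an assumption, not something the
text above states.

```c
#include <assert.h>

static unsigned int
umtx_spin_count(unsigned int m_spincount, unsigned int dflt_spins,
    unsigned int max_spins)
{
	unsigned int spins = m_spincount;

	/* A zero per-mutex count falls back to umtx_dflt_spins. */
	if (spins == 0)
		spins = dflt_spins;
	/* umtx_max_spins caps whatever was chosen (assumed for both cases). */
	if (spins > max_spins)
		spins = max_spins;
	assert(spins <= max_spins);
	return (spins);
}
```

With a zero default and a zero m_spincount the result is zero, meaning no
spinning at all.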
re_watchdog() in order to avoid races accessing if_timer.
- Use bus_get_dma_tag() so re(4) works on platforms requiring it.
- Remove invalid BUS_DMA_ALLOCNOW when creating the parent DMA tag
and the tags that are used for static memory allocations.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
- Remove an unused variable in re_intr().
watchdog timer in dc_txeof() in case there are still unhandled
descriptors as dc_poll() invokes dc_txeof() unconditionally.
Otherwise this would result in the watchdog timer constantly
being reloaded and thus prevent the watchdog from ever firing in
the DEVICE_POLLING case.
Pointed out by: bde
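The watchdog rule can be modelled in a few lines. This is a sketch with
hypothetical names (txring_model, txeof_model), not code from dc(4): the
timer is only disarmed once no queued descriptors remain, so calling the
routine with nothing completed cannot keep a stalled ring alive.

```c
#include <assert.h>

struct txring_model {
	int	tx_cnt;		/* descriptors queued, not yet completed */
	int	wd_timer;	/* watchdog countdown, 0 = disarmed */
};

static void
txeof_model(struct txring_model *r, int completed)
{
	assert(completed >= 0 && completed <= r->tx_cnt);
	r->tx_cnt -= completed;
	if (r->tx_cnt == 0)
		r->wd_timer = 0;	/* all work done: disarm */
	/* Otherwise leave the timer alone so a stall is still detected. */
}
```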
pmap.c, and is potentially the cause of hangs reported on machines with a
small amount of memory. On machines with sufficient RAM, and without a lot
of processes running, this situation would probably never occur.
Testing is still incomplete, but it is obviously wrong so remove the
offending code now.
The issue of what to do when both the primary and secondary hash overflow
is still open.
Reported by: Dan Kresja at windriver dot com, via alc
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems. This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.
PR: 106703
Submitted by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.
PR: 106703
Tested by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days
- Do not repeatedly read vendor/device IDs while probing.
- Remove redundant bzero(3) for softc. device_get_softc(9) does it for free[1].
Reviewed by: glebius
Suggested by: glebius[1]
aches as a read-only file. In a number of cases this has led to
compiles failing, usually due to some strange NFS drift which thinks
that the opt_ah.h in the compile directory is out of date wrt the
source it is copied from. When the copy is executed again, it fails
because the target is read-only. Oops. Modify the compile hooks
to avoid this.
Discussed a while back with: Sam Leffler
- If we want mii_phy_add_media() to add 1000baseT media, we need to
supply sc->mii_extcapabilities.
- Fix formatting when announcing autonegotiation support.
usage to conform to that of tl0_trap - the separate code path
for unaligned faults was never getting used (and evidently doesn't
work), so ifdef out for now
Because accessing ID registers in rtl81x9 needs 32bit register access
and the RL_IDR4/RL_IDR5 registers are reserved registers, bzero() is
needed before copying the ethernet address.
This fixes an unaligned memory access panic on sparc64.
PR: kern/106801
MFC after: 3 days
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.
Reviewed by: kan
Tested by: kan
passed by value (trap frames) as if they were in fact being passed by
reference. For better or worse, this incorrect behaviour is no longer
present in gcc 4.1. In this patch I convert all trapframe arguments to
be explicitly pass by reference. I also remove vm86_initflags, pushing
the very little work that it actually does up into vm86_prepcall.
Reviewed by: kan
Tested by: kan
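The difference the two trap frame commits above rely on can be shown with
a tiny standalone example. The structure and function names are
hypothetical; the point is only that, in C, a store to a by-value
argument changes the callee's copy and the compiler is free to discard
it, while a store through a pointer is visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

struct trapframe_model {
	long	tf_retval;
};

/* By value: the store lands in a private copy and may be eliminated. */
static void
handler_by_value(struct trapframe_model frame)
{
	frame.tf_retval = 42;
}

/* By reference: the store modifies the caller's frame. */
static void
handler_by_ref(struct trapframe_model *frame)
{
	assert(frame != NULL);
	frame->tf_retval = 42;
}
```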
- The PCPU usage was to ensure that there were no faults on the stack while
the tte_hash_bucket lock was held - but this can be avoided by making sure
the address on the stack is already referenced.
- PCPU removal obviates the need for critical_{enter, exit}
- in trying to avoid nested brackets and #ifdef INVARIANTS around i at the
top, I broke booting for INVARIANTS altogether :-(
- the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled
- tested both with and without INVARIANTS :-/
after we perform the operations to delete the export,
call vfs_deleteopt() to delete the "export" mount option from
the linked list of mount options associated with that mount point.
This fixes one scenario:
- put a filesystem in /etc/exports to export it
- remove the filesystem from /etc/exports to delete the export and restart
mountd
- try to do a "mount -u -o ro" or "mount -u -o rw" on that filesystem
now that it is no longer exported.
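Removing a named option from a per-mount option list can be sketched as
below. This is a userland model with hypothetical names (opt_model,
opts_add, opts_delete, opts_has); it does not reproduce the kernel's
vfs_deleteopt() or its list type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct opt_model {
	struct opt_model *next;
	char		*name;
};

/* Prepend a copy of name to the list; returns 0 on success. */
static int
opts_add(struct opt_model **head, const char *name)
{
	struct opt_model *o = malloc(sizeof(*o));
	size_t n = strlen(name) + 1;

	if (o == NULL || (o->name = malloc(n)) == NULL) {
		free(o);
		return (-1);
	}
	memcpy(o->name, name, n);
	o->next = *head;
	*head = o;
	return (0);
}

/* Unlink and free every entry called name; returns how many were removed. */
static int
opts_delete(struct opt_model **head, const char *name)
{
	struct opt_model **pp, *o;
	int removed = 0;

	assert(head != NULL);
	for (pp = head; (o = *pp) != NULL; ) {
		if (strcmp(o->name, name) == 0) {
			*pp = o->next;
			free(o->name);
			free(o);
			removed++;
		} else
			pp = &o->next;
	}
	return (removed);
}

static int
opts_has(const struct opt_model *o, const char *name)
{
	for (; o != NULL; o = o->next)
		if (strcmp(o->name, name) == 0)
			return (1);
	return (0);
}
```

Once "export" is gone from the list, a later update mount no longer sees
a stale export option on that mount point.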
arguments to fail. The mode field for shmget() appears to have undefined
meaning in the context of an already-present IPC object, but applications
appear to assume any arbitrary passed value will be ignored. I had hoped
to revisit this more quickly, but am removing the change for now to
prevent toe-stubbing.
Reported by: Jaroslav Suchanek <jarda at grisoft dot cz>
PR: kern/106078
- rename skip_utrap to tl0_skip_utrap to indicate its use by the fill trap fault handler
- handle a null kstack by switching to the idle threads stack and then going to trap
- correctly handle an unaligned or unmapped stack during a fill trap
- save off some extra data in the pcpu pad in ptl1_panic
- add an assert that PCB is valid in vm_machdep.c
- add cnt_hold cnt_lock support for spin mutexes
- make sure contested is initialized to zero to only bump contested when appropriate
- move initialization function to kern_mutex.c to avoid cyclic dependency between
mutex.h and lock_profile.h
behave as expected.
Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
lost your sense of humor, then don't add a define.
Specific changes:
i80321_wdog.c
Don't roll your own passive watchdog tickle as this would defeat the
purpose of an active (userland) watchdog tickle.
ichwd.c / ipmi.c:
WD_ACTIVE means active patting of the watchdog by a userland process,
not whether the watchdog is active. See sys/watchdog.h.
kern_clock.c:
(software watchdog) Remove a check for WD_ACTIVE as this does not make
sense here. This reverts r1.181.
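A sketch of the kind of argument check the ioctl change implies. The
function name is hypothetical, the flag values are illustrative (see
sys/watchdog.h for the real definitions), and the specific errno values
are assumptions; the text above only says that WD_PASSIVE is rejected
because only WD_ACTIVE is implemented.

```c
#include <assert.h>
#include <errno.h>

#define	WD_ACTIVE	0x8000000	/* userland pats the watchdog */
#define	WD_PASSIVE	0x0400000	/* kernel pats the watchdog */
#define	WD_INTERVAL	0x00000ff	/* timeout selector bits */

static int
wd_check_request(unsigned int req)
{
	/* Reject bits we do not know about (assumed EINVAL). */
	if ((req & ~(WD_ACTIVE | WD_PASSIVE | WD_INTERVAL)) != 0)
		return (EINVAL);
	/* Passive mode is not implemented (assumed ENOSYS). */
	if ((req & WD_PASSIVE) != 0)
		return (ENOSYS);
	assert((req & WD_PASSIVE) == 0);
	return (0);
}
```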
o fixed a comment
o made in-kernel libalias a bit less verbose (disabled automatic
logging every time a new link is added or deleted)
Approved by: glebius (mentor)
work:
- A new PCI quirk (PCI_QUIRK_DISABLE_MSI) is added to the quirk table.
- A new pci_msi_device_blacklisted() determines if a passed in device
matches an MSI quirk in the quirk table. This can be overridden (all
quirks ignored) by setting the hw.pci.honor_msi_blacklist to 0.
- A global blacklist check is performed in the MI PCI bus code by checking
to see if the device at 0:0:0 is blacklisted.
Tested by: jdp
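The blacklist lookup described above might look roughly like this. It is
a model only: the table layout, the made-up device ID, and the function
names are hypothetical, with honor_msi_blacklist standing in for the
hw.pci.honor_msi_blacklist tunable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	QUIRK_DISABLE_MSI	2	/* stands in for PCI_QUIRK_DISABLE_MSI */

struct quirk_model {
	uint32_t	devid;
	int		type;
};

static const struct quirk_model quirks[] = {
	{ 0x12345678, QUIRK_DISABLE_MSI },	/* made-up device ID */
	{ 0, 0 }				/* end of table */
};

static int honor_msi_blacklist = 1;

static int
msi_device_blacklisted(uint32_t devid)
{
	const struct quirk_model *q;

	/* The loop below relies on the zero terminator. */
	assert(quirks[sizeof(quirks) / sizeof(quirks[0]) - 1].devid == 0);
	if (!honor_msi_blacklist)
		return (0);		/* all quirks ignored */
	for (q = quirks; q->devid != 0; q++)
		if (q->devid == devid && q->type == QUIRK_DISABLE_MSI)
			return (1);
	return (0);
}

/* Global check: MSI is off if the device at 0:0:0 is blacklisted. */
static int
msi_blacklisted(uint32_t dev, uint32_t dev_0_0_0)
{
	return (msi_device_blacklisted(dev) ||
	    msi_device_blacklisted(dev_0_0_0));
}
```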
1) s/mi/mfi/ in FreeBSD ioctl path
2) add in "\n" on various failure messages
3) cap the length of time to abort an AEN command
4) fix passing sense data back to user to make Dell's Linux firmware
upgrade tool happy.
5) bump the MFI_POLL_TIMEOUT_SECS from 10s to 50s since the
firmware flash command can take ~40s to return.
This is some clean-up and enables RAID firmware to be updated via Dell's
tool. Note Dell's tool requires the updates to the Linux emulator
that have been done in -current with TLS etc.
I need to discuss with scottl how to better submit mfi commands to
the firmware via the ioctl path so we don't do it in polled mode.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to
the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by: gnn
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.
Tested by: Peter Holm
Reviewed by: tegge
MFC after: 2 weeks
have been added erroneously, and it causes problems on some chips. A larger
change is needed to do this write at a more appropriate place, but that
change requires reworking the ASF logic. That will be worked on in the
future.
Submitted by: Bruce Evans
o no more ds_vdata in tx/rx descriptors
o split h/w tx/rx descriptor from s/w status
o as part of the descriptor split change the rate control module api
so the ath_buf is passed in to the module so it can fetch both
descriptor and status information as needed
o add some const poisoning
Also for sample rate control algorithm:
o split debug msgs (node, rate, any)
o uniformly bounds check rate indices (and in some cases correct checks)
o move array index ops to after bounds checking
o use final tsi from the status block instead of the h/w descriptor
o replace h/w descriptor struct's with proper mask+shift defs (this
doesn't belong here; everything is known by the driver and should
just be sent down so there's no h/w-specific knowledge)
MFC after: 1 month
o remove os-specific glue code; it's now the responsibility of
the driver
o add wackelf utility for patching the ELF magic number on arm
builds since no one can agree on how to mark a .o file as not
having any floating point instructions
o remove radar/dfs-related entry points; folks have finally
decided how to support dfs w/o polluting the hal
o properly recognize AR2424 chips (they were being rejected on
attach despite being fully supported)
o add HAL_CAP_RXORN_FATAL capability to control how RXORN errors
are handled; previously RXORN was always treated as fatal because
older chips required a reset; now we do not treat it as fatal
for "newer chips" (no one seems to know what the cutoff is so
this capability can be used to override the current guesstimate)
o HAL_CAP_RXTSTAMP_PREC capability to export the number of bits
of precision for timestamp data returned in the rx descriptor
o remove public exposure of the compression buffer; it is chip
specific and never belonged in the public view
o change definition of HAL_INT_GLOBAL from an enum member to a
#define to work around compilers that bitch about enum values
that appear to overflow 31 bits
o add support for newer chips that can store the tkip mic key
together with the cipher key in a single key cache entry
o split tx/rx descriptor into a h/w section and a s/w portion;
this permits storing the s/w area in cached memory when the
h/w area is stored in uncached memory; this also shrinks
memory use since only one status block is needed while multiple
tx/rx descriptors may be required per frame
o add final transmit series index to the transmit descriptor status
so rate control algorithms don't need to grovel through h/w state
to find it
o remove ds_vdata field from the descriptor state as part of the
radar changes
o fix excessive stack usage for some 5212 rf backends
o correct rfkill handling when the pin polarity is 0 true
o correct handling of tsf wrap when reading 64-bit values
MFC after: 1 month
kernel. This LOR snuck in with some of the recent syncache changes. To
fix this, the inpcb handling was changed:
- Hang a MAC label off the syncache object
- When the syncache entry is initially created, the PCB lock
is held because we extract information from it while initializing the
syncache entry. While we do this, copy the MAC label associated with
the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry
to the mbuf so it can be processed by security policies which analyze
mbuf labels.
This change required that the MAC framework be extended to support the
label copy operations from the PCB to the syncache entry, and then from
the syncache entry to the mbuf.
These functions really should be referencing the syncache structure instead
of the label. However, due to some of the complexities associated with
exposing this syncache structure we operate directly on its label pointer.
This should be OK since we aren't making any access control decisions within
this code directly, we are merely allocating and copying label storage so
we can properly initialize mbuf labels for any packets the syncache code
might create.
This also has a nice side effect of caching. Prior to this change, the
PCB would be looked up/locked for each packet transmitted. Now the label
is cached at the time the syncache entry is initialized.
Submitted by: andre [1]
Discussed with: rwatson
[1] andre submitted the tcp_syncache.c changes
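The two label copies described above can be modelled as follows. All
types and function names here are hypothetical simplifications; the
real MAC Framework entry points and label storage are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

struct label_model {
	int	l_value;
};

struct inpcb_model {
	struct label_model inp_label;
};

struct syncache_model {
	struct label_model sc_label;
};

struct mbuf_model {
	struct label_model m_label;
};

/* Called while the PCB lock is already held for syncache setup. */
static void
syncache_init_label(struct syncache_model *sc, const struct inpcb_model *inp)
{
	assert(sc != NULL && inp != NULL);
	sc->sc_label = inp->inp_label;
}

/* Called at transmit time; no PCB lookup or lock is needed. */
static void
syncache_label_mbuf(const struct syncache_model *sc, struct mbuf_model *m)
{
	assert(sc != NULL && m != NULL);
	m->m_label = sc->sc_label;
}
```

Because the label is cached when the entry is created, packets sent later
carry that cached copy without touching the PCB again.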
controller. Due to lack of documentation, this driver is based on the
code from sk(4) and Marvell's myk(4) driver for FreeBSD. I've also
adopted the OpenBSD interface name, msk(4) in order to reduce naming
differences between BSDs.
The msk(4) driver supports the following Gigabit Ethernet adapters.
o SysKonnect SK-9Sxx Gigabit Ethernet
o SysKonnect SK-9Exx Gigabit Ethernet
o Marvell Yukon 88E8021CU Gigabit Ethernet
o Marvell Yukon 88E8021 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8022CU Gigabit Ethernet
o Marvell Yukon 88E8022 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8061CU Gigabit Ethernet
o Marvell Yukon 88E8061 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8062CU Gigabit Ethernet
o Marvell Yukon 88E8062 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8035 Gigabit Ethernet
o Marvell Yukon 88E8036 Gigabit Ethernet
o Marvell Yukon 88E8038 Gigabit Ethernet
o Marvell Yukon 88E8050 Gigabit Ethernet
o Marvell Yukon 88E8052 Gigabit Ethernet
o Marvell Yukon 88E8053 Gigabit Ethernet
o Marvell Yukon 88E8055 Gigabit Ethernet
o Marvell Yukon 88E8056 Gigabit Ethernet
o D-Link 550SX Gigabit Ethernet
o D-Link 560T Gigabit Ethernet
Unlike OpenBSD/NetBSD msk(4), the msk(4) driver supports all hardware
features including TCP/UDP checksum offload for transmit, MSI, TCP
segmentation offload (TSO), hardware VLAN tag stripping/insertion,
and jumbo frames (up to 9022 bytes). The only unsupported hardware
feature except RLMT is Rx checksum offload, which I don't know how to
make work reliably.
Known Issues:
It seems msk(4) does not work on the second port of dual port NIC.
(The first port works without problems.)
Thanks to Marvell for releasing the BSD licensed myk(4) driver and
thanks to all users who helped fix bugs.
Tested by: bz, philip, bms,
YAMAMOTO Shigeru < shigeru AT iij DOT ad DOT jp >,
Dmitry Pryanishnikov < dmitry AT atlantis DOT dp DOT ua >,
Jia-Shiun Li < jiashiun AT gmail DOT com >,
David Duchscher < daved AT tamu DOT edu >,
Arno J. Klaassen < arno AT heho DOT snv DOT jussieu DOT fr>,
Nicolae Namolovan < adrenalinup AT gmail DOT com>,
Andre Guibert de Bruet < andy AT siliconlandmark DOT com >
current ML
Tested on: i386, amd64
subtypes of HT capabilities.
- Add constants for the MSI mapping window HT PCI capability.
- On i386 and amd64, enable the MSI mapping window on any HT bridges we
encounter and report any non-standard mapping window addresses.
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
This is the "+ one more change" missed in the original commit.
Noticed by: tinderbox
Pointy hat to: me (#1)
In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
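The caller-supplied-buffer convention can be illustrated with a small
userland formatter. This is a sketch, not the kernel's ip6_sprintf(): the
name, the buffer-size constant and the uncompressed output format are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define	IP6_BUFLEN	40	/* 8 groups of 4 hex digits, 7 colons, NUL */

/* Format addr into the caller's buffer (at least IP6_BUFLEN bytes). */
static char *
ip6_sprintf_model(char *buf, const uint8_t addr[16])
{
	int i, off = 0;

	assert(buf != NULL);
	for (i = 0; i < 16; i += 2)
		off += snprintf(buf + off, (size_t)(IP6_BUFLEN - off), "%s%x",
		    i == 0 ? "" : ":",
		    ((unsigned int)addr[i] << 8) | addr[i + 1]);
	return (buf);
}
```

Since every call writes into storage owned by the caller, two results can
be used in the same log statement without one overwriting the other,
which a small pool of static buffers cannot guarantee.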
- Use the appropriate register writing method when resetting the chip
- Program the descriptor DMA engine correctly.
- More reliably detect certain chips and their features.
Also add some low-level debugging tools to help future work on this driver.
Submitted by: David Christenson (proof of concept changes)
Sponsored by: www.UIA.net