The 6105M and 6102 do not have the DWORD alignment problem, so
don't m_defrag() every packet in the transmit path for those chips.
Make more stringent use of the tx-descriptor ring and its flags.
Tested on 6102 and 6105M; other chips may also be able to run
without the m_defrag(), but I have neither the hardware nor the docs to
find out.
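A minimal sketch of the defrag-on-demand pattern this enables, with
illustrative names (vr_encap_sketch, sc_tx_tag, sc_tx_map, VR_MAXTXSEGS
are not the driver's actual symbols): only call m_defrag() when the DMA
code reports the chain is too fragmented, instead of on every packet.

	#include <sys/param.h>
	#include <sys/bus.h>
	#include <sys/mbuf.h>
	#include <machine/bus.h>

	#define	VR_MAXTXSEGS	8	/* illustrative segment limit */

	struct vr_softc {		/* reduced, hypothetical softc */
		bus_dma_tag_t	sc_tx_tag;
		bus_dmamap_t	sc_tx_map;
	};

	static int
	vr_encap_sketch(struct vr_softc *sc, struct mbuf **m_head)
	{
		bus_dma_segment_t segs[VR_MAXTXSEGS];
		struct mbuf *m;
		int error, nsegs;

		error = bus_dmamap_load_mbuf_sg(sc->sc_tx_tag, sc->sc_tx_map,
		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
		if (error == EFBIG) {
			/* Too many segments: collapse the chain, retry once. */
			m = m_defrag(*m_head, M_DONTWAIT);
			if (m == NULL) {
				m_freem(*m_head);
				*m_head = NULL;
				return (ENOBUFS);
			}
			*m_head = m;
			error = bus_dmamap_load_mbuf_sg(sc->sc_tx_tag,
			    sc->sc_tx_map, *m_head, segs, &nsegs,
			    BUS_DMA_NOWAIT);
		}
		return (error);
	}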
Sponsored by: Soekris Engineering
use to synchronize and protect all data objects that are used for that
SIM. Drivers that are not yet MPSAFE register Giant and operate as
usual. Right now, no drivers are MPSAFE, though a few will be changed
in the coming week as this work settles down.
The driver API has changed, so all CAM drivers will need to be recompiled.
The userland API has not changed, so tools like camcontrol do not need to
be recompiled.
implement a robust version of m_collapse
add support for sf_buf
add fix for m_iovappend
add calls to m_sanity under INVARIANTS
fix m_freem_vec to correctly traverse the mbuf iovec chain
Yukon II generated corrupted TCP checksums for short TCP packets
that are less than 60 bytes in size (e.g. window probe packets, pure ACK
packets, etc.). Padding the frame with zeros to bring it up to the minimum
Ethernet frame size didn't work at all. Instead of dropping Tx
checksum offload support, we calculate the TCP checksum in software
when we encounter short TCP frames.
Fortunately, short UDP datagrams appear to be handled correctly by
Yukon II.
While I'm here, simplify the Ethernet/VLAN header size calculation logic.
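A minimal sketch of the software fallback, with illustrative names
(MSK_MIN_FRAMELEN, tcp_off). It relies on the stack having stored the
pseudo-header sum in th_sum when CSUM_TCP was set, on csum_data holding
the offset of the checksum field within the TCP header, and on the
headers having been pulled up into the first mbuf; in_cksum_skip() comes
from <machine/in_cksum.h>.

	/*
	 * If the frame is too short for the chip to checksum correctly,
	 * clear the offload flag and finish the TCP checksum in software.
	 */
	if ((m->m_pkthdr.csum_flags & CSUM_TCP) != 0 &&
	    m->m_pkthdr.len < MSK_MIN_FRAMELEN) {
		m->m_pkthdr.csum_flags &= ~CSUM_TCP;
		/*
		 * th_sum already holds the pseudo-header sum, so summing
		 * from the start of the TCP header (tcp_off) to the end
		 * of the packet yields the final checksum; csum_data is
		 * the offset of th_sum within the TCP header.
		 */
		*(uint16_t *)(mtod(m, char *) + tcp_off +
		    m->m_pkthdr.csum_data) =
		    in_cksum_skip(m, m->m_pkthdr.len, tcp_off);
	}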
PR: 111384
anymore. Previously it tried to access the interrupt register to disable
interrupts, which could result in a hang if the hardware was not
properly initialized by the system BIOS/ACPI.
Tested by: Benjamin Hansmann (benjamin.hansmann AT rub dot de)
MFC after: 3 days
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of shared locking for frequent lookup
operations, reduced general overhead, and reduced overhead in the event of
contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which failed to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient; this should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
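A minimal sketch of the resulting pattern, assuming the filedesc sxlock
is reachable as fdp->fd_sx (the field name and the lack of wrapper macros
here are illustrative): frequent lookups take the lock shared, while
table modifications take it exclusively.

	struct filedesc *fdp = td->td_proc->p_fd;
	struct file *fp = NULL;

	/* Lookup path: shared lock allows concurrent readers. */
	sx_slock(&fdp->fd_sx);
	if (fd >= 0 && fd < fdp->fd_nfiles &&
	    (fp = fdp->fd_ofiles[fd]) != NULL)
		fhold(fp);		/* take a reference before unlocking */
	sx_sunlock(&fdp->fd_sx);

	/* Modification path: exclusive lock serializes writers. */
	sx_xlock(&fdp->fd_sx);
	/* ... grow the table, install or remove a descriptor ... */
	sx_xunlock(&fdp->fd_sx);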
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
Defer mbuf allocation and initialization until after data has been
received into a cluster.
This reduces CPU utilization somewhat, but it only improves the rx path.
Recent changes to TCP appear to make us rate limited by the TX path.
This is the first step in reducing mbuf management overhead for manipulating
clusters.
MFC after: 3 days
been defragged and had their headers in the same cluster as their
payload would be fed to the NIC in header-sized chunks, and would
likely exceed the number of available transmit descriptors.
- If a TSO frame exceeds the number of available transmit descriptors,
don't leak busdma resources when freeing it.
Sponsored by: Myricom Inc.
in the putc() method. Likewise, in the getc() method, don't check for
received characters with an interval defined in terms of the baudrate.
In both cases it works equally well to implement a fixed delay. More
importantly, it avoids calculating a delay that's roughly 1/10th the
time it takes to send/receive a character. The calculation is costly
and happens for every character sent or received, affecting low-level
console or debug port performance significantly. Secondly, when the
RCLK is not available or unreliable, the delays could disrupt normal
operation.
The fixed delay is 1/10th the time it takes to send a character at
230400 bps.
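For illustration, here is roughly what such a fixed delay works out to;
the constant is derived from the numbers above, and uart_tx_ready() is a
hypothetical predicate standing in for the driver's own ready check.

	/* 1/10th the time to send a 10-bit character at 230400 bps: ~4 us. */
	#define	UART_POLL_DELAY	(10 * 1000000 / 230400 / 10)

	/* putc(): poll the transmitter with a cheap, fixed delay. */
	while (!uart_tx_ready(bas))
		DELAY(UART_POLL_DELAY);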
it obtained through the uart_class structure. This allows us
to declare the uart_class structure as weak and as such allows
us to reference it even when it's not compiled-in.
It also allows us to get the uart_ops structure by name, which
makes it possible to implement the dt tag handling in uart_getenv().
The side-effect of all this is that we're using the uart_class
structure more consistently which means that we now also have
access to the size of the bus space block needed by the hardware
when we map the bus space, eliminating any hardcoding.
execution should help us avoid potential deadlocks and illegal locking
while sleeping in various mixer -> usb calls. To enable it, use
hint.uaudio.%d.async="1" or sysctl dev.uaudio.%d.async=1. The default is
disabled, to remain compatible with the old behaviour (with a slight risk of
potential deadlock).
When the Linux port changes were imported, which split the
target command list from the initiator command
list and changed the handle format to encode a type in the handle,
the implications for the function isp_handle_index (which only
the NetBSD/OpenBSD/FreeBSD ports use) were overlooked.
The fault is twofold: first, the index into the DMA maps
in isp_pci is wrong because a target command handle with
the type bit left in place caused a bad index (and panic)
into the DMA map array. Second, the assumption of the array
of DMA maps in either PCS or SBUS attachment structures is
that there is a linear mapping between handle index and
DMA map index. This can no longer be true if there are
overlapping index spaces for initiator mode and target
mode commands.
These changes bandaid around the problem by forcing us
to not have simultaneous dual roles and doing the appropriate
masking to make sure things are indexed correctly. A longer
term fix is being developed.
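A minimal sketch of the masking idea, with entirely hypothetical macro
and field names (the real handle layout and attachment structures
differ): strip the type bits from the handle before using it as an index
into the per-attachment DMA map array.

	/*
	 * Hypothetical handle layout: low bits are the index, high bits
	 * encode the command type (initiator vs. target).
	 */
	#define	HANDLE_TYPE_MASK	0xc000
	#define	HANDLE_INDEX(h)		((h) & ~HANDLE_TYPE_MASK)

	u_int idx;
	bus_dmamap_t dmap;

	idx = HANDLE_INDEX(handle);
	if (idx >= pcs->maxcmds)	/* illustrative bound check */
		panic("bad handle index %u", idx);
	dmap = pcs->dmaps[idx];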
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to those for rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET, which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
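A brief usage sketch under the new API; the lock name and the flag
combination are illustrative.

	#include <sys/param.h>
	#include <sys/lock.h>		/* must now precede <sys/sx.h> */
	#include <sys/sx.h>

	static struct sx foo_lock;

	static void
	foo_init(void)
	{
		/* SX_ADAPTIVESPIN only has an effect with options ADAPTIVE_SX. */
		sx_init_flags(&foo_lock, "foo lock", SX_DUPOK | SX_ADAPTIVESPIN);
	}

	static void
	foo_read(void)
	{
		sx_slock(&foo_lock);	/* shared: concurrent readers allowed */
		/* ... read shared state ... */
		sx_sunlock(&foo_lock);
	}

	static void
	foo_write(void)
	{
		sx_xlock(&foo_lock);	/* exclusive: single writer */
		/* ... modify shared state ... */
		sx_xunlock(&foo_lock);
	}

	static void
	foo_fini(void)
	{
		sx_destroy(&foo_lock);
	}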
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
that the driver clock is identical to the processor or bus clock.
This is the case for the PowerQUICC processor. When the clock is
high enough, overflows happen in the calculation of the time it
takes to send 1/10 of a character, used in delay loops. Fix the
overflows to correct bugs in the delay loops that could cause either
insufficient or excessive delays.
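A sketch of the kind of overflow involved, assuming an ns8250-style
delay calculation (the exact expression in the driver may differ):
divisor and bas->rclk come from the driver state.

	/*
	 * 1/10th of a character time in microseconds.  With rclk = 266 MHz
	 * and 9600 bps the divisor is ~1730, so 16000000 * divisor no
	 * longer fits in 32 bits and the delay comes out wrong.
	 */
	delay = 16000000 * divisor / bas->rclk;		/* overflow-prone */

	/* Reorder the arithmetic so the intermediate value stays small. */
	delay = 16000000 / (bas->rclk / divisor);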
system devices (i.e. console, debug port or keyboard), don't stop
after the first match. Find them all and keep track of the last.
The reason for this change is that the low-level console is always
added to the list of system devices first, with other devices added
later. Since new devices are added to the list at the head, we have
the console always at the end. When a debug port is using the same
UART as the console, we would previously mark the "newbus" UART as
a debug port instead of as a console. This would later result in a
panic because no "newbus" device was associated with the console.
By matching all possible system devices we now mark the "newbus"
UART as a console and not as a debug port.
While it is arguably better to be able to mark a "newbus" UART as
both console and debug port, this fix is lightweight and allows
a single UART to be used as the console as well as a debug port
with only the aesthetic bug of not telling the user about it also
being a debug port.
Now that we match all possible system devices, update the rclk of
the system devices with the rclk that was obtained through the
bus attachment. It is generally true that clock information is
more reliable when obtained from the parent bus than by means of
some hardcoded or assumed value used early in the boot. This is by
virtue of having more contextual information.
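A sketch of the matching loop described above, with illustrative names
for the system-device list, the list linkage and the match predicate
(the driver's actual structures differ): keep scanning after the first
hit so the console, which sits at the end of the list, wins, and refresh
rclk from the bus-provided clock.

	struct uart_devinfo *sysdev, *match = NULL;

	SLIST_FOREACH(sysdev, &uart_sysdevs, next) {
		if (uart_cpu_eqres(&sc->sc_bas, &sysdev->bas))
			match = sysdev;	/* don't stop at the first hit */
	}
	if (match != NULL) {
		sc->sc_sysdev = match;
		/* The clock obtained via the bus attachment is more reliable. */
		match->bas.rclk = sc->sc_bas.rclk;
	}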
MFC after: 1 month
by driver backends to mark individual channels as enabled or not.
The default implementation of this method always marks channels as
enabled.
This method is currently not used, but is added with the PowerQUICC
in mind where the 2nd SCC channel can be disabled.
watchdog might hide the successful arming of an earlier one. Accept that on
failing to arm any watchdog (because of unsupported timeouts) EOPNOTSUPP is
returned instead of the more appropriate EINVAL.
MFC after: 3 days
When submitting rx buffers and not using WC fifo, always replace the
invalid DMA address with the real one, otherwise allocation failures
could lead to the invalid DMA address being given to the NIC, and
that would cause the receive side to lock up.
it via pci_get_vpd_*() rather than always reading it for each device during
boot. I've left the tunable so that it can still be turned off if a device
driver causes a lockup via a query to a broken device, but devices whose
drivers do not use VPD (the vast majority) should no longer result in
lockups during boot, and most folks should not need to tweak the tunable
now.
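A minimal sketch of how a driver consumes VPD on demand through this
interface; the "PN" keyword and the way the strings are used are
illustrative.

	const char *ident, *pn;

	/* Only drivers that make these calls trigger VPD reads now. */
	if (pci_get_vpd_ident(dev, &ident) == 0)
		device_printf(dev, "VPD ident: %s\n", ident);
	if (pci_get_vpd_readonly(dev, "PN", &pn) == 0)
		device_printf(dev, "part number: %s\n", pn);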
Tested on: bge(4)
Silence from: jmg
one (hardware & global lock). This should address witness complaints that
a duplicate mutex is being acquired. Be sure to free the mutex to fix a
potential memory leak.
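A sketch of the single-mutex pattern, with an illustrative softc field
and lock name: initialize one lock at attach, use it for both the
hardware and the global state, and destroy it on detach so the lock is
not leaked.

	/* attach: one mutex now covers both hardware and global state */
	mtx_init(&sc->sc_mtx, device_get_nameunit(dev), NULL, MTX_DEF);

	/* detach: tear it down again */
	mtx_destroy(&sc->sc_mtx);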
MFC after: 3 days
some devices (and not others). To get instances onto the iicbus, one
now needs hints or an identify routine. We also do not probe the bus
for devices because many iic devices cannot be safely probed (and when
they can, the probe order turns out to be somewhat difficult to get
right).
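A minimal sketch of the identify-routine route; the driver name "foo" is
illustrative, and the routine would be hooked up with
DEVMETHOD(device_identify, foo_identify).

	static void
	foo_identify(driver_t *driver, device_t parent)
	{
		/* Add exactly one instance of "foo" on this iicbus. */
		if (device_find_child(parent, "foo", -1) == NULL)
			BUS_ADD_CHILD(parent, 0, "foo", -1);
	}

The hints route similarly names the parent iicbus and the slave address
in device.hints.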
# I'm not 100% sure that the iicsmb removal is right. Please contact me if
# this causes difficulty.
- Change exca_activate_resource() to call BUS_ACTIVATE_RESOURCE() before
calling exca_(io|mem)_map() since the latter use rman_get_bus(tag|handle)
and the recent changes to nexus(4) mean that you need to activate a
resource before reading the bus tag and handle. This was true before,
but now the nexus(4) drivers on x86 and ia64 are more forceful about it.
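  A sketch of the required ordering; the resource type and rid are
  illustrative.

	bus_space_tag_t bst;
	bus_space_handle_t bsh;
	int error;

	/* Activate first; only then are the bus tag and handle valid. */
	error = BUS_ACTIVATE_RESOURCE(device_get_parent(dev), dev,
	    SYS_RES_MEMORY, rid, res);
	if (error != 0)
		return (error);
	bst = rman_get_bustag(res);
	bsh = rman_get_bushandle(res);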
Reviewed by: imp