appeared to rely on all kinds of non-guaranteed behaviours: the
transfer abort code assumed that TDs with no interrupt timeout
configured would end up on the done queue within 20ms, the done
queue processing assumed that all TDs from a transfer would appear
at the same time, and there were access-after-free bugs triggered
on failed transfers.
Attempt to fix these problems by the following changes:
- Use a maximum (6-frame) interrupt delay instead of no interrupt
delay to ensure that the 20ms wait in ohci_abort_xfer() is enough
for the TDs to have been taken off the hardware done queue.
- Defer cancellation of timeouts and freeing of TDs until we either
hit an error or reach the final TD.
- Remove TDs from the done queue before freeing them so that it
is safe to continue traversing the done queue.
This appears to fix a hang that was reproducable with revision 1.67
or 1.68 of ulpt.c (earlier revisions had a different transfer
pattern). With certain HP printers, the command "true > /dev/ulpt0"
would cause ohci_add_done() to spin because the done queue had a
loop. The list corruption was caused by a 3-TD transfer where the
first TD completed but remained on the internal host controller
done queue because it had no interrupt timeout. When the transfer
timed out, the TD got freed and reused, so it caused a loop in the
done queue when it was inserted a second time from a different
transfer.
Reported by: Alex Pivovarov
MFC after: 1 week
transfer, which lead to panics or page faults. For example if a
transfer timed out, another thread could come along and attempt to
abort the same transfer while the timeout task was sleeping in
the *_abort_xfer() function.
Add an "aborting" flag to the private transfer state in each host
controller driver and use this to ensure that the abort is only
executed once. Also prioritise normal abort requests over timeouts
so that the callback is always given a status of USB_CANCELLED even
if the timeout-initiated abort began first.
The crashes caused by this bug were mainly reported in connection
with lpd printing to a USB printer.
PR: usb/78208, usb/78986
to better keep track of the total amoutn transferred during a
transfer. Seems similar to some code in the NetBSD version.
I notice they have incorporated matches from him so I don't know which
direction it went.
Submitted by: damien.bergamini@free.fr
Obtained from: patches to make the ueagle driver work
MFC after: 1 week
backed out commits were trying to address: when cancelling the timeout
callout, also cancel the abort_task event, since it is possible that
the timeout has already fired and set up an abort_task.
reports of problems. The bug is probably that there are cases where
`xfer->timeout && !sc->sc_bus.use_polling' is not a suitable test
for an active timeout callout, so an explicit flag will be necessary.
Apologies for the breakage.
transfer timeouts that typically cause a transfer to be completed
twice, resulting in panics and page faults:
o A transfer completion interrupt could arrive while an abort_task
event was set up, so the transfer would be aborted after it had
completed. This is very easy to reproduce. Fix this by setting
the transfer status to USBD_TIMEOUT before scheduling the
abort_task so that the transfer completion code will ignore it.
o The transfer completion code could execute concurrently with the
timeout callout, leaving the callout blocked (e.g. waiting for
Giant) while the transfer completion code runs. In this case,
callout_stop() does not prevent the callout from running, so
again the timeout code would run after the transfer was complete.
Handle this case by checking the return value from callout_stop(),
and ignoring the transfer if the callout could not be removed.
o Finally, protect against a timeout callout occurring while a
transfer is being aborted by another process. Here we arrange
for the timeout processing to ignore the transfer, and use
callout_drain() to ensure that the callout has really gone before
completing the transfer.
This was tested by repeatedly performing USB transfers with a timeout
set to approximately the same as the normal transfer completion
time. In the PR below, apparently this occurred by accident with a
particular printer and the default timeout.
PR: kern/71491
ohci.c (1.147), author: mycroft
Failure to properly mask off UE_DIR_IN from the endpoint address
was causing OHCI_ED_FORMAT_ISO and EHCI_QH_HRECL to get set
spuriously, causing rather interesting lossage.
Suddenly I get MUCH better performance with ehci...
ohci.c (1.148), author: mycroft
Adjust a couple of comments to make it clear WTF is going on.
Obtained from: NetBSD
broken BIOS. Separate ohci_controller_init() from ohci_init(),
and call ohci_controller_init() at resume process once more.
Discussed on [bsd-nomads:16737] - [bsd-nomads:16746].
Submitted by Hiroyuki Aizu <eyes@navi.org> [bsd-nomads:16741]
methods for USB devices in the same way of uhci driver. But this change
is not complete because some ohci controlers are not initialized completely.
So "kernel: usb0: 1 scheduling overruns" interrupt will generate many times.
This change will be same one in PR kern/60099.
Discussed on [bsd-nomads:16737] - [bsd-nomads:16746].
transfer descriptors when a large request needs to be split into
more than one 8k chunk. The bug was that the calculation did not
take into account the offset of the chunk within the overall request.
This is reported to fix crashes and data corruption on ohci
controllers.
Submitted by: green
Approved by: re
revision 1.142
date: 2003/10/11 03:04:26; author: toshii
Fix a done list handling bug which exhibits under high shared
interrupt rate and bus traffic. As the interrupt register is
read after checking hcca_done_head, there was a small chance
of dropping a done list. Ignore OHCI_WDH interrupt bit if
hcca_done_head is zero so that OHCI_WDH is processed later.
revision 1.141
date: 2003/09/10 20:08:29; author: mycroft;
Update actlen even in the case where a TD returns an error --
this is critical for the umass bulk-only STALL case.
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD().
this mirrors the functionality of SLIST_REMOVE_HEAD() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
sync of the NetBSD code.
fix isochornous support for ohci. This gets webcams like my OV511
working on sparc64.
PR: kern/52589
Submitted by: Bruce R. Montague (isochonous support)
Reviewed by: joe among others
MFNetBSD: revision 1.137
date: 2003/01/20 07:12:13; author: simonb;
Grrr. So much for my ability to use grep(1) effectively. Pointed out
by Stephen Degler in private mail.
date: 2002/12/10 14:07:37; author: toshii; state: Exp; lines: +6 -6
Add a couple of le32toh which were missing in the previous.
Pointed out by SOMEYA Yoshihiko.
date: 2002/12/07 07:33:20; author: toshii; state: Exp; lines: +50 -29
Update xfer->frlengths for input isoc transfer. Based on patches from
SOMEYA Yoshihiko.
Also fix error handling for isoc transfer somewhat; usb_transfer_complete
shouldn't be called for more than once.
date: 2002/12/07 07:14:28; author: toshii;
Fix several nits. Mostly from SOMEYA Yoshihiko.
- Call usbd_transfer_complete at splusb.
- Fix a botched for loop in ohci_rem_ed.
- In ohci_close_pipe, wait 1ms after removing an ED to avoid possible race
condition.
The splusb change is non-functional on FreeBSD.
The botched loop and race condition changes came from us.
This patch is non-functional.
date: 2002/09/29 20:58:25; author: augustss;
Add some spl calls to protect critical regions. From kern/18440,
Takeshi Nakayama.
(No functional change on FreeBSD).
Submitted by Hiroyuki Aizu <eyes@navi.org>
(refer to [FreeBSD-users-jp 65061])
Tested by Hiroharu Tamaru <tamaru@myn.rcast.u-tokyo.ac.jp>
(refer to [bsd-usb:689])
the dataphysend calculation could only possibly work if the virtual buffer
is also physically contiguous. Calculate dataphysend by calculating the
ending virtual address first, then converting to a physical address.
The second bug applies only to NetBSD and OpenBSD and involves the curlen
calculation in the two-contiguous-physical-pages case (which we don't support).
Also cleanup the use of the OHIC_PAGE() macro on dataphysend and add a panic
if len goes negative (meaning we lost the physical page translation
representing the end of the buffer).
IMHO the dataphysend is still bokered since it might be misrepresented
by shared userland page mappings. The whole section needs to be rewritten
to use the virtual address range.
MFC after: 3 days
code path to fix a bug in the non USB_USE_SOFTINTR path that caused
the usb bus to hang and generally misbehave when devices were unplugged.
In the process though it also reduced the throughput of usb devices because
of a less than optimal implementation under FreeBSD.
This commit fixes the non USB_USE_SOFTINTR code in uhci and ohci
so that it works again, and switches back to using this code path.
The uhci code has been tested, but the ohci code hasn't. It's
essentially the same anyway and so I don't envisage any difficulties.
Code for uhci submitted by: Maksim Yevmenkin <myevmenk@exodus.net>
debugging levels to off by default. Now that debug levels can be
tweaked by sysctl we don't need to go through hoops to get the
different usb parts to produce debug data.
date: 2002/05/28 12:42:39; author: augustss;
Change DMAADDR macro slightly.
Update the $NetBSD$ tags to reflect this and make slight changes to
usb_mem.h so that we're in sync with each other.