86771 Commits

Author SHA1 Message Date
Maksim Yevmenkin
cd1fb2e095 Before it gets lost in the noise.
Put a bandaid to prevent ixgbe(4) from completely locking up the system
under high load. Our platform has a few CPU cores and a single active
ixgbe(4) port with 4 queues. Under high enough traffic load, at about
7.5Gb/s and 700,000 packets/sec (outbound), the entire system would
deadlock. What we found was that each CPU was in an endless loop on a
different ix taskqueue thread. The OACTIVE flag had gotten set on each
queue, and the ixgbe_handle_queue() function was continuously rescheduling
itself via the taskqueue_enqueue. Since all CPUs were busy with their
taskqueue threads, the ixgbe_local_timer() function couldn't run to clear
the OACTIVE flag.

Submitted by:	scottl
MFC after:	1 week
2012-06-05 18:48:02 +00:00
David E. O'Brien
98663aa0e4 Only build filemon(4) on x86. 2012-06-05 17:44:54 +00:00
Alexander Motin
a839e33278 Add missing newlines into XML output.
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2012-06-05 16:46:34 +00:00
Warner Losh
a687c5ecc9 Remove dead code. 2012-06-05 14:19:59 +00:00
Bjoern A. Zeeb
15cc25e9c0 Plug two interface address refcount leaks in early error return cases
in the ioctl path.

Reported by:	rpaulo
Reviewed by:	emax
MFC after:	3 days
2012-06-05 13:27:37 +00:00
Alexander Motin
a4d953c44e Tune and add some more CAM_DEBUG() points for the probe sequences. 2012-06-05 11:48:32 +00:00
Alexander Motin
2d89c12567 Replace #ifdef CAMDEBUG + if + panic() with single KASSERT(). 2012-06-05 10:23:41 +00:00
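A minimal sketch of the pattern this commit describes, collapsing a hand-rolled debug check into a single assertion. The KASSERT below is a userland stand-in written for illustration (the kernel macro is only active in kernels built with INVARIANTS), and struct ccb_ref and ccb_release() are hypothetical names, not taken from the CAM code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userland stand-in for the kernel's KASSERT(exp, msg).  As in the
 * kernel, msg is a parenthesized printf-style argument list.
 */
#define KASSERT(exp, msg) do {		\
	if (!(exp)) {			\
		printf msg;		\
		abort();		\
	}				\
} while (0)

/* Hypothetical reference-counted object. */
struct ccb_ref {
	int refcount;
};

/*
 * Old shape (three constructs):
 *	#ifdef CAMDEBUG
 *	if (c->refcount <= 0)
 *		panic("...");
 *	#endif
 * New shape: one KASSERT() carrying the same condition and message.
 */
static int
ccb_release(struct ccb_ref *c)
{
	KASSERT(c->refcount > 0,
	    ("ccb_release: bad refcount %d\n", c->refcount));
	return (--c->refcount);
}
```

The assertion states the invariant positively (what must hold), whereas the if + panic() form tests for its violation.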
Alexander Motin
62275a906f Do not reinvent a wheel and let default error handler do its job. 2012-06-05 10:08:22 +00:00
Alexander Motin
fddde2b8ef Tune and add some missing CAM_DEBUG() points for better consistency. 2012-06-05 09:45:42 +00:00
Adrian Chadd
9f95609828 Mostly revert previous commit(s). After doing a bunch of local testing,
it turns out that it negatively affects performance.  I'm still investigating
exactly why deferring the IO causes such negative TCP performance but
doesn't affect UDP performance.

Leave the ath_tx_kick() change in there however; it's going to be useful
to have that there for if_transmit() work.

PR:		kern/168649
2012-06-05 06:03:55 +00:00
Gleb Smirnoff
36eeafa0e5 style(9) for r236563. 2012-06-05 05:16:04 +00:00
Adrian Chadd
14d33c7e35 Create a function - ath_tx_kick() - which is called where ath_start() is
called to "kick" along TX.

For now, schedule a taskqueue call.

Later on I may go back to the direct call of ath_rx_tasklet() - but for
now, this will do.

I've tested UDP and TCP TX. UDP TX still achieves 240MBit, but TCP
TX gets stuck at around 100MBit or so, instead of the 150MBit it should
be at.  I'll re-test with no ACPI/power/sleep states enabled at startup
and see what effect it has.

This is in preparation for supporting an if_transmit() path, which will
turn ath_tx_kick() into a null operation (as there won't be an ifnet
queue to service.)

Tested:
	* AR9280 STA

TODO:
	* test on AR5416, AR9160, AR928x STA/AP modes

PR:		kern/168649
2012-06-05 03:14:49 +00:00
Eitan Adler
3e0efd2ec4 Fix style nit: don't use leading zero for dates in .Dd
Prompted by:	brueffer
Approved by:	brueffer
MFC after:	3 days
2012-06-05 03:14:39 +00:00
David E. O'Brien
eb9aea5ac0 Add the 'filemon' device. 'filemon' is a kernel module that provides a device
interface for processes to record system calls of their children.

Submitted by:	Juniper Networks.
2012-06-04 22:54:19 +00:00
Adrian Chadd
470a7f4191 Migrate the TX path to a taskqueue for now, until a better way of
implementing parallel TX and TX/RX completion can be done without
simply abusing long-held locks.

Right now, multiple concurrent ath_start() entries can result in
frames being dequeued out of order.  Well, they're dequeued in order
fine, but if there's any preemption or race between CPUs between:

* removing the frame from the ifnet, and
* calling and running ath_tx_start(), until the frame is placed on a
  software or hardware TXQ

Then although dequeueing the frame is in-order, queueing it to the hardware
may be out of order.

This is solved in a lot of other drivers by just holding a TX lock over
a rather long period of time.  This lets them continue to direct dispatch
without races between dequeue and hardware queue.

Note to observers: if_transmit() doesn't necessarily solve this.
It removes the ifnet from the main path, but the same issue exists if
there's some intermediary queue (eg a bufring, which as an aside also
may pull in ifnet when you're using ALTQ.)

So, until I can sit down and code up a much better way of doing parallel
TX, I'm going to leave the TX path using a deferred taskqueue task.
What I will likely head towards is doing a direct dispatch to hardware
or software via if_transmit(), but it'll require some driver changes to
allow queues to be made without using the really large ath_buf / ath_desc
entries.

TODO:

* Look at how feasible it'll be to just do direct dispatch to
  ath_tx_start() from if_transmit(), avoiding doing _any_ intermediary
  serialisation into a global queue.  This may break ALTQ for example,
  so I have to be delicate.

* It's quite likely that I should break up ath_tx_start() so it
  deposits frames onto the software queues first, and then only fill
  in the 802.11 fields when it's being queued to the hardware.
  That will make the if_transmit() -> software queue path very
  quick and lightweight.

* This has some very bad behaviour when using ACPI and Cx states.
  I'll do some subsequent analysis using KTR and schedgraph and file
  a follow-up PR or two.

PR:		kern/168649
2012-06-04 22:01:12 +00:00
Marius Strobl
10ee2f9a87 The loaddev environment variable is not modifiable once set, so it is not
updated for ZFS. It seems that this does not really affect anything except
the help command. Nevertheless, rearrange things so loaddev is set only
once in all cases in order to get it right.
Pointed out by: avg

MFC after:	r235364
2012-06-04 20:56:40 +00:00
Marius Strobl
f6dd28dc27 The workaround added in r151650 for handling firmwares that don't allow
a single device to be opened multiple times concurrently unfortunately
isn't sufficient with ZFS. This is due to the fact that ZFS may open
different partitions of a single device simultaneously. So the best we
can do in this case is to cache the last used device path and close
and open devices in ofwd_strategy() as needed.

PR:		165025
Submitted by:	Gavin Mu
MFC after:	1 week
2012-06-04 20:45:33 +00:00
Dimitry Andric
56e4bfe54d Fix build of aicasm when CC=clang. This was due to a side-effect of the
EARLY_BUILD macro: the -Qunused-arguments flag isn't passed anymore when
building this particular program.  However, with clang 3.1 and -Werror,
such unused argument warnings are flagged as errors, causing buildkernel
to fail at this stage, due to the -nostdinc flag passed during linking.
Since the -nostdinc flag isn't actually needed, just remove it.

X-MFC-With:	r236528
2012-06-04 20:36:11 +00:00
Maksim Yevmenkin
77d396fd18 Plug more refcount leaks and possible NULL deref for interface
address list.

Submitted by:	scottl@
MFC after:	3 days
2012-06-04 18:43:51 +00:00
Dimitry Andric
cec20e143c Make aicasm compile without warnings if -Wpointer-sign is enabled.
MFC after:	3 days
2012-06-04 17:22:43 +00:00
George V. Neville-Neil
4737d389b0 Integrate a fix for a very odd signal delivery problem found
by Bryan Cantrill and others in the Solaris/Illumos version of DTrace.

Obtained from: https://www.illumos.org/issues/789
MFC after:	2 weeks
2012-06-04 16:15:40 +00:00
Zachary Loafman
db5c7d363d Fix DTrace TSC skew calculation:
The skew calculation here is exactly backwards. We were able to repro
it on a multi-package ESX server running a FreeBSD VM, where the TSCs
can be pretty evil.

MFC after: 1 week

Submitted by: Jeff Ford <jeffrey.ford2@isilon.com>
Reviewed by: avg, gnn
2012-06-04 16:04:01 +00:00
Gleb Smirnoff
8955d2720f Microoptimisation of code from r236560, also coming from Nginx Inc.
Submitted by:	ru
2012-06-04 14:18:13 +00:00
Gleb Smirnoff
835d890042 Optimise kern_sendfile(): skip cycling through the entire mbuf chain in
m_cat(), storing pointer to last mbuf in chain in local variable and
attaching new mbuf to the end of chain.

Submitter reports that CPU load dropped by more than 10% on a web server
serving large files with this optimisation.

Submitted by:	Sergey Budnevitch <sb nginx.com>
2012-06-04 12:49:21 +00:00
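The optimisation above (remember the last element instead of walking the chain on every append) can be sketched in plain C. This is an illustration only: struct buf and struct chain are hypothetical stand-ins for an mbuf chain, not the kernel's types, and the real change keeps the tail in a local variable inside kern_sendfile().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: a link and a payload length. */
struct buf {
	struct buf *next;
	size_t len;
};

/* A chain that caches its tail so appends do not walk the list. */
struct chain {
	struct buf *head;
	struct buf *tail;
};

/* Append one unlinked buffer in O(1) using the cached tail. */
static void
chain_append(struct chain *c, struct buf *b)
{
	assert(b != NULL && b->next == NULL);
	if (c->head == NULL)
		c->head = b;
	else
		c->tail->next = b;
	c->tail = b;
}

/* Total payload bytes in the chain (walks the list once). */
static size_t
chain_bytes(const struct chain *c)
{
	size_t total = 0;

	for (const struct buf *b = c->head; b != NULL; b = b->next)
		total += b->len;
	return (total);
}
```

Without the cached tail, each append has to start at the head and follow every link, so building a chain of n buffers costs on the order of n^2 link traversals; with it, each append is constant time.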
Alexander V. Chernikov
784292f89a Fix panic introduced by r235745. The panic occurs after the first packet traverses a renamed interface.
Add several comments on locking.

Found by:         avg
Approved by:      ae(mentor)
Tested by:        avg
MFC after:        1 week
2012-06-04 12:36:58 +00:00
Alexander Motin
c6cba2497a Remove some dead code that I doubt will ever be implemented. 2012-06-04 09:47:19 +00:00
Grzegorz Bernacki
9fa69148a3 Restore changes accidentally removed in r235537.
Noticed by:	avg
2012-06-04 08:40:14 +00:00
Warner Losh
537cdfaff1 Eliminate the now-unused AT91C_MASTER_CLOCK option and change the one
place in the source it was used to the more correct AT91C_MAIN_CLOCK.
Sort AT91C_MAIN_CLOCK into a better location in the options.arm file.
2012-06-04 04:24:59 +00:00
Alan Cox
23c0d041ba Various small changes to PV entry management:
Constify pc_freemask[].

pmap_pv_reclaim()
  Eliminate "freemask" because it was a pessimization.  Add a comment about
  the resident count adjustment.

free_pv_entry() [i386 only]
  Merge an optimization from amd64 (r233954).

get_pv_entry()
  Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
  (The right strategy needs more thought.  Moreover, there were unintended
  differences between the amd64 and i386 implementation.)

pmap_remove_pages()
  Eliminate unnecessary ()'s.
2012-06-04 03:51:08 +00:00
Marius Strobl
4c87055c0c Disable verification of the flashed content for now; for reasons unknown
it sometimes causes physwr to hang.
2012-06-03 21:03:16 +00:00
Warner Losh
4623180919 Minor rearrangement of the locore <-> initarm interface. Pass in a
structure with the first 4 registers to allow a wider range of boot
loaders to work.  Future commits will make use of this to centralize
support for the different loaders.
2012-06-03 18:34:32 +00:00
Michael Tuexen
2faa5be555 Remove code which is not needed.
MFC after: 3 days
2012-06-03 18:14:57 +00:00
Konstantin Belousov
bba080854d Add a knob to disable vn_io_fault.
MFC after:	1 month
2012-06-03 16:19:37 +00:00
Konstantin Belousov
bb2f52a61d Count and export the number of prefaults that happen.
MFC after:	 1 month
2012-06-03 16:06:56 +00:00
Michael Tuexen
b82bd838f6 Use an existing function to get the source address.
MFC after: 3 days
2012-06-03 14:54:50 +00:00
Ulrich Spörlein
5355e5b582 Fix make depend 2012-06-03 12:19:16 +00:00
Andriy Gapon
7adc598a15 Free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG.
Those calls are useful with hardware watchdog drivers too.

MFC after:	3 weeks
2012-06-03 08:01:12 +00:00
Maksim Yevmenkin
3df0e439b0 Plug reference leak.
Interface routes are refcounted as packets move through the stack,
and there's garbage collection tied to it so that route changes can
safely propagate while traffic is flowing. In our setup, we weren't
changing or deleting any routes, but the refcounting logic in
ip6_input() was wrong and caused a reference leak on every inbound
V6 packet. This eventually caused a 32bit overflow, and the resulting
0 value caused the garbage collection to run on the active route.
That then snowballed into the panic.

Reviewed by:	scottl
MFC after:	3 days
2012-06-03 07:36:59 +00:00
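The failure mode described above can be reproduced in miniature. The sketch assumes an unsigned 32-bit counter; struct route_sketch and its helpers are hypothetical names used only to show how a leaked reference per packet eventually wraps the count to 0, at which point the object looks unreferenced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical route entry with a 32-bit reference count. */
struct route_sketch {
	uint32_t refcnt;
};

/* Take a reference; unsigned arithmetic wraps modulo 2^32. */
static void
route_ref(struct route_sketch *rt)
{
	rt->refcnt++;
}

/* What a garbage collector would check before freeing. */
static int
route_looks_unreferenced(const struct route_sketch *rt)
{
	return (rt->refcnt == 0);
}
```

Starting from UINT32_MAX, a single leaked reference is enough to make route_looks_unreferenced() return true for a route that is still in active use.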
Warner Losh
5fd9ec69d6 Remove stray repeated line... 2012-06-03 05:36:25 +00:00
Marius Strobl
57974eb576 - Now that the DataFlash related drivers work properly (at91_spi(4) since
r236495 and at45d(4) since r236496), enable them by default.
- Sort BOOTP options.
2012-06-03 01:07:55 +00:00
Marius Strobl
7f2107d400 - Loop up to 3 seconds when waiting for a device to get ready. [1]
- Make the device description match the driver name.
- Identify the chip variant based on the JEDEC and use that information
  to use the proper values for page count, offset and size instead of
  hardcoding an AT45DB642x with 2^N byte page support disabled.
- Take advantage of bioq_takefirst().
- Given that the CONTINUOUS_ARRAY_READ_HF (0x0b) command isn't even mentioned
  in Atmel's DataFlash Application Note, may not work on all devices (as
  suggested by the previous comment) and actually doesn't work properly on
  at least AT45DB321D (JEDEC 0x1f2701), rewrite at45d_task() to use
  CONTINUOUS_ARRAY_READ (0xe8) for reading instead. This rewrite is laid
  out in a way that makes it easy to add support for BIO_DELETE later on.
- Add support for reads and writes not starting on a page boundary.
- Verify the flash content after writing.
- Let at45d_task() gracefully handle errors on SPI transfers and the
  device not becoming ready afterwards again. [1]
- Use DEVMETHOD_END. [1]
- Use NULL instead of 0 for pointers. [1]

Additional testing by:	Ian Lepore

Submitted by:	Ian Lepore [1]
MFC after:	1 week
2012-06-03 01:00:55 +00:00
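One way to picture the "reads and writes not starting on a page boundary" item is as splitting a byte range into per-page pieces. The following is a sketch under stated assumptions, not the driver's code: struct page_chunk and split_by_page() are hypothetical names, and the page size is whatever the caller passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One piece of a transfer that lies entirely within a single page. */
struct page_chunk {
	uint32_t page;		/* page index */
	uint32_t offset;	/* byte offset inside that page */
	uint32_t len;		/* bytes to transfer in that page */
};

/*
 * Split the byte range [addr, addr + len) into per-page chunks.
 * Writes at most max chunks to out and returns how many were written.
 */
static size_t
split_by_page(uint32_t addr, uint32_t len, uint32_t pagesize,
    struct page_chunk *out, size_t max)
{
	size_t n = 0;

	assert(pagesize > 0);
	while (len > 0 && n < max) {
		uint32_t off = addr % pagesize;
		uint32_t room = pagesize - off;	/* bytes left in page */
		uint32_t take = len < room ? len : room;

		out[n].page = addr / pagesize;
		out[n].offset = off;
		out[n].len = take;
		n++;
		addr += take;
		len -= take;
	}
	return (n);
}
```

Only the first chunk can have a non-zero offset and only the first and last can be shorter than a page; every chunk in between is a whole page.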
Marius Strobl
31a2c906d7 - Prepend the device description with "AT91" to reflect its nature. [1]
- Move DMA tag and map creation to at91_spi_activate() where the other
  resource allocation also lives. [1]
- Flesh out at91_spi_deactivate(). [1]
- Work around the "Software Reset must be Written Twice" erratum.
- For now, run the bus at the slowest speed possible in order to work
  around data corruption in transit even seen with 9 MHz on ETHERNUT5
  (15 MHz maximum) and AT45DB321D (20 MHz maximum). This also serves as
  a poor man's work-around for the "NPCSx rises if no data is to be
  transmitted" erratum of RM9200. Being able to use the appropriate bus
  speed would require:
  1) Adding a proper work-around for the RM9200 bug consisting of taking
     the chip select control away from the SPI peripheral and managing it
     directly as a GPIO line.
  2) Taking the maximum frequencies supported by the actual board and the
     slave devices into account and basing the whole thing on the master
     clock instead of hardcoding a divisor as previously done.
  3) Fixing the above mentioned data corruption.
- KASSERT that TX/RX command and data sizes match on transfers.
- Introduce a mutex ensuring that only one child device is running a SPI
  transfer at a time. [1]
- Add preliminary, #ifdef'ed out support for setting the chip select. [1]
- Use the RX instead of the TX command size when setting up the RX side
  of a transfer.
- For controllers having SPI_SR_TXEMPTY, i.e. !RM9200, also wait for the
  completion of the TX part of transfers before stopping the whole thing
  again.
- Use DEVMETHOD_END. [1]
- Use NULL instead of 0 for pointers. [1, partially]

Additional testing by:  Ian Lepore

Submitted by:   Ian Lepore [1]
MFC after:      1 week
2012-06-03 00:54:10 +00:00
Alan Cox
0d6f49d84a Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
2012-06-02 22:14:10 +00:00
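A minimal sketch of isolating a lock from neighbouring data to avoid false sharing. It assumes C11 and a 64-byte cache line; struct padded_lock, struct pv_globals and the int lock word are hypothetical stand-ins, not the pmap code's actual layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */

/*
 * Aligning the lock to a cache line also rounds the struct's size up
 * to a full line, so nothing else can share that line with it.
 */
struct padded_lock {
	alignas(CACHE_LINE) int lock_word;	/* stand-in for the lock */
};

/* Hypothetical globals: the lock, then unrelated frequently written data. */
struct pv_globals {
	struct padded_lock pv_lock;
	int hot_counter;
};
```

With this layout, writes to hot_counter land on a different cache line from the lock, so CPUs spinning on or acquiring the lock are not disturbed by updates to unrelated data.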
Michael Tuexen
2566e071ec Honor sysctl for TTL.
MFC after: 3 days
2012-06-02 21:22:26 +00:00
Michael Tuexen
962cef4089 Don't request data from the IPv6 layer, which is not used.
MFC after: 3 days
2012-06-02 20:53:23 +00:00
Marius Strobl
ae5d8757bf Add missing prototypes. While at it, sort them alphabetically.
MFC after:	3 days
2012-06-02 20:47:00 +00:00
Marius Strobl
f3a4392048 Remove nitems() now that it lives in <sys/param.h> since r236486. 2012-06-02 20:00:52 +00:00
Marius Strobl
47f4a4dc9a Take advantage of nitems().
MFC after:	3 days
2012-06-02 19:41:28 +00:00
Konstantin Belousov
d1b07fd498 Fix typo [1]. Use commas to separate flag printouts, in style with
other parts of function.

Submitted by: bf [1]
MFC after:   1 week
2012-06-02 19:39:12 +00:00
Marius Strobl
cbb000304c Add nitems(), a macro for determining the number of elements in a
statically-allocated array.

Obtained from:	OpenBSD (in principle)
MFC after:	3 days
2012-06-02 19:30:49 +00:00
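For illustration, a short sketch of how such a macro is typically used, assuming the conventional sizeof-ratio definition. The probe_order table and sum_probe_order() are hypothetical; the #ifndef guard keeps the sketch from clashing with a system header that already provides nitems().

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical statically-allocated table. */
static const int probe_order[] = { 3, 1, 4, 1, 5 };

/* Sum every element without hardcoding the array length. */
static int
sum_probe_order(void)
{
	int sum = 0;

	for (size_t i = 0; i < nitems(probe_order); i++)
		sum += probe_order[i];
	return (sum);
}
```

The macro only works on arrays whose size is known at compile time; applied to a pointer it silently yields sizeof(pointer) / sizeof(element) instead of the element count.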