
Author SHA1 Message Date
Adrian Chadd
941f53b9a9 mips74k: use cache-writeback for memory, not writethrough.
When I ported this code from NetBSD I was... slightly greener on mips74k.
I used writethrough because (a) it's what NetBSD did, and (b) if I used
writeback then things "didn't work."

Fast-forward a couple years, more MIPS hacking and a whole lot more
understanding of the bus APIs (the last few commits notwithstanding;
it's been a long week, ok?) and I have this working for arge,
argemdio, spi and ath.  Hans has it working for USB.  The ath barrier
code will come in a later commit.

This gets the routing throughput up from 220mbit -> 337mbit.
I expect the bridging throughput to improve similarly.

Tested:

* QCA955x SoC, routing workload.
2015-10-31 00:04:44 +00:00
Adrian Chadd
f17acb5fbe arge_mdio: fix barriers; correctly check MII indicator register.
* use barriers in a slightly better fashion.  You can blame this
  glass of whiskey for putting barriers in the wrong spot.  Grr adrian.

* steal/rewrite the mdio busy check from ag7100 from openwrt and
  refactor the existing code out.  This is .. more correct.

This seems to fix the boot-to-boot variation that I've been seeing
and it quietens the switch port status flapping.

Tested:

* QCA9558 SoC (AP135.)

Obtained from:	Linux OpenWRT
2015-10-30 23:59:52 +00:00
Adrian Chadd
78e1370bbc arge: fix barrier macro. 2015-10-30 23:57:20 +00:00
Adrian Chadd
29f88ae706 arge: attempt to close a transmit race by only enabling the descriptor at the end of setup.
This driver and the linux ag71xx driver both treat the transmit ring
as a circular linked list of descriptors.  There's no "end" pointer
that is ever NULL - instead, it expects the MAC to hit a finished
descriptor (ARGE_DESC_EMPTY) and stop.

Now, since it's a circular buffer, the hardware may hit the beginning
of our multi-descriptor frame before we've finished setting it up. It
then DMAs it in, starts sending it, and we finish writing out the new
descriptor.  The hardware may then write out its completion status for
the next descriptor; we then overwrite it with ours, and when we next
read it, it shows up as "not done" and transmit completion stops.

This unfortunately manifests itself as the transmit queue always
being active and a massive TX interrupt storm.  We need to actively
ACK packets back from the transmit engine, and if we don't (e.g. because
we think the transmit isn't finished but it is) then the unit will
just keep generating interrupts.

I hit this finally with the below testing setup.  This fixed it for me.

Strictly speaking I should put in a sync in between writing out all of
the descriptors and writing out that final descriptor.
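A minimal sketch of the "enable the first descriptor last" ordering. It assumes a hypothetical descriptor layout in which a set DESC_EMPTY bit means the slot still belongs to the CPU and the MAC stops there; the real arge descriptor format differs. atomic_thread_fence() only orders CPU accesses, so on mips74k a real driver would still need the bus barrier or SYNC discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: DESC_EMPTY set = owned by CPU, MAC stops here. */
#define DESC_EMPTY 0x80000000u

struct tx_desc {
    uint32_t buf_paddr;
    uint32_t ctrl;          /* DESC_EMPTY bit | length */
};

/*
 * Fill nsegs descriptors starting at 'start' in a ring of ring_size.
 * Segments 1..nsegs-1 are handed to the MAC first; segment 0 is
 * released last, so the MAC cannot start on a half-built frame.
 */
static void
tx_setup_frame(struct tx_desc *ring, int ring_size, int start,
    const uint32_t *paddr, const uint32_t *len, int nsegs)
{
    assert(nsegs > 0 && nsegs <= ring_size);

    for (int i = 1; i < nsegs; i++) {
        struct tx_desc *d = &ring[(start + i) % ring_size];
        d->buf_paddr = paddr[i];
        d->ctrl = len[i] & ~DESC_EMPTY;     /* EMPTY clear: MAC may use it */
    }

    struct tx_desc *first = &ring[start];
    first->buf_paddr = paddr[0];
    /* Order the writes above before the one that enables the frame. */
    atomic_thread_fence(memory_order_release);
    first->ctrl = len[0] & ~DESC_EMPTY;
}
```

Writing the first descriptor last means that even if the MAC reaches it early, it sees DESC_EMPTY and stops until the whole frame is in place.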

Tested:

* QCA9558 SoC (AP135 reference board) w/ arge1 + vlans acting as a
  router, and iperf -d (tcp, bidirectional traffic.)

Obtained from:	Linux OpenWRT (ag71xx_main.c.)
2015-10-30 23:18:02 +00:00
Adrian Chadd
70487bd29b arge: just use 1U since it's a 32 bit unsigned destination value. 2015-10-30 23:09:08 +00:00
Adrian Chadd
a73d5cc09f arge: do an explicit flush between updating the TX ring and starting transmit.
The MIPS busdma sync operations currently are a big no-op on coherent memory.
This isn't strictly correct behaviour as we need a SYNC in here to ensure that
the writes have finished and are visible in main memory before the MMIO accesses
occur.  This will have to be addressed in a later commit.

But, before that happens, let's at least do a flush here to make things
more "correct".

This is required for even remotely sensible behaviour on mips74k with
write-through memory enabled.
2015-10-30 23:07:32 +00:00
Adrian Chadd
ab2477c2c1 arge_mdio: add explicit read barriers for MDIO_READs.
The mips74k programmer's guide notes that reads can be re-ordered, even
uncached ones, so we need an explicit SYNC between them.

Yes, this is a case of a driver author actively doing a bus barrier
operation.

This ends up being necessary when the mips74k core is run in write-back
mode rather than write-through mode.  That's coming in an upcoming
commit.

Tested:

* mips74k, QCA9558 SoC (AP135 reference board), arge<->arge interface
  routing traffic tests.
2015-10-30 23:00:47 +00:00
Adrian Chadd
47ed24efe2 arge: ensure there's enough space in the TX ring before attempting to send frames.

This matches the other check for space.

"enough" is a misnomer, for "reasons".  The biggest reason is that
the TX ring is actually a circular linked list, with no head/tail pointers.
This check just leaves a bit more headroom between head and tail, so we
have time to queue frames before we catch up with where the hardware is.

Ideally this would be tunable and a little larger.
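A minimal sketch of a count-based space check with a fixed headroom reserve. TX_RING_HEADROOM, tx_ring_has_room() and its parameters are hypothetical; the driver's actual reserve value and bookkeeping may differ.

```c
#include <assert.h>

/* Hypothetical reserve of slots kept free between driver and hardware. */
#define TX_RING_HEADROOM 2

/*
 * tx_cnt: descriptors currently queued to the hardware.
 * nsegs:  descriptors the next frame needs.
 * Returns non-zero if the frame fits without eating into the headroom.
 */
static int
tx_ring_has_room(int tx_cnt, int nsegs, int ring_size)
{
    assert(tx_cnt >= 0 && tx_cnt <= ring_size && nsegs > 0);
    return tx_cnt + nsegs <= ring_size - TX_RING_HEADROOM;
}
```

With an 8-entry ring, 4 descriptors in flight and a 2-segment frame, the check passes (6 <= 6); one more queued descriptor makes it fail.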
2015-10-30 22:55:41 +00:00
Adrian Chadd
3b8a3b85eb arge: do a read-after-write on all arge register writes, not just MDIO writes.
This flushes out the write to the system before anything continues.

The mips74k guide, chapter 3.3.3 (write gathering) notes that writes
can be buffered in FIFOs - even uncached ones - so we can't guarantee
the device has felt its effects.  Now, since we're all lazy driver
authors and don't pepper read/write barriers everywhere, fake it here.
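A userspace sketch of the read-after-write pattern. A volatile array stands in for the device's register window; in the kernel this would be bus_space (or bus_write_4()/bus_read_4()) accesses on the device resource. The array cannot reproduce write gathering itself; it only shows the shape of the write-then-read-back macro.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped register window (64 x 32-bit registers). */
static volatile uint32_t fake_regs[64];

/*
 * Write a register, then read the same register back. On real hardware the
 * uncached read cannot complete until the buffered write has reached the
 * device, so the read acts as a cheap flush.
 */
static void
reg_write_flush(uint32_t off, uint32_t val)
{
    assert(off % 4 == 0 && off / 4 < 64);
    fake_regs[off / 4] = val;
    (void)fake_regs[off / 4];   /* read-back; the value is discarded */
}
```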

Tested:

* mips74k - QCA9558 SoC (AP135 reference board)
2015-10-30 22:53:30 +00:00
Jung-uk Kim
7bded2db17 Merge OpenSSL 1.0.2d. 2015-10-30 20:51:33 +00:00
Konstantin Belousov
50657fd342 Minor (and incomplete) style cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 20:47:42 +00:00
Konstantin Belousov
2936e0013c Also mark compat32 umtx op table as constant.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 19:32:30 +00:00
Konstantin Belousov
c539e87014 Use C99 array initialization, which also makes the code
self-documenting and eases the addition of new ops.

For similar reasons, eliminate UMTX_OP_MAX.  nitems() handles the
only use of the symbol.
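A small illustration of C99 designated array initializers plus nitems() replacing a separate *_MAX bound. The op numbers and handlers are hypothetical, not the real umtx table; the nitems() definition matches the one in FreeBSD's <sys/param.h>.

```c
#include <assert.h>
#include <stddef.h>

#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical op numbers and handlers, for illustration only. */
enum { OP_LOCK = 0, OP_UNLOCK = 1, OP_WAIT = 2 };

typedef int (*op_handler_t)(void);

static int do_lock(void)   { return 10; }
static int do_unlock(void) { return 11; }
static int do_wait(void)   { return 12; }

/* Each entry names its own index, so adding an op cannot shift the others. */
static const op_handler_t op_table[] = {
    [OP_LOCK]   = do_lock,
    [OP_UNLOCK] = do_unlock,
    [OP_WAIT]   = do_wait,
};

static int
dispatch(unsigned op)
{
    assert(nitems(op_table) == OP_WAIT + 1);
    /* The table size itself is the bound; no OP_MAX constant to keep in sync. */
    if (op >= nitems(op_table) || op_table[op] == NULL)
        return -1;
    return op_table[op]();
}
```

Any gap left in a designated initializer is zero-filled, which is why the NULL check is enough to reject unassigned slots.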

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-30 19:20:40 +00:00
Simon J. Gerraty
ce8df48b73 Do not FALLTHROUGH for SIOC{ADD,DEL}MULTI
ifmedia_ioctl() returns EINVAL for these requests.
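A hypothetical driver ioctl switch showing why the fall-through matters. The X_SIOC* codes and media_ioctl() are stand-ins invented for this sketch (media_ioctl() mimics ifmedia_ioctl() only in that it rejects non-media requests); the real driver code is not reproduced here.

```c
#include <errno.h>

/* Hypothetical ioctl codes, for illustration only. */
enum { X_SIOCADDMULTI = 1, X_SIOCDELMULTI, X_SIOCSIFMEDIA, X_SIOCGIFMEDIA };

/* Stand-in for ifmedia_ioctl(): only media requests are understood. */
static int
media_ioctl(int cmd)
{
    return (cmd == X_SIOCSIFMEDIA || cmd == X_SIOCGIFMEDIA) ? 0 : EINVAL;
}

/* Buggy shape: multicast requests fall through into the media handler. */
static int
drv_ioctl_fallthrough(int cmd)
{
    switch (cmd) {
    case X_SIOCADDMULTI:
    case X_SIOCDELMULTI:
        /* reprogram multicast filter here */
        /* FALLTHROUGH */
    case X_SIOCSIFMEDIA:
    case X_SIOCGIFMEDIA:
        return media_ioctl(cmd);
    default:
        return EINVAL;
    }
}

/* Fixed shape: multicast requests stop after their own handling. */
static int
drv_ioctl(int cmd)
{
    switch (cmd) {
    case X_SIOCADDMULTI:
    case X_SIOCDELMULTI:
        /* reprogram multicast filter here */
        return 0;
    case X_SIOCSIFMEDIA:
    case X_SIOCGIFMEDIA:
        return media_ioctl(cmd);
    default:
        return EINVAL;
    }
}
```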

Differential Revision:	3897
Submitted by:	aronen@juniper.net
Reviewed by:	marcel
2015-10-30 17:12:15 +00:00
Jim Harris
fdbd3d8068 nvd, nvme: report stripesize through GEOM disk layer
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:35:18 +00:00
Jim Harris
e7e7bad3d7 nvme: fix race condition in split bio completion path
Fixes a race condition observed under the following circumstances:

1) I/O split on 128KB boundary with Intel NVMe controller.
   Current Intel controllers produce better latency when
   I/Os do not span a 128KB boundary - even if the I/O size
   itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost
   simultaneously.
5) ithread for child I/O A increments bio_inbed, then
   immediately is preempted (rendezvous IPI, higher priority
   interrupt).
6) ithread for child I/O B increments bio_inbed, then completes
   parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS
   or gpart bio (including setting bio_children to 1 and
   clearing bio_driver1).
8) ithread for child I/O A resumes processing.  bio_children
   for what it thinks is the parent bio is set to 1, so it
   thinks it needs to complete the parent bio.

The result is either a call through a NULL callback function or a
double free of the bio to its UMA zone.
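A sketch of one way to avoid this class of race using C11 atomics; it is not necessarily the exact change made in the nvme driver. The idea: read the fixed child count before incrementing, decide "am I last?" from the value the atomic increment returns, and never touch the parent again unless this caller is the one that completes it. struct parent_req is a hypothetical, simplified stand-in for struct bio.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified parent request; not the real struct bio. */
struct parent_req {
    atomic_int inbed;   /* children completed so far */
    int children;       /* total children, fixed before submission */
    int completed;      /* set once when the parent is finished */
};

/* Called once per finished child. Returns 1 if this caller completed the parent. */
static int
child_done(struct parent_req *p)
{
    /*
     * Safe to read before our increment: the parent cannot be freed until
     * every child, including this one, has been counted.
     */
    int children = p->children;
    int prev = atomic_fetch_add(&p->inbed, 1);

    assert(prev < children);
    if (prev + 1 == children) {
        p->completed = 1;   /* only the last child reaches this point */
        return 1;
    }
    return 0;               /* *p may already be freed: do not touch it */
}
```

In the failing sequence above, child A re-reads the parent after its increment; here A decides using only local values (children and prev), so a reallocated bio cannot mislead it.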

PR:		203746
Reported by:	Drew Gallatin <gallatin@netflix.com>,
		Marc Goroff <mgoroff@quorum.net>
Tested by:	Drew Gallatin <gallatin@netflix.com>
MFC after:	3 days
Sponsored by:	Intel
2015-10-30 16:06:34 +00:00
Edward Tomasz Napierala
665aea9323 After r290196, the kernel won't wait for stuff like gmirror nodes
if they are not required for mounting rootfs.  However, it's possible
that some setups try to mount them in mountcritlocal (i.e. from fstab).

Export the list of current root mount holds using a new sysctl,
vfs.root_mount_hold, and make mountcritlocal retry if "mount -a" fails
and the list is not empty.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3709
2015-10-30 15:52:10 +00:00
Edward Tomasz Napierala
a3ba3d09c2 Make root mount wait mechanism smarter, by making it wait only if the root
device doesn't yet exist.

Reviewed by:	kib@, marcel@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3709
2015-10-30 15:35:04 +00:00
Hans Petter Selasky
52d7c63839 Reduce the DWC OTG interrupt load by not reading all the host channel
status registers for every interrupt. Check a common host channel
status interrupt register first, then conditionally read the
individual host channel status registers.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
2015-10-30 14:50:29 +00:00
Andriy Gapon
abc37121c4 l2arc: do not call trim_map_free() for blocks with zero b_asize
b_asize can be zero if the block is compressed into an empty block
(ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless
zero-sized trimming is not attempted.
The logic for calling trim_map_free() is extracted into a new function
l2arc_trim() to minimize code duplication.
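A minimal sketch of a single helper that owns the zero-size check. trim_map_free_stub() is a hypothetical stand-in for trim_map_free() that records its arguments and asserts a non-zero size, as the commit says the trim code does; the real functions take different arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for trim_map_free(); records what it was asked to trim. */
static uint64_t trimmed_off, trimmed_size;
static int trim_calls;

static void
trim_map_free_stub(uint64_t off, uint64_t size)
{
    assert(size != 0);          /* zero-sized trims are a caller bug */
    trimmed_off = off;
    trimmed_size = size;
    trim_calls++;
}

/* Every call site goes through here, so the check cannot be forgotten. */
static void
l2arc_trim_sketch(uint64_t daddr, uint64_t asize)
{
    if (asize == 0)
        return;                 /* e.g. ZIO_COMPRESS_EMPTY: nothing was written */
    trim_map_free_stub(daddr, asize);
}
```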

PR:		203473
Reported by:	Willem Jan Withagen <wjw@digiware.nl>
Tested by:	Willem Jan Withagen <wjw@digiware.nl>
MFC after:	11 days
2015-10-30 12:00:34 +00:00
Konstantin Belousov
05f1048743 The prefix for CLFLUSHOPT is 0x66. The amd64 version already had it right.
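A sketch of the byte encoding involved. CLFLUSH is 0F AE /7 and CLFLUSHOPT is the same opcode with a 0x66 prefix. The encoder below only handles a register-indirect operand with mod=00 (no SIB byte, no displacement), which is an assumption made to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Emit CLFLUSH or CLFLUSHOPT with a "(reg)" operand. ModRM: mod=00,
 * reg=7 (the /7 opcode extension), rm=base register. rm=4 would need a
 * SIB byte and rm=5 means disp32/RIP-relative, so both are rejected.
 */
static size_t
encode_clflush(uint8_t out[4], int base_reg, int opt)
{
    size_t n = 0;

    assert(base_reg >= 0 && base_reg < 8 && base_reg != 4 && base_reg != 5);
    if (opt)
        out[n++] = 0x66;                    /* CLFLUSHOPT prefix */
    out[n++] = 0x0f;
    out[n++] = 0xae;
    out[n++] = (uint8_t)((0 << 6) | (7 << 3) | base_reg);
    return n;
}
```

For example, clflushopt (%rax) encodes as 66 0F AE 38, and clflush (%rax) as 0F AE 38.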
Sponsored by:	The FreeBSD Foundation
2015-10-30 09:53:33 +00:00
Oleksandr Tymoshenko
c26ee519d1 Fix BULK read transfer if destination buffer is not cache line-aligned.
We can't use copyout() because the destination is a userland address
in another process, but we hold a reference to the respective page, so
map the page into the kernel address space and copy the fragments there.
2015-10-30 01:19:04 +00:00
Navdeep Parhar
baa7d0bf9d cxgbe/tom: decide whether to shove segments or not only if there is payload to transmit.

MFC after:	1 week
2015-10-30 01:18:07 +00:00
Oleksandr Tymoshenko
5ab55ce398 Fix framebuffer compatibility with new RPi firmware. The framebuffer driver
receives the video memory address from VideoCore through the property mailbox
channel. Older firmware versions (including the one that is currently part
of sysutils/u-boot-rpi and sysutils/u-boot-rpi2) returned a real physical
address; newer ones return a VideoCore bus address, which we need to convert
to the actual physical address. This version works with both the older and
newer interface.
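A sketch of a bus-to-physical conversion that accepts both firmware behaviours. It assumes that a VideoCore bus address carries a cache-alias selector in its top two bits (alias bases 0x00000000, 0x40000000, 0x80000000 or 0xC0000000) and that the low 30 bits are the ARM physical address; the driver's actual macro or mask may differ. A plain physical address from older firmware already has those bits clear, so masking leaves it unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: top two bits select the VideoCore cache alias. */
#define VC_BUS_ALIAS_MASK 0xC0000000u

static uint32_t
vc_bus_to_phys(uint32_t addr, uint32_t ram_size)
{
    uint32_t phys = addr & ~VC_BUS_ALIAS_MASK;

    assert(phys < ram_size);    /* sanity check: result should land in RAM */
    return phys;
}
```

Because the mask is idempotent, the driver does not need to know which firmware generation it is talking to.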
2015-10-30 00:24:37 +00:00
Bryan Drewery
243c115e92 Remove unneeded NULL as this is initialized with M_ZERO.
MFC after:	2 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-10-29 23:56:34 +00:00
Oleksandr Tymoshenko
5da8f2b69b Fix LEAVE_HYP macro: spsr is not guaranteed to contain a valid value at this
point; e.g. on Raspberry Pi 2, when control is passed from the loader to the kernel,
it contains garbage. So use cpsr as the base for the new cpsr value: if we
have reached this point, the current value is OK.

Reviewed by:	andrew
2015-10-29 22:12:03 +00:00
George V. Neville-Neil
02b90dbf45 Set the proper direction to check for policies in this one case.
Pointed out by: eri
Sponsored by:	Rubicon Communications (Netgate)
2015-10-29 21:26:32 +00:00
John Baldwin
35aafbeda8 Use movw instead of movl (or plain mov) when moving segment registers
into memory.  This is a nop on clang's assembler, but some assemblers
complain if the size suffix is incorrect.

Submitted by:	bde
2015-10-29 21:25:46 +00:00
Kristof Provost
679e3c77b7 pf: Fix IPv6 checksums with route-to.
When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.
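A sketch of finishing an upper-layer checksum in software over IPv6: the pseudo-header (source, destination, 32-bit upper-layer length, three zero bytes, next header) plus the payload, folded as a ones'-complement sum. It is illustrative, not pf's code; it assumes non-jumbo payloads and a checksum field zeroed on entry. Emitting only the pseudo-header sum, as the bug did, is what leaves a NIC without offload sending wrong checksums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a running sum as big-endian 16-bit words. */
static uint32_t
sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);   /* pad the odd byte with zero */
    return sum;
}

/* Fold carries back into the low 16 bits (ones'-complement addition). */
static uint16_t
fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

static uint16_t
in6_full_cksum(const uint8_t src[16], const uint8_t dst[16], uint8_t nxt,
    const uint8_t *payload, size_t len)
{
    uint32_t sum = 0;

    assert(len <= 0xffff);              /* no jumbograms in this sketch */
    sum = sum_bytes(sum, src, 16);
    sum = sum_bytes(sum, dst, 16);
    sum += (uint32_t)(len >> 16) + (uint32_t)(len & 0xffff);
    sum += nxt;                         /* three zero bytes, then next header */
    sum = sum_bytes(sum, payload, len);
    return (uint16_t)~fold(sum);
}
```

A quick self-check: after storing the result in the packet's checksum field, recomputing over the same bytes yields 0.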

Submitted by:	Luoqi Chen
MFC after:	1 week
2015-10-29 20:45:53 +00:00
Alexander Motin
2626fa27ad Remove some unneeded code. 2015-10-29 20:43:13 +00:00
Alexander Motin
030eb8d0f2 Remove reset delays for which I see neither explanation nor need. 2015-10-29 20:34:01 +00:00
Conrad Meyer
217b098a1e ntb: Revert r290130 now that r290156 has landed
Nagged by:	vangyzen
Sponsored by:	EMC / Isilon Storage Division
2015-10-29 19:35:01 +00:00
Conrad Meyer
0e5d2011ae pmap_change_attr: Only fixup DMAP for DMAPed ranges
pmap_change_attr must change the memory type of both the requested KVA
and the corresponding DMAP mappings (if such mappings exist), to satisfy
an Intel requirement that two or more mappings to the same physical
pages must have the same memory type.

However, not all kernel mapped pages have corresponding DMAP mappings --
for example, 64-bit BARs.  Skip fixing up the DMAP for out-of-bounds
addresses.
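A sketch of the bounds test, assuming for simplicity that the direct map covers physical addresses [0, dmap_max_pa); the real amd64 pmap checks differ in detail. The caller would always change the requested KVA and only fix up the DMAP alias when this returns true. The comparison is written so pa + size cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if [pa, pa + size) lies entirely inside [0, dmap_max_pa). */
static int
range_in_dmap(uint64_t pa, uint64_t size, uint64_t dmap_max_pa)
{
    assert(size > 0);
    return pa < dmap_max_pa && size <= dmap_max_pa - pa;
}
```

A high 64-bit BAR (for example at 0x4000000000 with a 4 GB direct map) fails the test, so its nonexistent DMAP alias is left alone.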

Submitted by:	Steve Wahl <steve_wahl@dell.com>
Reviewed by:	alc, jhb
Sponsored by:	Dell Compellent
Differential Revision:	https://reviews.freebsd.org/D4030
2015-10-29 19:07:00 +00:00
Bryan Drewery
156c04f793 getnewbuf: Initialize bp to avoid uninitialized pointer dereference and brelse().
This came in recently in r289279.

Coverity CID:	1331561
2015-10-29 19:02:24 +00:00
Bryan Drewery
2780ba06c7 Avoid passing an uninitialized 'i'. Nothing currently depends on it
anyway.

Coverity CID:	1331562
2015-10-29 18:58:18 +00:00
Alexander Motin
2e6beaf19e Fix and improve error masking and reporting. 2015-10-29 16:48:12 +00:00
John Baldwin
2219c44a1f Update for LINUX32 rename. The assembler didn't complain about undefined
symbols but just used 0 after the rename.
2015-10-29 15:20:47 +00:00
John Baldwin
6cea44a704 Fix build with DEBUG defined.
Reported by:	hselasky
2015-10-29 15:16:47 +00:00
Hans Petter Selasky
cb3450e26e Add missing NULL check in physio().
When destroying a character device the si_devsw field is set to NULL
before all references are gone, to indicate the character device is
going away. This can cause a NULL-dereference fault inside physio().

Callers of physio() should hold a thread reference on the cdev, so
if si_devsw is seen as non-NULL, it remains usable for the duration
of the function. Otherwise ENXIO is returned.
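A sketch of the check with heavily simplified, hypothetical stand-ins for struct cdev and struct cdevsw (only the two fields needed are modelled). The switch pointer is read once into a local, so the NULL test and the later call use the same value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct cdevsw / struct cdev. */
struct fake_cdevsw {
    int (*d_strategy)(int blkno);
};
struct fake_cdev {
    const struct fake_cdevsw *si_devsw;  /* NULL while the device is going away */
};

static int
strategy_ok(int blkno)
{
    return blkno >= 0 ? 0 : EINVAL;
}

static const struct fake_cdevsw example_devsw = { strategy_ok };

static int
physio_sketch(struct fake_cdev *dev, int blkno)
{
    /* Read once; the caller's thread reference keeps a non-NULL value usable. */
    const struct fake_cdevsw *csw = dev->si_devsw;

    if (csw == NULL)
        return ENXIO;
    return csw->d_strategy(blkno);
}
```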

Reviewed by:	kib
MFC after:	2 weeks
2015-10-29 13:53:37 +00:00
Hans Petter Selasky
8d59ecb214 Finish process of moving the LinuxKPI module into the default kernel build.
- Move all files related to the LinuxKPI into sys/compat/linuxkpi and
  its subfolders.
- Update sys/conf/files and some Makefiles to use new file locations.
- Add a description of COMPAT_LINUXKPI to sys/conf/NOTES, which in turn
  adds the LinuxKPI to all LINT builds.
- The LinuxKPI can be added to the kernel by setting the
  COMPAT_LINUXKPI option. The OFED kernel option no longer builds the
  LinuxKPI into the kernel. This was done to keep the build rules for
  the LinuxKPI in sys/conf/files simple.
- Extend the LinuxKPI module to include support for USB by moving the
  Linux USB compat from usb.ko to linuxkpi.ko.
- Bump the FreeBSD_version.
- A universe kernel build has been done.

Reviewed by:	np @ (cxgb and cxgbe related changes only)
Sponsored by:	Mellanox Technologies
2015-10-29 08:28:39 +00:00
Kevin Lo
e3cf3d4428 Remove the static function declaration. 2015-10-29 04:51:27 +00:00
Kevin Lo
1c1cd920d7 - Add a missing prototype
- Fix typos
2015-10-29 04:21:34 +00:00
Conrad Meyer
1ffae6e80a ioat_test: Handle forced hardware resets gracefully
Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:52 +00:00
Conrad Meyer
5f77bd3e24 ioat: Drain/quiesce the device less racily
On detach and during a forced HW reset.

Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:39 +00:00
Conrad Meyer
79c1a0199f ntb: Do not attempt to set write-combining on MWs
AMD64 pmap assumes ranges will be in the DMAP, which isn't necessarily
true for NTB memory windows (especially 64-bit BARs).

Suggested by:	pmap_change_attr_locked -> kassert_panic
Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:28 +00:00
Conrad Meyer
e9497f9bbd ioatcontrol(8): Add and document "raw" testing mode
Allows DMA from/to arbitrary KVA or physical address.  /dev/ioat_test
must be enabled by root and is only R/W root, so this is approximately
as dangerous as /dev/mem and /dev/kmem.

Sponsored by:	EMC / Isilon Storage Division
2015-10-29 04:16:16 +00:00
Adrian Chadd
948457f1be Oops: used the wrong array offset. 2015-10-28 23:39:33 +00:00
Hiren Panchasara
12eeb81fc1 Calculate the correct number of bytes in flight for a connection, as
suggested by RFC 6675.

Currently, different places in the stack try to guess this in suboptimal ways.
The main problem is that the current calculations don't take sacked bytes into
account. Sacked bytes are the bytes the receiver acked via the SACK option.
Ignoring them is suboptimal because it assumes the network has more outstanding
(unacked) bytes than it actually does, and thus sends less data by setting the
congestion window lower than possible, which in turn may slow recovery from losses.

As an example, one of the current calculations looks something like this:

    snd_nxt - snd_fack + sackhint.sack_bytes_rexmit

The new proposal from RFC 6675 is:

    snd_max - snd_una - sackhint.sacked_bytes + sackhint.sack_bytes_rexmit

which takes sacked bytes (a new addition to the sackhint struct) into account.
The only thing we are missing from RFC 6675 is isLost(), i.e. considering a
segment lost and adjusting pipe accordingly, which makes this calculation
somewhat conservative.
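A sketch of the RFC 6675-style estimate above (without isLost()). struct pipe_inputs is a hypothetical mirror of the fields named in the formula, not the real tcpcb/sackhint layout. Sequence arithmetic is done in uint32_t so that snd_max - snd_una stays correct across wraparound.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;       /* sequence numbers wrap modulo 2^32 */

/* Hypothetical mirror of the fields used by the formula. */
struct pipe_inputs {
    tcp_seq  snd_una;           /* oldest unacknowledged byte */
    tcp_seq  snd_max;           /* highest byte sent so far */
    uint32_t sacked_bytes;      /* bytes above snd_una reported via SACK */
    uint32_t sack_bytes_rexmit; /* bytes retransmitted during recovery */
};

static uint32_t
pipe_rfc6675(const struct pipe_inputs *s)
{
    uint32_t outstanding = (uint32_t)(s->snd_max - s->snd_una);

    assert(s->sacked_bytes <= outstanding);
    return outstanding - s->sacked_bytes + s->sack_bytes_rexmit;
}
```

For example, with 10000 bytes outstanding, 4000 of them SACKed and 1000 retransmitted, pipe is 10000 - 4000 + 1000 = 7000, whereas ignoring the SACKed bytes would count them as still in flight.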

The approach is very simple. We already process each ack with SACK info in
tcp_sack_doack() and extract SACK blocks/holes from it. We now also track the
new variable sacked_bytes, the total number of sacked bytes reported.

One downside to this approach is that we may get an incorrect count of sacked_bytes
if the other end decides to drop SACK info from the ack because of memory pressure
or for some other reason. But even in this (unlikely) case, the pipe calculation
would err on the conservative side, which is acceptable, as opposed to being
aggressive in sending packets into the network.

Next step is to use this more accurate pipe estimation to drive congestion
window adjustments.

In collaboration with:	rrs
Reviewed by:		jason_eggnet dot com, rrs
MFC after:		2 weeks
Sponsored by:		Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D3971
2015-10-28 22:57:51 +00:00
Jason A. Harmening
d58d7ad4a8 Retire pmap_dmap_iscurrent(). It is only a wrapper around pmap_is_current(), and is no longer called. 2015-10-28 21:17:38 +00:00
Alexander Motin
668c0ec64f Change the way target mode is enabled on 23xx chips.
Without docs I am not completely sure about this, but in my tests the new
method works better than the previous one, at least with our latest firmware.
2015-10-28 19:08:51 +00:00