Commit Graph

117315 Commits

Author SHA1 Message Date
Pedro F. Giffuni
b5f0918be5 extfs: fix the build with no UFS_ACL.
Some people may want to drop UFS-style ACLs for slimmer kernels.
Let's just not assume everyone needs ACLs.

Reported by:	bde
Submitted by:	Fedor Uporov
Differential Revision:	https://reviews.freebsd.org/D11145
2017-06-11 19:05:45 +00:00
Konstantin Belousov
fc8929cb29 More accurately handle early EFER restoration on resume.
Do not try to set LMA bit while CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD's
(over-)react with #GP.

Reported and tested by:	danfe
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-11 14:39:08 +00:00
Ian Lepore
a8f3fd8253 Convert from local code and constants for mac<->phy connection type to new
common fdt helper code.
2017-06-11 00:44:19 +00:00
Ian Lepore
b68031718e Add a driver for the Vitesse/Microsemi VSC8501 PHY. 2017-06-11 00:38:16 +00:00
Ian Lepore
7f153db853 Add some utility functions to help a PHY driver on an FDT-configured
system retrieve its config data from the fdt data.

The properties that are common to all phys are decoded and returned in a
structure.  The fdt node handles for the mac and phy devices are also
returned in the config data struct, so a driver can easily obtain additional
hardware-specific config values from the fdt data.
2017-06-11 00:16:21 +00:00
Ian Lepore
0f75981ae9 Add a set of constants describing the ways a MAC and PHY can be connected.
While the initial need for this is to help support phy drivers which are
configured with FDT data, there is nothing devicetree-specific about the
concept or the names, so they are available for use even on non-FDT systems.

The initial list of connection types comes from the current devicetree
bindings documentation, but values not documented there can be added to
the list in the future as needed, the values could be sorted into a
different order without perturbing FDT code, etc.  The only invariant
is that MII_CONTYPE_UNKNOWN should be first (so it has a value of zero,
so that a con-type variable in a softc, for example, is initialized to
MII_CONTYPE_UNKNOWN by default).
2017-06-10 23:55:13 +00:00
Ian Lepore
f5c49e5c89 Allow building if_ffec as a module. 2017-06-10 23:45:26 +00:00
Ian Lepore
ab3ad5bc81 if_ffec bugfixes related to harvesting of hardware-maintained statistics...
After harvesting the hardware statistics counters and summing them into the
interface stats, properly clear the hardware counters back to zero.  On imx5
and earlier hardware it is necessary to disable collection of stats while
writing zeroes to all the registers.  On imx6 and newer it turns out it's
not even possible to write zeroes, instead you have to toggle a special
"zero everything" control bit in a register.

Count incoming packets with a bad start frame delim as input errors, and
incoming packets dropped due to no fifo space as input drops.

Remove all code related to harvesting the hardware stats less often than
once per second.  It turns out the 32-bit stats registers are backed by
16-bit counters under the hood, and they can easily roll over if you only
harvest them once every 3 seconds like the old code was doing.  Now we just
read all the regs once a second.

The combination of not properly zeroing the stats registers and 16-bit
counters sometimes wrapping between harvest calls resulted in basically
unusable statistics before these changes.
2017-06-10 23:26:25 +00:00
Edward Tomasz Napierala
4c7008824b Switch the example name for variables controlling loading memory images
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot".  It's arbitrary, but it's easier to find this way.

MFC after:	2 weeks
2017-06-10 19:05:45 +00:00
Eric Joyner
5b83a512c1 ixl(4)/ixlv(4): Fix some busdma tags and improper map NULL.
Description from Brett:

"The busdma tags used to create mappings for the tx and rx rings did not have
the device's tag as parents, meaning that they did not respect the device's
busdma properties. The other tags used in the driver had their parents set
appropriately.

Also, the dma maps for each buffer in ixl_txeof() were being NULLed after
being unloaded, which is an error because those maps are then reused without
being recreated (I believe this also leaked resources since the maps were not
destroyed). Simply removing the line that sets the maps to NULL gives the
desired behavior. There does not seem to be a similar problem with ixl_rxeof().
Functions to free the tx and rx rings also NULL out the dma maps for each
buffer, but this seems okay because the maps are destroyed and not reused in
this case.

With these fixes, my ixl card seems to be working with the IOMMU enabled."

Submitted by:	Brett Gutstein <bgutstein@rice.edu>
Reviewed by:	erj
Approved by:	Alan Cox <alc@rice.edu>
MFC after:	1 week
2017-06-10 18:56:30 +00:00
Alan Cox
015d7db6b6 Remove an unnecessary field from struct blist. (The comment describing
what this field represented was also inaccurate.)  Suggested by: kib

In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified.  However, blist_create() was not modified to handle the
possibility that a malloc() call failed.  Address this omission.

Increase the width of the local variable "radix" to 64 bits.  (This
matches the width of the corresponding field in struct blist.)

Reviewed by:	kib
MFC after:	6 weeks
2017-06-10 16:11:39 +00:00
Luiz Otavio O Souza
595d629c09 Remove an unnecessary variable from the switch softc structure and make the
functions that are used as booleans return real boolean values.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-09 20:38:18 +00:00
Justin Hibbits
880870b41a Follow up r313841 on powerpc
Close a potential race in reading the CPU dtrace flags, where a thread can
start on one CPU, and partway through retrieving the flags be swapped out,
while another thread traps and sets the CPU_DTRACE_NOFAULT.  This could
cause the first thread to return without handling the fault.

Discussed with:	markj@
2017-06-09 20:26:42 +00:00
Mark Johnston
f67b5de754 Implement pci_disable_device() in the LinuxKPI.
Submitted by:	kmacy
MFC after:	2 weeks
2017-06-09 19:57:27 +00:00
Mark Johnston
465659643b Augment wait queue support in the LinuxKPI.
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
  code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
  wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
  implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
  calling thread to sleep.

This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10986
2017-06-09 19:41:12 +00:00
Alan Cox
1bece9d562 Style and comment fixes only.
Reviewed by:	kib
MFC after:	6 weeks
2017-06-09 17:19:27 +00:00
Alan Cox
a7249a6c85 blist_fill()'s return type is too narrow. blist_fill() accepts a 64-bit
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range.  This
revision corrects that mismatch.  Currently, swaponsomething() limits
the size of a swap area to prevent arithmetic arithmetic overflow in
other parts of the blist allocator.  That limit has also prevented this
type mismatch from causing problems.

Reviewed by:	kib, markj
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D11096
2017-06-09 16:19:24 +00:00
Gleb Smirnoff
438902edbd Fix stat(2) on a listening socket. 2017-06-09 15:54:48 +00:00
Andrew Turner
52453d22f2 Allow the arm64 machine/vfp.h to be included without first including
machine/pcb.h. It he latter is only needed for struct pcb.
2017-06-09 15:47:14 +00:00
Andrew Turner
9a19869a5f Store the read-only thread pointer when scheduling a new thread. This is
not currently set, however we may wish to set it later.
2017-06-09 15:37:17 +00:00
Andriy Gapon
667002fa27 MFV r319741: 8156 dbuf_evict_notify() does not need dbuf_evict_lock
illumos/illumos-gate@dbfd9f9300
dbfd9f9300

https://www.illumos.org/issues/8156
  dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
  the eviction itself (because the evict thread is not able to keep up).
  This can result in massive lock contention.
  It isn't necessary to hold the lock, because if we make the wrong choice
  occasionally, nothing bad will happen.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	1 week
2017-06-09 15:28:57 +00:00
Andriy Gapon
5f9cf93878 MFV r319739: 8005 poor performance of 1MB writes on certain RAID-Z configurations
illumos/illumos-gate@5b06278253
5b06278253

https://www.illumos.org/issues/8005
  RAID-Z requires that space be allocated in multiples of P+1 sectors,
  because this is the minimum size block that can have the required amount
  of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
  sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
  is a unit of 2^ashift bytes, typically 512B or 4KB.
  To satisfy this constraint, the allocation size is rounded up to the
  proper multiple, resulting in up to 3 "pad sectors" at the end of some
  blocks. The contents of these pad sectors are not used, so we do not
  need to read or write these sectors. However, some storage hardware
  performs much worse (around 1/2 as fast) on mostly-contiguous writes
  when there are small gaps of non-overwritten data between the writes.
  Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
  include pad sectors. If writing a pad sector will fill the gap between
  two (required) writes, we will issue the optional zio, thus doubling
  performance. The gap-filling performance improvement was introduced in
  July 2009.
  Writing the optional zio is done by the io aggregation code in
  vdev_queue.c. The problem is that it is also subject to the limit on
  the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
  default 128KB. For a given block, if the amount of data plus padding
  written to a leaf device exceeds zfs_vdev_aggregation_limit, the
  optional zio will not be written, resulting in a ~2x performance
  degradation.
  The problem occurs only for certain values of ashift, compressed block
  size, and RAID-Z configuration (number of parity and data disks). It
  cannot occur with the default recordsize=128KB. If compression is
  enabled, all configurations with recordsize=1MB or larger will be
  impacted to some degree.
  The problem notably occurs with recordsize=1MB, compression=off, with 10
  disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	10 days
2017-06-09 15:27:22 +00:00
Andriy Gapon
9f141f8d71 MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers
illumos/illumos-gate@adaec86ad2
adaec86ad2

https://www.illumos.org/issues/8155
  When writing pre-compressed buffers, arc_write() requires that the compression
  algorithm used to compress the buffer matches the compression algorithm
  requested by the zio_prop_t, which is set by dmu_write_policy().
  This makes dmu_write_policy() and its callers a bit more complicated.
  We can simplify this by making arc_write() trust the caller to supply the type
  of pre-compressed buffer that it wants to write, and override the compression
  setting in the zio_prop_t.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	10 days
2017-06-09 15:26:03 +00:00
Andriy Gapon
61722264cd remove an unrelated local change from r319746
MFC after:	1 day
X-MFC with:	r319746
2017-06-09 15:21:28 +00:00
Andriy Gapon
ad2b1a296f MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
Sponsored by:	Panzura
2017-06-09 15:16:39 +00:00
Konstantin Belousov
40373cf5b8 Remove msdosfs -o large support.
Its purpose was to translate the values for msdosfs inode numbers,
which is calculated from the msdosfs structures describing the file,
into the range representable by 32bit ino_t.  The translation acted
for filesystems larger than 128Gb, it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range.  It consumed
memory that could be only freed by unmount, and the translation was
not stable across remounts.

With ino_t type extended to 64 bit, there is no such issue and values
can be returned without compaction to 32bit.  That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity.  For compat ABIs which use
32bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode numbers truncation is not safe.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-09 12:06:22 +00:00
Konstantin Belousov
7abe0df223 Enhance vfs.ino64_trunc_error sysctl.
Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").

In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-09 11:17:08 +00:00
Andriy Voskoboinyk
15eaaf082a rtwn: rename module (if_rtwn.ko -> rtwn.ko) to match module name + drop
manpage link.

Reported by:	mav, hselasky
2017-06-09 07:08:58 +00:00
Gleb Smirnoff
77e1943785 When we are in UMA_STARTUP use startup_alloc() for any zone, not for
internal zones only.  This allows to create new zones at early stages
of boot, without need to mark them as internal to UMA, which isn't
always true.

Reviewed by:	alc
2017-06-08 21:33:19 +00:00
John Baldwin
1496376fee Fix the software fallback for GCM to validate the existing tag for decrypts.
Sponsored by:	Chelsio Communications
2017-06-08 21:33:10 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
John Baldwin
4623e047a7 Add explicit handling for requests with an empty payload.
- For HMAC requests, construct a special input buffer to request an empty
  hash result.
- For plain cipher requests and requests that chain an AES cipher with an
  HMAC, fail with EINVAL if there is no cipher payload.  If needed in
  the future, chained requests that only contain AAD could be serviced as
  HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for
  an AAD-only request.  Instead, complete these requests synchronously
  in software on the assumption that such requests are rare.

Sponsored by:	Chelsio Communications
2017-06-08 21:06:18 +00:00
Jonathan T. Looney
dd776f4593 With EARLY_AP_STARTUP enabled, we are seeing crashes in softclock_call_cc()
during bootup. Debugging information shows that softclock_call_cc() is
trying to execute the vt_consdev.vd_timer callout, and the callout
structure contains a NULL c_func.

This appears to be due to a race between vt_upgrade() running
callout_reset() and vt_resume_flush_timer() calling callout_schedule().

Fix the race by ensuring that vd_timer_armed is always set before
attempting to (re)schedule the callout.

Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D9828
2017-06-08 20:47:18 +00:00
Jonathan T. Looney
dc6a41b936 Add the infrastructure to support loading multiple versions of TCP
stack modules.

It adds support for mangling symbols exported by a module by prepending
a string to them. (This avoids overlapping symbols in the kernel linker.)

It allows the use of a macro as the module name in the DECLARE_MACRO()
and MACRO_VERSION() macros.

It allows the code to register stack aliases (e.g. both a generic name
["default"] and version-specific name ["default_10_3p1"]).

With these changes, it is trivial to compile TCP stack modules with
the name defined in the Makefile and to load multiple versions of the
same stack simultaneously. This functionality can be used to enable
side-by-side testing of an old and new version of the same TCP stack.
It also could support upgrading the TCP stack without a reboot.

Reviewed by:	gnn, sjg (makefiles only)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D11086
2017-06-08 20:41:28 +00:00
Ed Maste
c2aa86d19c arm64: add ".arch armv8-a+crc" to allow use of crc instructions
With Clang 5.0 the .arch directive is required, otherwise Clang
complains "error: instruction requires: crc".

This was reported in D10499 but not added initially, because clang 3.8
available on a ref machine reported unknown directive.  Clang 4.0 allows
but does not require the directive.

Submitted by:	andrew
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2017-06-08 20:06:09 +00:00
Baptiste Daroussin
75d7690b02 Bump _FreeBSD_version after removal of groff
Reported by:	antoine
2017-06-08 17:06:16 +00:00
Zbigniew Bodek
5d83a7b631 Add function to dump PCIE MBUS decoding windows and bars
This commit allows to dump PCIE MBUS and bars configuration
for Marvell platforms.

Submitted by:   Michal Mazur <mkm@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Netgate
Differential revision: https://reviews.freebsd.org/D10908
2017-06-08 16:57:06 +00:00
Zbigniew Bodek
1717c1f1a3 Restore DTS node of PCIe controller for A38X boards
Add pcie-controller node as a bus-parent of pcie nodes for Armada38x
boards. This reduces diff between Linux and FreeBSD PCIe device tree
representation to the minimum. This commit also allows for using multiple
PCIe ports, thanks to the recent driver updates, which support such
hierarchy. Restore original PCIe nodes in armada-385.dtsi and
apply necessary changes in hitherto unused armada-380.dtsi.

Submitted by:	Michal Mazur <mkm@semihalf.com>
		Marcin Wojtas <mw@semihalf.com>
Obtained from:	Semihalf
Sponsored by:	Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10907
2017-06-08 16:55:58 +00:00
Zbigniew Bodek
8595864992 Support multi-port PCIe hierarchy in Marvell boards DTS
This commit is another part of preparation for PCIe multi-port
support for Marvell SoCs. Some device trees include pcie-controller
node as a bus-parent of pcie nodes. This patch adds support for
new bus, collects and configures device informations and finally
adds PCIB devices as a childs of pcie-controller in Newbus hierarchy.

Submitted by:	Marcin Mazurek <mma@semihalf.com>
Obtained form:	Semihalf
Sponsored by:	Stormshield
Reviewed by:    https://reviews.freebsd.org/D10906
2017-06-08 16:54:02 +00:00
Zbigniew Bodek
73e48bc6d6 Fix PCIe window decoding on Armada 38x
Original PCIe nodes for Marvell SoCs consists of ports' nodes
under main controller node. In order to properly parse
this kind of representation in DT a mechanism for traversing
through the tree required an update. Moreover, processing FDT
data consisting of more than 2 cells had to be fixed,
because the 'reg' property of mrvl,pcie node have additional
parameter in front of 64-bit address. It should be skipped
by default. This commit works properly with old mrvl,pcie
representation for Kirkwood and ArmadaXP SoCs.

Submitted by:	Wojciech Macek <wma@semihalf.com>
		Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10905
2017-06-08 16:51:46 +00:00
Zbigniew Bodek
dc3b75aeef Enable MBUS bridge configuration in mv_rtc driver
This patch fixes sporadic problems with updating time
with mv_rtc driver by configuring access to it via MBUS.
For this purpose already existing second set of resources
in rtc@3800 node of Armada 38x DT is used.

Submitted by: Dominik Ermel <der@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10901
2017-06-08 16:48:09 +00:00
Zbigniew Bodek
054beaac09 Add reset capability to mv_rtc driver
This commit enables optional reset of the RTC, in case
its registers' contents did not sustain the reboot or power-off/on
sequence. Without it, further usage of RTC is impossible
(e.g. writing values to RTC_TIME register will not succeed).

The reset is performed only if Clock Correction register
does not comprise RTC_NOMINAL_TIMING, what helps to distinguish,
whether the software configured RTC before or it comprises
the default value.

Submitted by: Bartosz Szczepanek <bsz@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10900
2017-06-08 16:46:38 +00:00
John Baldwin
4bd7e351f1 Fix an off-by-one error in the VM page array on some systems.
r31386 changed how the size of the VM page array was calculated to be
less wasteful.  For most systems, the amount of memory is divided by
the overhead required by each page (a page of data plus a struct vm_page)
to determine the maximum number of available pages.  However, if the
remainder for the first non-available page was at least a page of data
(so that the only memory missing was a struct vm_page), this last page
was left in phys_avail[] but was not allocated an entry in the VM page
array.  Handle this case by explicitly excluding the page from
phys_avail[].

Reviewed by:	alc
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D11000
2017-06-08 16:18:41 +00:00
Alan Cox
86dd278f03 When allocating swap blocks, if the available number of free blocks in a
subtree is already zero, then setting the "largest contiguous free block"
hint for that subtree to anything other than zero makes no sense.  To be
clear, assigning a value to the hint that is too large is not a correctness
problem, only a pessimization.

Dragonfly BSD has applied the same change to blst_meta_alloc() but not
blst_meta_fill().

MFC after:	6 weeks
2017-06-08 15:48:54 +00:00
Dexuan Cui
6944b2e68b hyperv/pcib: use the device serial number as PCI domain
Currently the PCI domain is initialized with the instance GUID in
vmbus_pcib_attach(). It turns out the GUID can change across VM reboot,
while some users want a persistent value for PCI domain. The solution is
that we can change to use the device serial number, which starts with 1
and is unique within a VM.

Obtained from:	Haiyang Zhang
MFC after:	1 day
Sponsored by:	Microsoft
2017-06-08 12:11:30 +00:00
Gleb Smirnoff
3acfe1e1b0 This code was missing socket unlock and socket buffer lock, but it
worked since right now these two locks are the same.
2017-06-08 06:37:11 +00:00
Gleb Smirnoff
12d8a8e7a3 The desired lock here is socket buffer, not socket.
Right now they match, but won't in future.
2017-06-08 06:34:09 +00:00
Gleb Smirnoff
8d40bada3e Fix a degenerate case when soisdisconnected() would call soisconnected().
This happens when closing a socket with upcall, and trace is: soclose()->
... protocol ... -> soisdisconnected() -> socantrcvmore_locked() ->
sowakeup() -> soisconnected().

Right now this case is innocent for two reasons.  First, soisconnected()
doesn't clear SS_ISDISCONNECTED flag.  Second, the mutex to lock the
socket is the socket receive buffer mutex, and sodisconnected() first
disables the receive buffer. But in future code, the mutex to lock
socket is different to buffer mutex, and we would get undesired mutex
recursion.

The fix is to check SS_ISDISCONNECTED flag before calling upcall.
2017-06-08 06:16:47 +00:00
Marcelo Araujo
e0a6a23c6d Allow sysctl kern.vm_guest to return bhyve when running under bhyve.
Submitted by:	Sean Fagan <sef@ixsystems.com>
Reviewed by:	grehan
MFH:		4 weeks.
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11090
2017-06-08 04:02:14 +00:00
Justin Hibbits
864092bcaa Remove ARM and MIPS from linuxkpi ioremap_attr definition
ARM and MIPS fail universe builds.

ARM and MIPS are missing the following:
* VM_MEMATTR_WRITE_THROUGH
* VM_MEMATTR_WRITE_COMBINING

Pointy-hat to:	jhibbits
2017-06-08 02:44:34 +00:00
Bryan Drewery
3c4c0efd1d vm.defer_swapspace_pageouts was removed in r308474. 2017-06-07 19:36:17 +00:00
Justin Hibbits
287e7a861a Add more #ifdef arch checks to the linuxkpi
arm, mips, and powerpc all implement pmap_mapdev_attr() and pmap_unmapdev(),
so add those archs to the checks.  powerpc also includes the atomic_swap_*()
functions, so add that to the supported list as well.  Not tested except by
compiling powerpc.

Reviewed by:	markj
2017-06-07 18:08:11 +00:00
Alan Cox
d90bf7d850 Originally, this file could be compiled as a user-space application for
testing purposes.  However, over the years, various changes to the kernel
have broken this feature.  This revision applies some fixes to get user-
space compilation working again.  There are no changes in this revision
to code that is used by the kernel.

MFC after:	3 days
2017-06-07 16:04:34 +00:00
Kevin Lo
a8cf90eb9a Change R88E_EFUSE_MAX_LEN to use the same value as the vendor's driver
that contains the length of the efuse content.

Reviewed by:	avos
2017-06-07 09:10:24 +00:00
Gleb Smirnoff
b3244df799 Provide typedef for socket upcall function.
While here change so_gen_t type to modern uint64_t.
2017-06-07 01:48:11 +00:00
Gleb Smirnoff
b94f68dc52 Remove a piece of dead code. 2017-06-07 01:21:34 +00:00
Alan Cox
761097c85e Starting in r118390, swaponsomething() began to reserve the blocks at the
beginning of a swap area for a disk label.  However, neither r118390 nor
r118544, which increased the reservation from one to two blocks, correctly
accounted for these blocks when updating the variable "swap_pager_avail".
This change corrects that error.

Reviewed by:	kib
MFC after:	5 days
2017-06-06 16:52:07 +00:00
Hans Petter Selasky
25b3ef2c99 Fix init order in the LinuxKPI for IDR support after recent changes.
CPU_FOREACH() is not available until SI_SUB_CPU at SI_ORDER_ANY
when the LinuxKPI is loaded as part of the kernel.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-06-06 10:12:58 +00:00
Alan Cox
03bdd65f18 When the function blist_fill() was added to the kernel in r107913, the swap
pager used a different scheme for striping the allocation of swap space
across multiple devices.  And, although blist_fill() was intended to support
fill operations with large counts, the old striping scheme never performed a
fill larger than the stripe size.  Consequently, the misplacement of a
sanity check in blst_meta_fill() went undetected.  Now, moving forward in
time to r118390, a new scheme for striping was introduced that maintained a
blist allocator per device, but as noted in r318995, swapoff_one() was not
fully and correctly converted to the new scheme.  This change completes what
was started in r318995 by fixing the underlying bug in blst_meta_fill() that
stops swapoff_one() from simply performing a single blist_fill() operation.

Reviewed by:	kib
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D11043
2017-06-06 03:32:17 +00:00
Allan Jude
e28f9b7d03 Jails: Optionally prevent jailed root from binding to privileged ports
You may now optionally specify allow.noreserved_ports to prevent root
inside a jail from using privileged ports (less than 1024)

PR:		217728
Submitted by:	Matt Miller <mattm916@pulsar.neomailbox.ch>
Reviewed by:	jamie, cem, smh
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10202
2017-06-06 02:15:00 +00:00
Alan Cox
3e78e98337 The variable "breakout" is used like a Boolean, so actually define it as
one.

Reviewed by:	kib
MFC after:	5 days
2017-06-05 18:07:56 +00:00
Alan Cox
064650c180 Halve the memory being internally allocated by the blist allocator. In
short, half of the memory that is allocated to implement the radix tree is
wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int
when we changed "daddr_t" to be a 64-bit (signed) int.  (See r96849 and
r96851.)

Reviewed by:	kib, markj
Tested by:	pho
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D11028
2017-06-05 17:14:16 +00:00
Konstantin Belousov
3df7ebc4ed Add sysctl vfs.ino64_trunc_error controlling action on truncating
inode number or link count for the ABI compat binaries.

Right now, and by default after the change, too large 64bit values are
silently truncated to 32 bits.  Enabling the knob causes the system to
return EOVERFLOW for stat(2) family of compat syscalls when some
values cannot be completely represented by the old structures.  For
getdirentries(2), knob skips the dirents which would cause non-trivial
truncation of d_ino.

EOVERFLOW error is specified by the X/Open 1996 LFS document
('Adding Support for Arbitrary File Sizes to the Single UNIX
Specification').

Based on the discussion with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-05 11:40:30 +00:00
Edward Tomasz Napierala
77c97f893a Remove unused tlb_write_random().
Sponsored by:	DARPA, AFRL
2017-06-05 11:04:22 +00:00
Edward Tomasz Napierala
ab1318b7ed Remove extraneous parentheses.
Sponsored by:	DARPA, AFRL
2017-06-05 10:59:47 +00:00
Adrian Chadd
d17a5d1bce [iwm] Remove support for fw older than -17 and -22
* iwm(4) didn't use any of these definitions yet, anyway.

Obtained from:	dragonflybsd.git f95003b8f1f7382c8396a6d408e3072632afdd3d
2017-06-04 21:28:52 +00:00
Adrian Chadd
d057daedda [iwmfw] bump built firmware now to version 22 for 7265D and 8000C. 2017-06-04 21:28:03 +00:00
Adrian Chadd
13811c2677 [iwmfw] 8000C ver 22 firmware. 2017-06-04 21:27:39 +00:00
Adrian Chadd
ab5bad9ee3 [iwmfw] add 7265D-22 firmware 2017-06-04 21:26:31 +00:00
Adrian Chadd
f0949f0cc6 [ath_hal] add USB reset PLL work around for AR9331/AR9344 (Hornet/Wasp.)
It turns out that this is useful on hornet and wasp SoCs but it isn't
enabled in ye olde HAL /unless/ you were using a version from one of the
business units building USB targetted devices.  It eventually got fixed
for all of them as people started wanting to use the USB ports on their
SoCs (eg for flash storage, bluetooth, 4G/LTE widgets, etc.)

This is actually a fix from ath9k but I'm merging it with the available-but-
disabled code in the QCA reference HAL.

Tested:

* AR9331 SoC
2017-06-04 21:21:44 +00:00
Adrian Chadd
50c5bb10c5 [iwm] Ignore IWM_DEBUG_LOG_MSG notifications.
* Firmware versions 21 and 22 generate some IWM_DEBUG_LOG_MSG notifications,
  which seem to be harmless. Avoid spamming the system log with
  "frame ... UNHANDLED (this should not happen)" messages.

Obtained from:	dragonflybsd.git dda889ac57d8e5b46bb1b1ecf53c17a18481c7c8
2017-06-04 21:14:23 +00:00
Adrian Chadd
51382483c4 [iwm] Set command code for PHY_DB as well.
Obtained from:	dragonflybsd.git 58318c956a74382d1286ccabaf767012fdcfe1a2
2017-06-04 21:13:13 +00:00
Adrian Chadd
eecff6a7e0 [iwm] Set correct state in smart-fifo configuration.
Obtained from:	dragonflybsd.git 666737f64b4f6dd42ffd9f0ace9fc46ccc1ebaab
2017-06-04 21:12:11 +00:00
Adrian Chadd
e470115fc5 [iwm] Remove dead code from iwm_pcie_load_cpu_sections().
* If device family is 8000 then iwm_pcie_load_cpu_sections()
  won't be called at all (iwm_pcie_load_cpu_sections_8000() is
  called in that case) so this piece of code never gets called.

Obtained from:	dragonflybsd.git 3e9aaef308100a4d630feffc131e3aca2ae12f8a
2017-06-04 21:11:28 +00:00
Adrian Chadd
19d956ec90 [iwm] Check for lar_disable tunable, and lar_enabled flag from NVM.
* LAR can be disabled with the hw.iwm.lar.disable tunable now.

* On Family 8000 devices we need to check the lar_enabled flag from
  nvm_data in addition to the TLV_CAPA_LAR_SUPPORT flag from the firmware.

* Add a separate IWM_DEBUG_LAR debugging flag.

Obtained from:	dragonflybsd.git 0593e39cb295aa996ecf789ed4990c3b255f1770
2017-06-04 21:10:14 +00:00
Adrian Chadd
cd684deca9 [iwm] Move Smart Fifo handling into if_iwm_sf.c, sync with Linux iwlwifi.
* This change also fixes a possible issue in the existing smart-fifo code,
  which set the IWM_SF_CFG_DUMMY_NOTIF_OFF bit on AC8260 chipsets, although
  that's only used in iwlwifi for Family 8000 chipsets connected via SDIO
  interface.

Obtained from:	Dragonflybsd.git cb650b01526b0aeef3c4307d926e7f1428997d50
2017-06-04 21:05:58 +00:00
Dmitry Chagin
51d93426d9 On success, getrandom() Linux system call returns the number of bytes that
were copied to the buffer supplied by the user.

PR:           219464
Submitted by: Maciej Pasternacki
Reported by:  Maciej Pasternacki
MFC after:    1 week
2017-06-04 18:35:30 +00:00
Dmitry Chagin
e2e6a2a1b6 Revert r319053 due to lack of sence. As pointed out by kib@ opt_global.h
contains such fundamental settings as e.g. SMP option and fake
opt_global.h almost never match real configured kernels.

Reported by:	kib@
2017-06-04 18:24:41 +00:00
Andrew Turner
a29b35dd5e Start to rename files with common or generic names to be SoC specific. The
build system doesn't handle two files with the same name.
2017-06-04 09:11:14 +00:00
Conrad Meyer
109f344bb6 ext2fs(4): Fix a null dererence and clean an unclear switch
Coverity warned that the switch statement fell through.  While this was
intentional, the pattern wasn't especially clear.  I just changed it to a
conventional if pattern.

Reported by:	Coverity
CIDs:		1375851 (false positive), 1375853
Sponsored by:	Dell EMC Isilon
2017-06-03 22:39:50 +00:00
Conrad Meyer
a8b9a63980 ext2fs(4): Fix a number of broken flag checks
Introduced in r319071.

Reported by:	Coverity
CIDs:		1375847, 1375848, 1375849
Sponsored by:	Dell EMC Isilon
2017-06-03 22:30:30 +00:00
Michael Tuexen
8cb5a8e90a Fix the ICMP6 handling for TCP.
The ICMP6 packets might not be contained in a single mbuf. So don't
assume this. Keep the IPv4 and IPv6 code in sync and make explicit
that the syncache code only need the TCP sequence number, not the
complete TCP header.

MFC after:		3 days
Sponsored by:		Netflix, Inc.
2017-06-03 21:53:58 +00:00
Andrew Turner
d9f504454a Port the Vybrid code to PLATFORM to help move it into GENERIC. 2017-06-03 20:14:46 +00:00
Andrew Turner
6f5db7909a Port the Samsung ARM code to use PLATFORM and PLATFORM_SMP. This will help
move it into the GENERIC kernel config.
2017-06-03 20:02:12 +00:00
Andrew Turner
dc59c854ca Port the Xilinx code to use PLATFORM and PLATFORM_SMP. This will help move
it to be part of the armv6 GENERIC kernel.
2017-06-03 19:11:32 +00:00
Alan Cox
d118a00f3c Eliminate duplication of the pmap and pv list unlock operations in
pmap_enter() by implementing a single return path.  Otherwise, the
duplication will only increase with the upcoming support for psind == 1.

Reviewed by:	kib (some time ago)
2017-06-03 17:24:13 +00:00
Andrew Turner
e23ce9ef7a Stop making cpu_initclocks weak when using event timers. A weak symbol
could be overridden in the SoC specific code, but this would break GENERIC
as it is likely to be incorrect.

Remove the versatile implementation of cpu_initclocks as it's unneeded.
2017-06-03 16:24:17 +00:00
Alan Cox
d712b799b5 The data type returned by vmoff() is too narrow in its range. This could
break the transmission of files longer than 4 GB on 32-bit architectures.

Reviewed by:	glebius, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10019
2017-06-03 16:19:33 +00:00
Konstantin Belousov
698f05ab95 Mitigate several problems with the softdep_request_cleanup() on busy
host.

Problems start appearing when there are several threads all doing
operations on a UFS volume and the SU workqueue needs a cleanup.  It is
possible that each thread calling softdep_request_cleanup() owns the
lock for some dirty vnode (e.g. all of them are executing mkdir(2),
mknod(2), creat(2) etc) and all vnodes which must be flushed are locked
by corresponding thread. Then, we get all the threads simultaneously
entering softdep_request_cleanup().

There are two problems:
- Several threads execute MNT_VNODE_FOREACH_ALL() loops in parallel.  Due
  to the locking, they quickly start executing 'in phase' with the speed
  of the slowest thread.
- Since each thread already owns the lock for a dirty vnode, other threads
  non-blocking attempt to lock the vnode owned by other thread fail,
  and loops executing without making the progress.
Retry logic does not allow the situation to recover.  The result is
a livelock.

Fix these problems by making the following changes:
- Allow only one thread to enter MNT_VNODE_FOREACH_ALL() loop per mp.
  A new flag FLUSH_RC_ACTIVE guards the loop.
- If there were failed locking attempts during the loop, abort retry
  even if there are still work items on the mp work list.  An
  assumption is that the items will be cleaned when other thread
  either fsyncs its vnode, or unlock and allow yet another thread to
  make the progress.

It is possible now that some calls would get undeserved ENOSPC from
ffs_alloc(), because the cleanup is not aggressive enough. But I do
not see how can we reliably clean up workitems if calling
softdep_request_cleanup() while still owning the vnode lock. I thought
about scheme where ffs_alloc() returns ERESTART and saves the retry
counter somewhere in struct thread, to return to the top level, unlock
the vnode and retry.  But IMO the very rare (and unproven) spurious
ENOSPC is not worth the complications.

Reported and tested by:	pho
Style and comments by:	mckusick
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-06-03 16:18:50 +00:00
Andrew Turner
95bae5e40c Add MULTIDELAY support to the mpcore timer driver. This is needed when
using this with GENERIC.

While here remove the weak symbol, it doesn't seem to be needed anymore.
2017-06-03 15:56:54 +00:00
Andrew Turner
78527f28d3 Add MULTIDELAY support to the sp804 driver. 2017-06-03 15:48:03 +00:00
Andrew Turner
35c4874e8d Add MULTIDELAY to the Beaglebone kenrel config to help moving it to GENERIC. 2017-06-03 15:40:34 +00:00
Andrew Turner
b32611238e Enable MULTIDELAY in the i.MX5 kernel configs. This will help adding them
to GENERIC.
2017-06-03 15:39:23 +00:00
Andrew Turner
62e2e02278 Remove RT1310 from universe as it fails to build. 2017-06-03 14:45:46 +00:00
Konstantin Belousov
4cbc378c61 Clean possible td_su reference on the struct mount being unmounted as
the last step of ffs_unmount().

It is possible that the mount point is recorded for cleanup in AST
context while softdep flush is executed during unmount.  The workitems
are flushed by other means for the unmount, but the stray reference to
struct mount blocks destruction of mount.  Check for the situation and
manually call vfs_rel() before returning from ffs_unmount().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-03 14:15:14 +00:00
Konstantin Belousov
a7ca2c6ad0 Ensure that cached struct thread does not keep spurious td_su
reference on an UFS mount point.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-03 14:12:17 +00:00
Andrew Turner
08cda712b4 Make it an error to build armv6 without INTRNG enabled. Most kernel configs
have been updated for this, with the exception of the two marked as
NO_UNIVERSE in r319514.
2017-06-03 10:40:45 +00:00
Andrew Turner
5dc97550fb Mark the non-INTRNG armv6 configs with NO_UNIVERSE to prepare for INTRNG
being always enabled on armv6.
2017-06-03 10:38:41 +00:00
Ed Maste
873ed6f008 linux vdso: pass -fPIC to the assembler, not linker
-fPIC has no effect on linking although it seems to be ignored by
GNU ld.bfd.  However, it causes ld.lld to terminate with an invalid
argument error.

This is equivalent to r296057 but for the kernel (not modules) case.

MFC after:	2 months
Sponsored by:	The FreeBSD Foundation
2017-06-03 03:40:11 +00:00
Ed Maste
6a1c2e1fce msdosfs: use mem{cpy,move,set} instead of bcopy,bzero
This somewhat simplifies use of msdosfs code in userland (for makefs),
reduces diffs with NetBSD and is standard C as of C89.

Reviewed by:	imp
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11014
2017-06-02 18:39:53 +00:00
Navdeep Parhar
a59a14773a cxgbe(4): Update the statistics for compound tx work requests once per
work request, not once per frame.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-06-02 17:57:27 +00:00
Gleb Smirnoff
971af2a311 Rename accept filter getopt/setopt functions, so that they are prefixed
with module name and match other functions in the module.  There is no
functional change.
2017-06-02 17:49:21 +00:00
Gleb Smirnoff
810951ddc9 Style: unwrap lines that doesn't have a good reason to be wrapped. 2017-06-02 17:43:47 +00:00
Gleb Smirnoff
bd617e3b98 Remove write only flag UNP_HAVEPCCACHED. 2017-06-02 17:39:05 +00:00
Gleb Smirnoff
0c3c207ffd For UNIX sockets make vnode point not to the socket, but to the UNIX PCB,
since the latter is the thing that links together VFS and sockets.

While here, make the union in the struct vnode anonymous.
2017-06-02 17:31:25 +00:00
Hans Petter Selasky
67e984c8f2 Improve kqueue() support in the LinuxKPI. Some applications using the
kqueue() does not set non-blocking I/O mode for event driven read of
file descriptors. This means the LinuxKPI internal kqueue read and
write event flags must be updated before the next read and/or write
system call. Else the read and/or write system call may block. This
can happen when there is no more data to read following a previous
read event. Then the application also gets blocked from processing
other events. This situation can also be solved by the applications
setting and using non-blocking I/O mode.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-02 16:52:18 +00:00
Hans Petter Selasky
639af71ab1 Add support for setting the non-blocking I/O flag for LinuxKPI
character devices. In Linux the FIONBIO IOCTL is handled by the kernel
and not the drivers. Also need return success for the FIOASYNC ioctl
due to existing logic in kern_fcntl() even though it is not supported
currently.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-02 16:30:40 +00:00
Luiz Otavio O Souza
33c2a3cb16 style(9) fixes, remove unnecessary headers, remove duplicate #defines and
in some cases, shuffle the code around to simplify locking.

No functional changes.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-02 15:12:32 +00:00
Olivier Houchard
5bb27fe15c - Don't bother flushing the data cache for pages we're about to unmap, there's
no need to.
- Remove pmap_is_current(), pmap_[pte|l3]_valid_cacheable as there were only
used to know if we had to write back pages.
- In pmap_remove_pages(), don't bother invalidating each page in the TLB,
we're about to flush the whole TLB anyway.

This makes make world 8-9% faster on my hardware.

Reviewed by:	andrew
2017-06-02 14:17:14 +00:00
Andrew Turner
88df15adf6 Fix device lookup of for the stdout-path chosen property.
The stdout-path chosen property may include the serial connection details,
e.g. the baud rate. When passing the device to OF_finddevice we need to
strip off this information as it will cause the lookup to fail.

Reviewed by:	emaste, manu
Differential Revision:	https://reviews.freebsd.org/D6846
2017-06-02 14:01:17 +00:00
Colin Percival
c74415ed3b Skip setting the MTU in the netfront driver (xn# devices) if the new MTU
is the same as the old MTU.  In particular, on Amazon EC2 "T2" instances
without this change, the network interface is reinitialized every 30
minutes due to the MTU being (re)set when a new DHCP lease is obtained,
causing packets to be dropped, along with annoying syslog messages about
the link state changing.

As a side note, the behaviour this commit fixes was responsible for
exposing the locking problems fixed via r318523 and r318631.

Maintainers of other network interface drivers may wish to consider making
the corresponding change; the handling of SIOCSIFMTU does not seem to
exhibit a great deal of consistency between drivers.

MFC after:	1 week
2017-06-02 07:03:31 +00:00
Andriy Voskoboinyk
5acae76adf rtwn: drop obsolete (since r319460) code.
Tested with RTL8188EU, STA mode.
2017-06-01 21:20:44 +00:00
Andriy Voskoboinyk
2db223f902 net80211: initialize i_seq for A-MPDU frames.
Fragment number field (part of i_seq) is used for AAD calculation;
as a result, without this patch every driver without h/w crypto support
need to clear it before ieee80211_crypto_encap().

Also fixes rtwn(4) A-MPDU Tx with dev.rtwn.%d.hwcrypto tunable
set to 0 (h/w crypto is disabled).

Tested with:
 * Intel 6205, STA mode.
 * RTL8188EU, STA mode.

Differential Revision:	https://reviews.freebsd.org/D10753
2017-06-01 20:46:43 +00:00
Gleb Smirnoff
1431a74845 As old prophecy says, some day UMA_DEBUG printfs shall be made CTRs. 2017-06-01 18:36:52 +00:00
Gleb Smirnoff
ac0a6fd015 Simplify boot pages management in UMA.
It is simply a contigous virtual memory pointer and number of pages.
There is no need to build a linked list here.  Just increment pointer
and decrement counter.  The only functional difference to old allocator
is that before we gave pages from topmost and down to lowest, and now
we give them in normal ascending order.

While here remove padalign from a mutex that is unused at runtime.

Reviewed by:	alc
2017-06-01 18:26:57 +00:00
Hans Petter Selasky
8600ba1aa9 Make sure the selrecord() function is only called from within system
polling contexts in the LinuxKPI.

After the kqueue() support was added to the LinuxKPI in r319409 the
Linux poll file operation will be used outside the system file polling
callback function, which can cause a NULL-pointer panic inside
selrecord() because curthread->td_sel is set to NULL. This patch moves
the selrecord() call away from poll_wait() and to the system file poll
callback function in the LinuxKPI, which essentially wraps the Linux
one. This is similar to what the cuse(3) module is currently doing.
Refer to sys/fs/cuse/*.[ch] for more details.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 16:49:48 +00:00
Michael Tuexen
98732609d5 Improve comments to describe what the code does.
Reported by:		jtl
Sponsored by:		Netflix, Inc.
2017-06-01 15:11:18 +00:00
Hans Petter Selasky
5a2866e9b1 Allow communication between functions on the same host when using the
mlx4en(4) driver in SRIOV mode.

Place a copy of the destination MAC address in the send WQE only under
SRIOV/eSwitch configuration or when the device is in selftest. This
allows communication between functions on the same host.

PR:			216493
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-06-01 10:44:48 +00:00
Hans Petter Selasky
156b40b62b Free hardware queue resource after port is stopped in the mlx4en(4)
driver. Else if the port is up the resource might still be busy and
the MTT free will fail.

PR:			216493
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-06-01 10:39:00 +00:00
Andrey V. Elsukov
8e5060a03e Build kdebug_secreplay() function only when IPSEC_DEBUG is defined.
This should fix the build on sparc.

Reported by:	emaste
X-MFC with:	r319118
2017-06-01 10:04:12 +00:00
Hans Petter Selasky
328c75d621 Translate the ERESTARTSYS error code into ERESTART in the LinuxKPI
ioctl(), read() and write() system call handlers. This error code is
internal to the kernel and should not be seen by user-space programs
according to Linux.

Submitted by:		Yanko Yankulov <yanko.yankulov@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 09:53:55 +00:00
Hans Petter Selasky
a6b28ee02a Add generic kqueue() and kevent() support to the LinuxKPI character
devices. The implementation allows read and write filters to be
created and piggybacks on the poll() file operation to determine when
a filter should trigger. The piggyback mechanism is simply to check
for the EWOULDBLOCK or EAGAIN return code from read(), write() or
ioctl() system calls and then update the kqueue() polling state bits.
The implementation is similar to the one found in the cuse(3) module.
Refer to sys/fs/cuse/*.[ch] for more details.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-06-01 09:34:51 +00:00
Xin LI
6448ec89e7 * limit size of buffers to RPC_MAXDATASIZE
* don't leak memory
 * be more picky about bad parameters

From:

https://raw.githubusercontent.com/guidovranken/rpcbomb/master/libtirpc_patch.txt
https://github.com/guidovranken/rpcbomb/blob/master/rpcbind_patch.txt

via NetBSD.

Reviewed by:	emaste, cem (earlier version)
Differential Revision:	https://reviews.freebsd.org/D10922
MFC after:	3 days
2017-06-01 06:12:25 +00:00
Jung-uk Kim
af05116143 Merge ACPICA 20170531. 2017-06-01 00:01:19 +00:00
Stephen J. Kiernan
9a81ba0f24 Add MD_VERIFY option to enable O_VERIFY in open for vnode type.
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
  verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
  the underlying device has been verified.

Reviewed by:	rwatson
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D2902
2017-05-31 21:18:11 +00:00
Hans Petter Selasky
9a83b0971e Minor code optimisation.
Avoid locking the global CUSE lock when the polling flags are zero.

MFC after:	1 week
2017-05-31 21:05:24 +00:00
Imre Vadász
f8b883c132 Fix typo in Driver Type A/C/D capability checks in sdhci.
Use the SDHCI_CAN_DRIVE_TYPE_A/_C/_D masks to check for Driver Type support,
instead of using the SDHCI_CTRL2_DRIVER_TYPE_A/_C/_D values which are meant
for setting the Driver Type in the HOST_CONTROL2 register.

Approved by:	adrian (mentor), jmcneill
Differential Revision:	https://reviews.freebsd.org/D10999
2017-05-31 19:20:27 +00:00
Adrian Chadd
bf3bb1b028 [ar71xx] rename AR724X_BASE -> std.AR724X 2017-05-31 16:32:33 +00:00
Hans Petter Selasky
c2676069cb Implement print_hex_dump(), print_hex_dump_bytes() and
printk_ratelimited() in the LinuxKPI.

While at it fix the inclusion guard of printk.h to be similar to the
rest of the LinuxKPI header files.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 16:24:02 +00:00
Hans Petter Selasky
427cefde27 Properly implement idr_preload() and idr_preload_end() in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 16:08:30 +00:00
Hans Petter Selasky
dff36e69a1 Implement in_atomic() function in the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 15:05:44 +00:00
Hans Petter Selasky
90b30e6560 Properly set the .d_name field in the cdevsw structure for the
LinuxKPI.

Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:11:06 +00:00
Hans Petter Selasky
d56f1ed887 Make sure the VMAP's "vm_file" field is referenced in a Linux
compatible way by the linux_dev_mmap_single() function in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:07:05 +00:00
Hans Petter Selasky
cca15f28c5 Remove the VMA handle from its list before calling the LinuxKPI VMA
close operation to prevent other threads from reusing the VM object
handle pointer.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:05:54 +00:00
Hans Petter Selasky
68b9f2f00c Don't acquire a reference on the VM-space when allocating the LinuxKPI
task structure to avoid deadlock when tearing down the VM object
during a process exit.

Found by:		markj @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:01:27 +00:00
Hans Petter Selasky
ea67550be0 Fix a reference count leak in the LinuxKPI due to calling VM open when
it shouldn't be called.

Background:
The Linux VM open operation is called when a new VMA is
created on top of the current VMA. This is done through either mremap
flow or split_vma, usually due to mlock, madvise, munmap and so
on. This is currently not supported by the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:08:25 +00:00
Hans Petter Selasky
f5a9867b7d Fixes for refcounting "struct linux_file" in the LinuxKPI.
- Allow "struct linux_file" to be refcounted when its "_file" member
  is NULL by using its "f_count" field. The reference counts are
  transferred to the file structure when the file descriptor is
  installed.

- Add missing vdrop() calls for error cases during open().

- Set the "_file" member of "struct linux_file" during open. This
allows use of refcounting through get_file() and fput() with LinuxKPI
character devices.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:02:59 +00:00
Hans Petter Selasky
3f743d782a Make sure the thread's priority is restored for all three cases inside
linux_synchronize_rcu_cb() in the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 10:01:15 +00:00
Konstantin Belousov
a8e7f543af Fix bug in r318997: remove the line which overrides vn_fsid()
calculation.

Noted by:	jhb
Reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
2017-05-30 21:20:54 +00:00
Mark Johnston
cb564d2436 Add some miscellaneous definitions to support DRM drivers.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10985
2017-05-30 17:16:08 +00:00
Jonathan T. Looney
8b07e00e99 Fix an unnecessary/incorrect check in the PKTOPT_EXTHDRCPY macro.
This macro allocates memory and, if malloc does not return NULL, copies
data into the new memory. However, it doesn't just check whether malloc
returns NULL. It also checks whether we called malloc with M_NOWAIT. That
is not necessary.

While it may be that malloc() will only return NULL when the M_NOWAIT flag
is set, we don't need to check for this when checking malloc's return
value. Further, in this case, the check was not completely accurate,
because it checked for flags == M_NOWAIT, rather than treating it as a bit
field and checking for (flags & M_NOWAIT).

Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10942
2017-05-30 14:50:28 +00:00
Jonathan T. Looney
fb04394554 Fix two places in the ICMP6 code where we could dereference a NULL pointer
in the icmp6_input() function.

When processing an ICMP6_ECHO_REQUEST, if IP6_EXTHDR_GET fails, it will
set nicmp6 and n to NULL. Therefore, we should condition our modification
to nicmp6 on n being not NULL.

And, when processing an ICMP6_WRUREQUEST in the (mode != FQDN) case, if
m_dup_pkthdr() fails, the code will set n to NULL. However, the very next
line dereferences n. Therefore, when m_dup_pkthdr() fails, we should
discontinue further processing and follow the same path as when m_gethdr()
fails.

Reported by:	clang static analyzer
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10941
2017-05-30 14:41:31 +00:00
Jonathan T. Looney
382a6bbcf1 Enforce the limit on ICMP messages before doing work to formulate the
response.

Delete an unneeded rate limit for UDP under IPv6. Because ICMP6
messages have their own rate limit, it is unnecessary to apply a
second rate limit to UDP messages.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10387
2017-05-30 14:32:44 +00:00
Andriy Gapon
cae91bbe96 fix indentation
MFC after:	4 days
2017-05-30 13:53:03 +00:00
Zbigniew Bodek
416e886499 Introduce additional locks when releasing TX resources and buffers in ENA
There could be race condition with TX cleaning routine when cleaning mbufs,
when it was called directly from main sending thread (ena_mq_start).

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10927
2017-05-30 12:00:56 +00:00
Zbigniew Bodek
b9252a8889 Move ENA's hw stats updating routine to separate task
Initially, stats were being updated each time OS was requesting for
the first statistic.
To read statistics from hw, condvar was used. cv_timedwait cannot be
called when unsleepable lock is held, and this happens when FreeBSD
is requesting statistic.
Seperate task is reading statistics from NIC each 1 second.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10926
2017-05-30 11:58:51 +00:00
Zbigniew Bodek
081169f24c Add error handling to the ENA driver if init of the reset task fails
Also, to simplify cleaning routine, reset task is initialized before
allocating statistics and other resources.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10925
2017-05-30 11:56:54 +00:00
Zbigniew Bodek
e67c655431 Add locks before each ena_up and ena_down
Lock only ena_up and ena_down calls in ioctl handler, instead of whole
ioctl. Locking ioctl with sx lock that is sleepable, is not allowed in
some cases, e.g. when multicast options are being changed.
Additional locking was added in deatch function to prevent race condition
with ioctl function.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10924
2017-05-30 11:55:02 +00:00
Zbigniew Bodek
1e9fb89962 Add mbuf defragmentation to the ENA driver
When mbuf chain is too long and device cannot handle that number
of segments in DMA transaction, mbuf chain will be defragmented.
Initially, driver was dropping all mbuf chains that were exceeding
supported number of segments.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10923
2017-05-30 11:53:18 +00:00
Mateusz Guzik
c7a6a1b325 mtx: fix whitespace damage in _mtx_trylock_flags_
MFC after:	3 days
2017-05-30 02:25:47 +00:00