117014 Commits

Author SHA1 Message Date
Andriy Gapon
04b7c6b337 MFV r316928: 7256 low probability race in zfs_get_data
illumos/illumos-gate@0c94e1af67
0c94e1af67

https://www.illumos.org/issues/7256
                         error = dmu_sync(zio, lr->lr_common.lrc_txg,
                              zfs_get_done, zgd);
                         ASSERT(error || lr->lr_length <= zp->z_blksz);
  It's possible, although extremely rare, that the zfs_get_done() callback is
  executed before dmu_sync() returns.
  In that case the znode's range lock is dropped and the znode is unreferenced.
  Thus, the assertion can access some invalid or wrong data via the zp pointer.
  size variable caches the correct value of z_blksz and can be safely used here.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>

MFC after:	1 week
2017-05-26 10:31:05 +00:00
Andriy Gapon
7a94dd7aee MFC r316924: 8061 sa_find_idx_tab can be declared more type-safely
illumos/illumos-gate@7f0bdb4257
7f0bdb4257

https://www.illumos.org/issues/8061
  sa_find_idx_tab() is declared as taking and returning "void *" parameters.
  These can be declared to be the specific types.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	1 week
2017-05-26 10:27:35 +00:00
Adrian Chadd
7b6899bf2a [ath] fix short-GI wireshark flag.
Yes, HAL_RX_GI means "short guard interval."
2017-05-26 00:48:21 +00:00
Alexander Motin
e3d90506c4 Remove some code, dead from the day one. 2017-05-25 23:19:09 +00:00
Stephen McConnell
327f2e6c56 Fix several problems with mapping code.
Reviewed by:    ken, scottl, asomers, ambrisko, mav
Approved by:	ken, mav
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D10861
2017-05-25 19:20:06 +00:00
Stephen McConnell
635e58c715 Fix several problems with mapping code.
Reviewed by:    ken, scottl, asomers, ambrisko, mav
Approved by:	ken, mav
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D10878
2017-05-25 19:14:44 +00:00
Zbigniew Bodek
26872c13ce Unmask legacy interrupts on Marvell PCIE controller
This patch fixes a bug introduced with commit:
r294510  "Remove an extra '!' found by clang 3.8."

'!' was removed without inverting the logic, which
broke PCIe legacy interrupts operation for Marvell
controllers.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Netgate
2017-05-25 14:34:21 +00:00
Zbigniew Bodek
fa5f501d0a Add workaround for CESA MBUS windows with 4GB DRAM
Armada 38x SoC's equipped with 4GB DRAM suffer freeze
during CESA operation, if MBUS window opened at given
DRAM CS reaches end of the address space. Apply a workaround
by setting the window size to the closest possible
value, i.e. divide it by 2 (it has to be power-of-2).

Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10724
2017-05-25 14:25:05 +00:00
Zbigniew Bodek
0c79c0b138 Fix PM recognition on recent Marvell boards
PM status is only supported on Kirkwood and Disvovery.
Cleanup the code to properly report its state on
other platforms.

Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10718
2017-05-25 14:23:49 +00:00
Zbigniew Bodek
92ce47d94e Introduce separate watchdog driver for Armada to fix phony DELAY
DELAY is a problematic routine called all over the kernel.
Armada38x using CA-9 CPUs are using mpcore timer to count events
and measure time but DELAY in the mpcore timer code is a weak
function reference and therefore will be replaced by the platform
implementation if the one is introduced. Since Armada38x uses
on-chip watchdog to which the driver is merged with the on-chip timer
driver there will be a platform DELAY implementation.
The latter however will not use any HW timers as it will not attempt
to configure any. Phony busy loop will be used instead.

To fix that we introduce a separate watchdog driver for Armada platforms,
(currently only A38X) and stop using Marvell timer driver. That
switches DELAY to the desired implementation.

Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10710
2017-05-25 14:22:00 +00:00
Zbigniew Bodek
bb98396b47 Enable SCU Speculative linefills to L2 on Armada 38x
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10709
2017-05-25 14:19:20 +00:00
Zbigniew Bodek
70d163328d Fix memory corruption while configuring CPU windows on Marvell SoCs
Resolving CPU windows from localbus entry caused buffer overflow
and memory corruption. Fix wrong indexing and ensure the index
does not exceed table size.

Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10720
2017-05-25 14:16:43 +00:00
Andriy Gapon
ced98d784b fix vmxnet3 crash when LRO is enabled
The crash can occur when all of the following conditions are true:
- a packet consists of multiple segements (requires LRO enabled)
- there has been a failure to allocate an mbuf for the packet and
  the packet has to be dropped
- a host (vmware) still owned at least one segment of the packet,
  so the driver had to wait for another interrupt to proceed to
  discarding the remaning segment(s)

Reviewed by:	rstone
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D10874
2017-05-25 10:49:56 +00:00
Hans Petter Selasky
3f9dcc588d Declare the "snd_fxdiv_table" once. This shaves around 24Kbytes of
binary data from sound.ko and the kernel.

MFC after:		3 days
2017-05-25 05:23:47 +00:00
Adrian Chadd
f46839b9e3 [ath] [ath_hal] retire AH_SUPPORT_AR5416 changing anything.
Yes, the memory bloat is large, but it's 2017 and I'll fix it later
by making it runtime configurable / per-chip configurable if I ever need to.
2017-05-25 04:26:26 +00:00
Adrian Chadd
41059135ce [ath] [ath_hal] (etc, etc) - begin the task of re-modularising the HAL.
In the deep past, when this code compiled as a binary module, ath_hal
built as a module.  This allowed custom, smaller HAL modules to be built.
This was especially beneficial for small embedded platforms where you
didn't require /everything/ just to run.

However, sometime around the HAL opening fanfare, the HAL landed here
as one big driver+HAL thing, and a lot of the (dirty) infrastructure
(ie, #ifdef AH_SUPPORT_XXX) to build specific subsets of the HAL went away.
This was retained in sys/conf/files as "ath_hal_XXX" but it wasn't
really floated up to the modules themselves.

I'm now in a position where for the reaaaaaly embedded boards (both the
really old and the last couple generation of QCA MIPS boards) having a
cut down HAL module and driver loaded at runtime is /actually/ beneficial.

This reduces the kernel size down by quite a bit.  The MIPS modules look
like this:

adrian@gertrude:~/work/freebsd/head-embedded/src % ls -l ../root/mips_ap/boot/kernel.CARAMBOLA2/ath*ko
-r-xr-xr-x  1 adrian  adrian    5076 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_dfs.ko
-r-xr-xr-x  1 adrian  adrian  100588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal.ko
-r-xr-xr-x  1 adrian  adrian  627324 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal_ar9300.ko
-r-xr-xr-x  1 adrian  adrian  314588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_main.ko
-r-xr-xr-x  1 adrian  adrian   23472 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_rate.ko

And the x86 versions, like this:

root@gertrude:/home/adrian # ls -l /boot/kernel/ath*ko
-r-xr-xr-x  1 root  wheel   36632 May 24 18:32 /boot/kernel/ath_dfs.ko
-r-xr-xr-x  1 root  wheel  134440 May 24 18:32 /boot/kernel/ath_hal.ko
-r-xr-xr-x  1 root  wheel   82320 May 24 18:32 /boot/kernel/ath_hal_ar5210.ko
-r-xr-xr-x  1 root  wheel  104976 May 24 18:32 /boot/kernel/ath_hal_ar5211.ko
-r-xr-xr-x  1 root  wheel  236144 May 24 18:32 /boot/kernel/ath_hal_ar5212.ko
-r-xr-xr-x  1 root  wheel  336104 May 24 18:32 /boot/kernel/ath_hal_ar5416.ko
-r-xr-xr-x  1 root  wheel  598336 May 24 18:32 /boot/kernel/ath_hal_ar9300.ko
-r-xr-xr-x  1 root  wheel  406144 May 24 18:32 /boot/kernel/ath_main.ko
-r-xr-xr-x  1 root  wheel   55352 May 24 18:32 /boot/kernel/ath_rate.ko

.. so you can see, not building the whole HAL can save quite a bit.
For example, if you don't need AR9300 support, you can actually avoid
wasting half a megabyte of RAM.  On embedded routers this is quite a
big deal.

The AR9300 HAL can be later further shrunk because, hilariously,
it indeed supports AH_SUPPORT_<xxx> for optionally adding chipset support.
(I'll chase that down later as it's quite a big savings if you're only
building for a single embedded target.)

So:

* Create a very hackish way to load/unload HAL modules
* Create module metadata for each HAL subtype - ah_osdep_arXXXX.c
* Create module metadata for ath_rate and ath_dfs (bluetooth is
  currently just built as part of it)
* .. yes, this means we could actually build multiple rate control
  modules and pick one at load time, but I'd rather just glue this
  into net80211's rate control code.  Oh well, baby steps.
* Main driver is now "ath_main"
* Create an "if_ath" module that does what the ye olde one did -
  load PCI glue, main driver, HAL and all child modules.
  In this way, if you have "if_ath_load=YES" in /boot/modules.conf
  it will load everything the old way and stuff should still work.
* For module autoloading purposes, I actually /did/ fix up
  the name of the modules in if_ath_pci and if_ath_ahb.

If you want to selectively load things (eg on ye cheape ARM/MIPS platforms
where RAM is at a premium) you should:

* load ath_hal
* load the chip modules in question
* load ath_rate, ath_dfs
* load ath_main
* load if_ath_pci and/or if_ath_ahb depending upon your particular
  bus bind type - this is where probe/attach is done.

TODO:

* AR5312 module and associated pieces - yes, we have the SoC side support
  now so the wifi support would be good to "round things out";
* Just nuke AH_SUPPORT_AR5416 for now and always bloat the packet
  structures; this'll simplify other things.
* Should add a simple refcnt thing to the HAL RF/chip modules so you
  can't unload them whilst you're using them.
* Manpage updates, UPDATING if appropriate, etc.
2017-05-25 04:18:46 +00:00
Andriy Gapon
8816c0bb48 MFV r316925: 6101 attempt to lzc_create() a filesystem under a volume results in a panic
illumos/illumos-gate@b127fe3c05
b127fe3c05

https://www.illumos.org/issues/6101
  lzc_create(), or more correctly, zfs_ioc_create() does not reject an attempt to
  create a filesystem as a child of a volume, instead it proceeds to a crash.
  A crash stack obtained on FreeBSD:
  page fault while in kernel mode

  zap_leaf_lookup()
  fzap_lookup()
  zap_lookup_norm()
  zap_lookup()
  zfs_get_zplprop()
  zfs_fill_zplprops_impl()
  zfs_ioc_create()
  zfsdev_ioctl()
  devfs_ioctl_f()
  kern_ioctl()
  sys_ioctl()
  This crash happened with a kernel without debugging assertions.
  The immediate cause of crash appears to an attempt to interpret a zvol object
  as a zap object.
  For filesystems:
  #define MASTER_NODE_OBJ 1
  For zvols:
  #define ZVOL_OBJ                1ULL
  #define ZVOL_ZAP_OBJ            2ULL
  So, I see two problems here:
     1. an attempt to create a filesystem under a zvol should be rejected as
        early as possible, maybe in zfs_fill_zplprops()
     2. maybe zap_lookup / zap_lockdir should reject objects that are not of one
        of the zap object types

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2017-05-24 22:34:54 +00:00
Andriy Gapon
e73f9f8a49 MFV r316923: 8026 retire zfs_throttle_delay and zfs_throttle_resolution
illumos/illumos-gate@6b03625981
6b03625981

https://www.illumos.org/issues/8026
  zfs_throttle_delay and zfs_throttle_resolution became disused since the new
  write throttling mechanism was introduced.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
2017-05-24 22:32:56 +00:00
Andriy Gapon
9fe5e04dfc MFC r316921: 8027 tighten up dsl_pool_dirty_delta
illumos/illumos-gate@313ae1e182
313ae1e182

https://www.illumos.org/issues/8027
  dsl_pool_dirty_delta() should not wake up waiters when dp->dp_dirty_total ==
  zfs_dirty_data_max, because they wait for dp_dirty_total to fall strictly below
  the threshold.
  It's probably very rare for that condition to occur, but it's better to have
  more accurate code.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
2017-05-24 22:27:48 +00:00
Andriy Gapon
e1b8f10a5e MFV r316920: 8023 Panic destroying a metaslab deferred range tree
illumos/illumos-gate@3991b535a8
3991b535a8

https://www.illumos.org/issues/8023
       $C
  ffffff0011bc0970 vpanic()
  ffffff0011bc0a00 strlog()
  ffffff0011bc0a30 range_tree_destroy+0x72(ffffff043769ad00)
  ffffff0011bc0a70 metaslab_fini+0xd5(ffffff0449acf380)
  ffffff0011bc0ab0 vdev_metaslab_fini+0x56(ffffff0462bae800)
  ffffff0011bc0af0 spa_unload+0x9b(ffffff03e3dac000)
  ffffff0011bc0b70 spa_export_common+0x115(ffffff047f4b4000, 2, 0, 0, 0)
  ffffff0011bc0b90 spa_destroy+0x1d(ffffff047f4b4000)
  ffffff0011bc0bd0 zfs_ioc_pool_destroy+0x20(ffffff047f4b4000)
  ffffff0011bc0c80 zfsdev_ioctl+0x4d7(11400000000, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68)
  ffffff0011bc0cc0 cdev_ioctl+0x39(11400000000, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68)
  ffffff0011bc0d10 spec_ioctl+0x60(ffffff03d9153b00, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68, 0)
  ffffff0011bc0da0 fop_ioctl+0x55(ffffff03d9153b00, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68, 0)
  ffffff0011bc0ec0 ioctl+0x9b(3, 5a01, 8040190)
  ffffff0011bc0f10 _sys_sysenter_post_swapgs+0x149()

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
MFC after:	2 weeks
2017-05-24 22:25:26 +00:00
Andriy Gapon
5386d7295a MFV r316917: 7968 multi-threaded spa_sync()
illumos/illumos-gate@94c2d0eb22
94c2d0eb22

https://www.illumos.org/issues/7968
  spa_sync() iterates over all the dirty dnodes and processes each of them by
  calling dnode_sync(). If there are many dirty dnodes (e.g. because we created
  or removed a lot of files), the single thread of spa_sync() calling
  dnode_sync() can become a bottleneck. Additionally, if many dnodes are dirtied
  concurrently in open context (e.g. due to concurrent file creation), the
  os_lock will experience lock contention via dnode_setdirty().
  The solution is to track dirty dnodes on a multilist_t, and for spa_sync() to
  use separate threads to process each of the sublists in the multilist.
  On the concurrent file creation microbenchmark, the performance improvement
  from dnode_setdirty() is up to 7%. Additionally, the wall clock time spent in
  spa_sync() is reduced to 15%-40% of the single-threaded case. In terms of cost/
  reward, once the other bottlenecks are addressed, fixing this bug will provide
  a medium-large performance gain and require a medium amount of effort to
  implement.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 22:21:24 +00:00
Andriy Gapon
2ba631553c MFV r316916: 7970 zfs_arc_num_sublists_per_state should be common to all multilists
illumos/illumos-gate@10fbdecb05
10fbdecb05

https://www.illumos.org/issues/7970
  The global tunable zfs_arc_num_sublists_per_state is used by the ARC and
  the dbuf cache, and other users are planned. We should change this
  tunable to be common to all multilists.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 22:15:16 +00:00
Andriy Gapon
1d7634429c MFC r316915: 7801 add more by-dnode routines (lint)
illumos/illumos-gate@411be58a6e
411be58a6e
MFC after:	24 days
X-MFC with:	r318823
2017-05-24 21:52:20 +00:00
Andriy Gapon
31fd119cc2 MFC r316914: 7801 add more by-dnode routines
illumos/illumos-gate@b0c42cd470
b0c42cd470

https://www.illumos.org/issues/7801
  Add *_by_dnode() routines for accessing objects given their
  dnode_t *, this is more efficient than accessing the object by
  (objset_t *, uint64_t object). This change converts some but
  not all of the existing consumers. As performance-sensitive
  code paths are discovered they should be converted to use
  these routines.
  Ported from: 0eef1bde31

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: bzzz77 <bzzz.tomas@gmail.com>
MFC after:	24 days
2017-05-24 21:49:21 +00:00
Andriy Gapon
5aab788866 MFC r316913: 7869 panic in bpobj_space(): null pointer dereference
illumos/illumos-gate@a3905a4592
a3905a4592

https://www.illumos.org/issues/7869
  The issue fixed by this patch is a race condition in the deadlist code.
  A thread executing an administrative command that uses
  `dsl_deadlist_space_range()` holds the lock of the whole `deadlist_t` to
  protect the access of all its entries that the deadlist contains in an
  avl tree.
  Sync threads trying to insert a new entry in the deadlist
  (through `dsl_deadlist_insert()` -> `dle_enqueue()`) do not hold the
  deadlist lock at that moment. If the `dle_bpobj` is the empty bpobj (our
  sentinel value), we close and reopen it. Between these two operations,
  it is possible for the `dsl_deadlist_space_range()` thread to dereference
  that bpobj which is `NULL` during that window.
  Threads should hold the a deadlist's `dl_lock` when they manipulate its
  internal data so scenarios like the one above are avoided. In addition,
  threads should also hold the bpobj lock whenever they are allocating the
  subobj list of a bpobj, and not just when they actually insert the subobj
  to the list. This way we can avoid potential memory leaks.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
MFC after:	2 weeks
2017-05-24 21:45:52 +00:00
Andriy Gapon
930f1af491 MFC r316912: 7793 ztest fails assertion in dmu_tx_willuse_space
illumos/illumos-gate@61e255ce72
61e255ce72

https://www.illumos.org/issues/7793
  Background information: This assertion about tx_space_* verifies that we
  are not dirtying more stuff than we thought we would. We “need” to know
  how much we will dirty so that we can check if we should fail this
  transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
  transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
  typically less than a millisecond), we call dbuf_dirty() on the exact
  blocks that will be modified. Once this happens, the temporary
  accounting in tx_space_* is unnecessary, because we know exactly what
  blocks are newly dirtied; we call dnode_willuse_space() to track this
  more exact accounting.
  The fundamental problem causing this bug is that dmu_tx_hold_*() relies
  on the current state in the DMU (e.g. dn_nlevels) to predict how much
  will be dirtied by this transaction, but this state can change before we
  actually perform the transaction (i.e. call dbuf_dirty()).
  This bug will be fixed by removing the assertion that the tx_space_*
  accounting is perfectly accurate (i.e. we never dirty more than was
  predicted by dmu_tx_hold_*()). By removing the requirement that this
  accounting be perfectly accurate, we can also vastly simplify it, e.g.
  removing most of the logic in dmu_tx_count_*().
  The new tx space accounting will be very approximate, and may be more or
  less than what is actually dirtied. It will still be used to determine
  if this transaction will put us over quota. Transactions that are marked
  by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
  an attempt to determine how much space will be freed by the transaction
  — this was rarely accurate enough to determine if a transaction should
  be permitted when we are over quota, which is why dmu_tx_mark_netfree()
  was introduced in 2014.
  We also won’t attempt to give “credit” when overwriting existing blocks,
  if those blocks may be freed. This allows us to remove the
  do_free_accounting logic in dbuf_dirty(), and associated routines. This

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 21:43:34 +00:00
Hans Petter Selasky
0f86d40bf5 Increase the allowed maximum number of audio channels from 31 to 127
in the PCM feeder mixer. Without this change a value of 32 channels is
treated like zero, due to using a mask of 0x1f, causing a kernel
assert when trying to playback bitperfect 32-channel audio. Also
update the AWK script which is generating the division tables to
handle more than 18 channels. This commit complements r282650.

MFC after:		3 days
2017-05-24 21:42:48 +00:00
Andriy Gapon
3a9c923927 MFC r316907: 1300 filename normalization doesn't work for removes
illumos/illumos-gate@1c17160ac5
1c17160ac5

https://www.illumos.org/issues/1300

FreeBSD note: recent FreeBSD was not affected by the issue fixed as the
name cache is completely bypassed when normalization is enabled.
The change is imported for the sake of ZAP infrastructure modifications.

Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

MFC after:	3 weeks
2017-05-24 21:29:31 +00:00
John Baldwin
04fe6458d3 Remove constants and comments for unimplemented entries in the default LDT.
These entries will never be added to the default LDT in the future.
2017-05-24 18:54:21 +00:00
Gleb Smirnoff
6a6cefac3d o Rearrange struct inpcb fields to optimize the TCP output code path
considering cache line hits and misses.  Put the lock and hash list
  glue into the first cache line, put inp_refcount inp_flags inp_socket
  into the second cache line.
o On allocation zero out entire structure except the lock and list entries,
  including inp_route inp_lle inp_gencnt.  When inp_route and inp_lle were
  introduced, they were added below inp_zero_size, resulting on not being
  cleared after free/alloc.  This definitely was a source of bugs with route
  caching.  Could be that r315956 has just fixed one of them.
  The inp_gencnt is reinitialized on every alloc, so it is safe to clear it.

This has been proved to improve TCP performance at Netflix.

Obtained from:		rrs
Differential Revision:	D10686
2017-05-24 17:47:16 +00:00
Cy Schubert
d84a2fbab5 Ifdef out a redundant if statement when LARGE_NAT is disabled.
MFC after:	1 week
2017-05-24 14:36:51 +00:00
Konstantin Belousov
e058e1c43c Add BIT_OR2(), BIT_AND2(), BIT_NAND2(), BIT_XOR() and BIT_XOR2().
Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	2 weeks
2017-05-24 10:09:54 +00:00
Konstantin Belousov
a07c3aeb73 Use __BSD_VISIBLE test instead checking for absense of _POSIX_SOURCE.
The Termios headers <termios.h> and <sys/_termios.h> used sometimes
_POSIX_SOURCE directly to determine if a thing should be exposed to
the user.  This circumvented the feature mechanisms of <sys/cdefs.h>.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	2 weeks
2017-05-24 09:25:13 +00:00
Navdeep Parhar
27bdfd5a8a cxgbe/iw_cxgbe: sodisconnect failures are harmless and should not be
treated as fatal errors.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-05-24 04:48:09 +00:00
Adrian Chadd
31ce00f662 [ath] begin migration of AHB support to use the PCI style board data API for calibration data.
This brings the AHB support in line with the PCI support - now other "things"
can wrap up the calibration / board data into a firmware blob and have them
probe/attach after the system has finished booting.

Note that this change requires /all/ of the AHB using kernel configurations
to change - so until I drop those changes in, this breaks AHB.

Fear not, I'll do that soon.

TODO:

* the above stuff.

Tested:

* AR9331, carambola 2, loading if_ath / wlan as modules at run time
2017-05-24 01:02:35 +00:00
Allan Jude
c20feae640 Followup to r318765 (capsicumize cpuset_*affinity)
Update *sysent files
2017-05-24 01:01:57 +00:00
Allan Jude
f299c47b52 Allow cpuset_{get,set}affinity in capabilities mode
bhyve was recently sandboxed with capsicum, and needs to be able to
control the CPU sets of its vcpu threads

Reviewed by:	emaste, oshogbo, rwatson
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D10170
2017-05-24 00:58:30 +00:00
Navdeep Parhar
7c0cad38c7 cxgbe(4): Update the T4, T5, and T6 firmwares to 1.16.45.0.
The latest firmware has a number of link related fixes, support for a
new custom card, and the fix for a bug that affected rate limiting on
FreeBSD.

Obtained from:	Chelsio Communications
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-23 23:40:17 +00:00
John Baldwin
9d24f98ca8 Remove the BSD/OS 2.1 system call gate LDT entry.
An extra copy of the system call gate was added to the default LDT back
in 1996 (r18513 / r18514).  However, the ability to run BSD/OS 2.1
i386 binaries under FreeBSD's native ABI is most likely no longer
needed.

Discussed with:	kib
2017-05-23 22:34:18 +00:00
Landon J. Fuller
26e4f22037 bhnd(4): Fix a SPROM identification regression introduced in r315866
In r315866, we introduced a direct read of the 8-bit sromrev field from the
memory mapped SPROM/OTP device. On OTP devices that require 16-bit access
alignment, this read fails, preventing identification of the SPROM layout.

So, let's perform an aligned read of the combined 16-bit sromrev/crc field
instead.

Approved by:	adrian (mentor, implicit)
2017-05-23 22:30:15 +00:00
John Baldwin
a0320759e7 Pass -N directly to ld via -Wl rather than passing it to the compiler driver.
In particular, clang doesn't accept -N.

Obtained from:	CheriBSD
Sponsored by:	DARPA / AFRL
2017-05-23 17:41:09 +00:00
Steve Wills
a4aaba3b0a Add security.bsd.see_jail_proc
Add security.bsd.see_jail_proc sysctl to hide jail processes from non-root
users

Reviewed by:	jamie
Approved by:	allanjude
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10770
2017-05-23 16:59:24 +00:00
Cy Schubert
59c8837dfa Remove redundant variable declaration.
MFC after:	3 days
2017-05-23 14:38:59 +00:00
Konstantin Belousov
55b78354f6 Add COMPAT_FREEBSD11 on arm64, the arch is almost tier-1.
Discussed with:	andrew, emaste
Sponsored by:	The FreeBSD Foundation
2017-05-23 13:57:55 +00:00
Edward Tomasz Napierala
0e742f2d84 Remove superfluous parentheses.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-05-23 12:00:08 +00:00
Andrey V. Elsukov
3aee70991d Fix possible double releasing for SA and SP references.
There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For outbound packets the direct call chain is the following:
 IPSEC_OUTPUT() method -> ipsec[46]_common_output() ->
 -> ipsec[46]_perform_request() -> xform_output() ->
 -> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
 -> xform_output_cb() -> ipsec_process_done() -> ip[6]_output().

The SA and SP references are held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did references releasing in
xform_output_cb(). But when the crypto callback called directly, in case of
error the error handling code in ipsec[46]_perform_request() also did
references releasing.

To fix this, remove error handling from ipsec[46]_perform_request() and do it
in xform_output() before crypto_dispatch().

MFC after:	10 days
2017-05-23 09:32:26 +00:00
Konstantin Belousov
ec95c622ff Regen. 2017-05-23 09:30:42 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Andrey V. Elsukov
5f7c516f21 Fix possible double releasing for SA reference.
There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For inbound packets the direct call chain is the following:
 IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
 -> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
 -> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().

The SA reference is held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did SA reference releasing in
xform_input_cb(). But when the crypto callback called directly, in case of
error (e.g. data authentification failed) the error handling in
ipsec_common_input() also did SA reference releasing.

To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().

PR:		219356
MFC after:	10 days
2017-05-23 09:01:48 +00:00
Adrian Chadd
7a0466bc2b [ar71xx] remove dead code! 2017-05-23 06:20:24 +00:00