Commit Graph

117136 Commits

Author SHA1 Message Date
Hans Petter Selasky
427cefde27 Properly implement idr_preload() and idr_preload_end() in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 16:08:30 +00:00
Hans Petter Selasky
dff36e69a1 Implement in_atomic() function in the LinuxKPI.
Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 15:05:44 +00:00
Hans Petter Selasky
90b30e6560 Properly set the .d_name field in the cdevsw structure for the
LinuxKPI.

Obtained from:		kmacy @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:11:06 +00:00
Hans Petter Selasky
d56f1ed887 Make sure the VMAP's "vm_file" field is referenced in a Linux
compatible way by the linux_dev_mmap_single() function in the
LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:07:05 +00:00
Hans Petter Selasky
cca15f28c5 Remove the VMA handle from its list before calling the LinuxKPI VMA
close operation to prevent other threads from reusing the VM object
handle pointer.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:05:54 +00:00
Hans Petter Selasky
68b9f2f00c Don't acquire a reference on the VM-space when allocating the LinuxKPI
task structure to avoid deadlock when tearing down the VM object
during a process exit.

Found by:		markj @
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 13:01:27 +00:00
Hans Petter Selasky
ea67550be0 Fix a reference count leak in the LinuxKPI due to calling VM open when
it shouldn't be called.

Background:
The Linux VM open operation is called when a new VMA is
created on top of the current VMA. This is done through either mremap
flow or split_vma, usually due to mlock, madvise, munmap and so
on. This is currently not supported by the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:08:25 +00:00
Hans Petter Selasky
f5a9867b7d Fixes for refcounting "struct linux_file" in the LinuxKPI.
- Allow "struct linux_file" to be refcounted when its "_file" member
  is NULL by using its "f_count" field. The reference counts are
  transferred to the file structure when the file descriptor is
  installed.

- Add missing vdrop() calls for error cases during open().

- Set the "_file" member of "struct linux_file" during open. This
allows use of refcounting through get_file() and fput() with LinuxKPI
character devices.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 12:02:59 +00:00
Hans Petter Selasky
3f743d782a Make sure the thread's priority is restored for all three cases inside
linux_synchronize_rcu_cb() in the LinuxKPI.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-05-31 10:01:15 +00:00
Konstantin Belousov
a8e7f543af Fix bug in r318997: remove the line which overrides vn_fsid()
calculation.

Noted by:	jhb
Reviewed by:	rmacklem
Sponsored by:	The FreeBSD Foundation
2017-05-30 21:20:54 +00:00
Mark Johnston
cb564d2436 Add some miscellaneous definitions to support DRM drivers.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10985
2017-05-30 17:16:08 +00:00
Jonathan T. Looney
8b07e00e99 Fix an unnecessary/incorrect check in the PKTOPT_EXTHDRCPY macro.
This macro allocates memory and, if malloc does not return NULL, copies
data into the new memory. However, it doesn't just check whether malloc
returns NULL. It also checks whether we called malloc with M_NOWAIT. That
is not necessary.

While it may be that malloc() will only return NULL when the M_NOWAIT flag
is set, we don't need to check for this when checking malloc's return
value. Further, in this case, the check was not completely accurate,
because it checked for flags == M_NOWAIT, rather than treating it as a bit
field and checking for (flags & M_NOWAIT).

Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10942
2017-05-30 14:50:28 +00:00
Jonathan T. Looney
fb04394554 Fix two places in the ICMP6 code where we could dereference a NULL pointer
in the icmp6_input() function.

When processing an ICMP6_ECHO_REQUEST, if IP6_EXTHDR_GET fails, it will
set nicmp6 and n to NULL. Therefore, we should condition our modification
to nicmp6 on n being not NULL.

And, when processing an ICMP6_WRUREQUEST in the (mode != FQDN) case, if
m_dup_pkthdr() fails, the code will set n to NULL. However, the very next
line dereferences n. Therefore, when m_dup_pkthdr() fails, we should
discontinue further processing and follow the same path as when m_gethdr()
fails.

Reported by:	clang static analyzer
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10941
2017-05-30 14:41:31 +00:00
Jonathan T. Looney
382a6bbcf1 Enforce the limit on ICMP messages before doing work to formulate the
response.

Delete an unneeded rate limit for UDP under IPv6. Because ICMP6
messages have their own rate limit, it is unnecessary to apply a
second rate limit to UDP messages.

Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D10387
2017-05-30 14:32:44 +00:00
Andriy Gapon
cae91bbe96 fix indentation
MFC after:	4 days
2017-05-30 13:53:03 +00:00
Zbigniew Bodek
416e886499 Introduce additional locks when releasing TX resources and buffers in ENA
There could be race condition with TX cleaning routine when cleaning mbufs,
when it was called directly from main sending thread (ena_mq_start).

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10927
2017-05-30 12:00:56 +00:00
Zbigniew Bodek
b9252a8889 Move ENA's hw stats updating routine to separate task
Initially, stats were being updated each time OS was requesting for
the first statistic.
To read statistics from hw, condvar was used. cv_timedwait cannot be
called when unsleepable lock is held, and this happens when FreeBSD
is requesting statistic.
Seperate task is reading statistics from NIC each 1 second.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10926
2017-05-30 11:58:51 +00:00
Zbigniew Bodek
081169f24c Add error handling to the ENA driver if init of the reset task fails
Also, to simplify cleaning routine, reset task is initialized before
allocating statistics and other resources.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10925
2017-05-30 11:56:54 +00:00
Zbigniew Bodek
e67c655431 Add locks before each ena_up and ena_down
Lock only ena_up and ena_down calls in ioctl handler, instead of whole
ioctl. Locking ioctl with sx lock that is sleepable, is not allowed in
some cases, e.g. when multicast options are being changed.
Additional locking was added in deatch function to prevent race condition
with ioctl function.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10924
2017-05-30 11:55:02 +00:00
Zbigniew Bodek
1e9fb89962 Add mbuf defragmentation to the ENA driver
When mbuf chain is too long and device cannot handle that number
of segments in DMA transaction, mbuf chain will be defragmented.
Initially, driver was dropping all mbuf chains that were exceeding
supported number of segments.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10923
2017-05-30 11:53:18 +00:00
Mateusz Guzik
c7a6a1b325 mtx: fix whitespace damage in _mtx_trylock_flags_
MFC after:	3 days
2017-05-30 02:25:47 +00:00
Vladimir Kondratyev
0f78004261 psm: add support for evdev protocol
Both relative and absolute multitouch modes are supported.
To enable psm(4) evdev support one should:
1. Add `device evdev` and `options EVDEV_SUPPORT` to kernel config file
2. Add hw.psm.elantech_support=1 or hw.psm.synaptics_support=1 to
   /boot/loader.conf for activation of absolute mode on touchpads
3. Add kern.evdev.rcpt_mask=12 to /etc/sysctl.conf to enable psm event
   sourcing and disable sysmouse

Reviewed by:	gonzo
Approved by:	gonzo (mentor)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10265
Tested by:	wulf, Jan Kokemueller (Lenovo devs)
2017-05-29 20:43:00 +00:00
Andrey V. Elsukov
7f1f65918b Disable IPsec debugging code by default when IPSEC_DEBUG kernel option
is not specified.

Due to the long call chain IPsec code can produce the kernel stack
exhaustion on the i386 architecture. The debugging code usually is not
used, but it requires a lot of stack space to keep buffers for strings
formatting. This patch conditionally defines macros to disable building
of IPsec debugging code.

IPsec currently has two sysctl variables to configure debug output:
 * net.key.debug variable is used to enable debug output for PF_KEY
   protocol. Such debug messages are produced by KEYDBG() macro and
   usually they can be interesting for developers.
 * net.inet.ipsec.debug variable is used to enable debug output for
   DPRINTF() macro and ipseclog() function. DPRINTF() macro usually
   is used for development debugging. ipseclog() function is used for
   debugging by administrator.

The patch disables KEYDBG() and DPRINTF() macros, and formatting buffers
declarations when IPSEC_DEBUG is not present in kernel config. This reduces
stack requirement for up to several hundreds of bytes.
The net.inet.ipsec.debug variable still can be used to enable ipseclog()
messages by administrator.

PR:		219476
Reported by:	eugen
No objection from:	#network
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10869
2017-05-29 09:30:38 +00:00
Wojciech Macek
631f8f40d3 Introduce Genesys GL3224 quirks
The Genesys chip is failing when issueing READ_CAP(16) command.
Force a quirk to disable it and use READ_CAP(10) instead.

Also, depending on used firmware, GL3224 can be recognized
either as 'storage device' or 'mass storage class' -
enable both variants in scsi_quirk_table.

Submitted by:    Wojciech Macek <wma@semihalf.com>
                 Konrad Adamczyk <ka@semihalf.com>
Obtained from:   Semihalf
Sponsored by:    Stormshield
Reviewed by:     mav
Differential revision: https://reviews.freebsd.org/D10902
2017-05-29 09:22:53 +00:00
Wojciech Macek
7108339449 Increase timeout in Atheros HAL
It turned out, that some models of the Atheros PCIe
adapters (e.g. AR983x family) may fail to attach
due to insufficient timeout value.

Submitted by:   Bartosz Szczepanek <bsz@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Stormshield
Reviewed by:    adrian
Differential revision: https://reviews.freebsd.org/D10903
2017-05-29 09:21:38 +00:00
Wojciech Macek
e6a54e228a Enable wireless Atheros cards in ARMADA38X
Submitted by:   Bartosz Szczepanek <bsz@semihalf.com>
                Dominik Ermel <der@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Stormshield
Differential revision: https://reviews.freebsd.org/D10904
2017-05-29 09:20:20 +00:00
Adrian Chadd
bba819843e [AP93] fix up the arge0/arge1 hints. 2017-05-29 07:57:01 +00:00
Adrian Chadd
21ba140453 [ar71xx] [ar724x] update to work
* add EARLY_PRINTF for debugging
* update module list to be much larger
* add random, otherwise well, stuff doesn't work.
* IPFIREWALL_DEFAULT_TO_ACCEPT

Tested:

* AP93 (AR7240 + AR9280)

TODO:

* rename to std.AR724X
* unify the built module list between all of the mips24k/mips74k atheros config files -
  now that the HAL, hwpmc, USB, etc are per-chip/per-arch modules it is easy to just
  compile them all and only include the ones you care about.
2017-05-29 07:30:07 +00:00
Adrian Chadd
56e4110f8e Update AP93 support to the new world order.
* Map change: create a combined kernel+rootfs image.  The instructions I'll post
  on the wiki (which will be for a very outdated dev board, but at least will
  explain the what/why for posterity) will include how to reset the boot command.

Tested:

* AP93 dev board (AR7240 + AR9280)
2017-05-29 07:27:08 +00:00
Cy Schubert
808c7f058c Revert r318789. It causes hanging NAT tcp sessions. 2017-05-29 07:15:28 +00:00
Andriy Gapon
1628f75af1 zfs_lookup: fix bogus arguments to lookup of "snapshot" directory
When a parent directory lookup is done at the root of a snapshot mounted
under .zfs/snapshot directory, we need to look up that directory in
the parent filesystem.  We achieve that by doing a VOP_LOOKUP operation
on a .zfs vnode with "snapshot" as a target name.  But previously we
also passed ISDOTDOT flag to the lookup and, because of that, the lookup
actually returned the parent of the .zfs vnode, that is, a root vnode of
the parent filesystem.

Reported by:	lev
Tested by:	lev
MFC after:	3 days
2017-05-29 06:30:34 +00:00
Andriy Voskoboinyk
59ed13aa49 rtwn: fix connection problems with 'options RTWN_WITHOUT_UCODE'
sc_set_media_status() callback may involve some generic code in addition to
firmware-specific part (e.g., link status register setup for RTL8188E);
so, remove 'RTWN_WITHOUT_UCODE' ifdefs around it.

Tested with RTL8188CUS, RTL8188EU and RTL8821AU, STA mode.
2017-05-28 22:51:06 +00:00
Andriy Voskoboinyk
8d4d46ffb6 rtwn_usb: fix build with 'options RTWN_WITHOUT_UCODE' 2017-05-28 22:38:19 +00:00
Toomas Soome
8878df0d15 Small cleanup in dev_net.c
The variable servip is unused. One leftover printf and small cstyle nit.

Reviewed by:	bapt
Differential Revision:	https://reviews.freebsd.org/D10980
2017-05-28 21:20:55 +00:00
Baptiste Daroussin
41131c64be Followup on the user-class changes
Reported by:	Jose Luis Duran (via github)
2017-05-28 18:31:13 +00:00
Pedro F. Giffuni
0322275751 Fix potential memory leak.
Moving the allocation forward, just before it's actually needed, seems
sensible.
Add newline character at the last line while here.

Reported by:	pluknet
Differential Revision:	https://reviews.freebsd.org/D10974
2017-05-28 17:48:54 +00:00
Pedro F. Giffuni
39999a6998 Support for linux ext2fs posix-draft ACLs.
This is closely tied to the Extended Attribute implementation.

Submitted by:	Fedor Uporov
Reviewed by:	kevlo, pfg

Differential Revision:	https://reviews.freebsd.org/D10807
2017-05-28 15:39:11 +00:00
Michael Zhilin
5a4380b565 [etherswitch] [rtl8366] add phy4cpu setting and support mdioproxy
Tested on WZR-HP-G301NH(RTL8366RB) and WZR-HP-G300NH(RTL8366SR).

Submitted by:   Hiroki Mori <yamori813@yahoo.co.jp>
Differential Revision:	https://reviews.freebsd.org/D10740
2017-05-28 12:14:33 +00:00
Michael Zhilin
97721228b8 [mips] [bhnd] Support of old PMU for BMIPS and siba SoC
- Fix typo of PLL Type 4
 - Don't panic of frequency getters

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Differential Revision:	https://reviews.freebsd.org/D10967
2017-05-28 12:05:16 +00:00
Dmitry Chagin
9811d215b9 In r246085 some bits that are MI movied out into headers in compat/linux,
but I missed that when I commited x86_64 Linuxulator. So remove the duplicates.

MFC after:	1 week
2017-05-28 08:46:57 +00:00
Adrian Chadd
2e986d170b [ar71xx] undo read-after-write to flush; some bus devices dislike this.
This broke the PCI fixup on at least the AR7240 + AR9280 reference design
board that I have.

Tested:

* Atheros AP93 reference design - AR7240 + AR9280
2017-05-28 07:44:55 +00:00
Dmitry Chagin
9ecc1abca3 On success, getrandom() Linux system call returns the number of bytes that
were copied to the buffer supplied by the user.

Also fix getrandom() if Linuxulator modules are built without the kernel.

PR:		219464
Submitted by:	Maciej Pasternacki
Reported by:	Maciej Pasternacki
MFC after:	1 week
2017-05-28 07:40:09 +00:00
Dmitry Chagin
1a8ea9fb85 Strip _binary_linux_locore_o_size from ${VDSO}.so as it is a low absolute
symbol, and this breaks symbol lookup in ddb.

Requested by:	bde@

MFC after:	1 week
2017-05-28 07:37:40 +00:00
Alan Cox
07c348ea7b After r118390, the variable "dmmax" was neither the correct strip size
nor the correct maximum block size.  Moreover, after r318995, it serves
no purpose except to provide information to user space through a read-
sysctl.

This change eliminates the variable "dmmax" but retains the sysctl.  It
also corrects the value returned by the sysctl.

Reviewed by:	kib, markj
MFC after:	3 days
2017-05-27 21:46:00 +00:00
Baptiste Daroussin
04238e0a32 Update the comments concerning net_parse_rootpath to reflect what it is now
really doing

Reported by:	rgrimes
Reviewed by:	rgrimes
Differential Revision:	https://reviews.freebsd.org/D10959
2017-05-27 18:46:00 +00:00
Cy Schubert
243567356b Fix return value of ip_sync_nat. Previously, regardless of error it
always returned a return code of 0.

Obtained from:	NetBSD ip_sync.c r1.5
MFC after:	1 week
2017-05-27 18:01:14 +00:00
Konstantin Belousov
03311f117b Use whole mnt_stat.f_fsid bits for st_dev.
Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all
bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use
f_fsid.  In particular, NFSv3 and sometimes NFSv4, and ZFS use this
method or reporting st_dev by stat(2).

Provide a new helper vn_fsid() to avoid duplicating code to copy
f_fsid to va_fsid.

Note that the change is mostly cosmetic.  Its motivation is to avoid
sign-extension of f_fsid[0] into 64bit dev_t value which happens after
dev_t becomes 64bit..

Reviewed by:	avg(zfs), rmacklem (nfs) (both for previous version)
Sponsored by:	The FreeBSD Foundation
2017-05-27 17:00:30 +00:00
Alan Cox
fe71561af2 In r118390, the swap pager's approach to striping swap allocation over
multiple devices was changed.  However, swapoff_one() was not fully and
correctly converted.  In particular, with r118390's introduction of a per-
device blist, the maximum swap block size, "dmmax", became irrelevant to
swapoff_one()'s operation.  Moreover, swapoff_one() was performing out-of-
range operations on the per-device blist that were silently ignored by
blist_fill().

This change corrects both of these problems with swapoff_one(), which will
allow us to potentially increase MAX_PAGEOUT_CLUSTER.  Previously,
swapoff_one() would panic inside of blist_fill() if you increased
MAX_PAGEOUT_CLUSTER.

Reviewed by:	kib, markj
MFC after:	3 days
2017-05-27 16:40:00 +00:00
Baptiste Daroussin
b5b274ce12 Catch with the change in the user class 2017-05-27 14:07:46 +00:00
Baptiste Daroussin
4e2a7b5c99 Capitalize DHCP
Reported by:	danfe
2017-05-27 13:55:20 +00:00
Baptiste Daroussin
aff810f1b2 Document recent changes on pxeboot 2017-05-27 13:26:18 +00:00
Baptiste Daroussin
e9ce925773 Partially revert r314948
While it sounds like a good idea to extract the RFC1048 data from PXE, in the
end it is not and it is causing lots of issues.  Our pxeloader might need
options which are incompatible with other pxe servers (for example iPXE, but
not only).

Our pxe loaders are also now settings their own user class, so it is useful to
issue our own pxe request at startup

Reviewed by:	tsoome
Differential Revision:	https://reviews.freebsd.org/D10953
2017-05-27 12:46:46 +00:00
Baptiste Daroussin
4dfd16670e Always issue the pxe request
All the code are now only issueing one single dhcp request at startup of the
loader meaning we can always request a the PXE informations from the
dhcp server.

Previous code lost that information, meaning no option 55 anymore (meaning not
working with the kea dhcp server) and no request for rootpath etc, no user class

Remove the flags from the bootp function which is not needed anymore

Reviewed by:	tsoome
Differential Revision:	https://reviews.freebsd.org/D10952
2017-05-27 12:35:01 +00:00
Baptiste Daroussin
5fe86cd909 Always build tftpfs support along with nfs for pxeboot
This change was already done for loader.efi
2017-05-27 12:20:13 +00:00
Baptiste Daroussin
404f5b6b29 Support URI scheme for root-path in netbooting
Rather that previous attempts to add tftpfs support at the same time as NFS
support. This time decide on a proper URI parser rather than hacks.

root-path can now be define the following way:
For tftpfs:

tftp://ip/path
tftp:/path (this one will consider the tftp server is the same as the one where
the pxeboot file was fetched from)

For nfs:
nfs:/path
nfs://ip/path

The historical
ip:/path
/path

are kept on NFS

Reviewed by:	tsoom, rgrimes
Differential Revision:	https://reviews.freebsd.org/D10947
2017-05-27 12:06:52 +00:00
Ed Maste
ef7161e774 uart: add AMT SOL PCI ID
I adjusted the description to be similar to existing AMT entries.

PR:		219384
Submitted by:	"Tooker"
MFC after:	1 week
2017-05-27 02:07:22 +00:00
Alexander Motin
41cf0d54a2 Call VLAN_CAPABILITIES() when LAGG capabilities change.
This makes VLAN on top of LAGG to expose proper capabilities if they are
changed after creation.

MFC after:	1 week
2017-05-26 22:22:48 +00:00
Conrad Meyer
95b978955c procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by:	kib, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10689
2017-05-26 22:17:44 +00:00
John Baldwin
d68990a14c Fail large requests with EFBIG.
The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size.  Sending crypto requests larger than this size result in
hangs or incorrect output, so reject them with EFBIG.  For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).

Sponsored by:	Chelsio Communications
2017-05-26 20:20:40 +00:00
Alexander Motin
8403ab7919 Improve applying unified capabilities to the lagg ports.
Some NICs have some capabilities dependent, so that disabling one require
disabling some other (TXCSUM/RXCSUM on em).  This code tries to reach the
consensus more insistently.

PR:		219453
MFC after:	1 week
2017-05-26 20:15:33 +00:00
Andriy Gapon
b5617df55b Allow PROBE_SPINUP to fail in CAM ATA transport
The motivation for this is two-fold.

1. Some old WD SATA disks may appear as if they need to be spun up
when they are already spinning.  Those disks would respond with
an error to the spin-up request.

2. Even if we really fail to spin up the disk, we still can try to
proceed to the subsequent phases.  If we fail later on, then no
difference.  Otherwise we get a chance to communicate with the
disk which is better than completely ignoring it, because a user
can try to recover the disk.

Reviewed by:	mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D10896
2017-05-26 17:44:47 +00:00
Dimitry Andric
b47efe07c4 Define a new __INO64 macro in <sys/_types.h>, to indicate the system
uses 64-bit inode numbers.  Programs can use this to avoid including
<sys/param.h>, with its associated namespace pollution.

Reviewed by:	kib
2017-05-26 16:29:55 +00:00
Michael Tuexen
5d08768a2b Use the SCTP_PCB_FLAGS_ACCEPTING flags to check for listeners.
While there, use a macro for checking the listen state to allow for
easier changes if required.

This done to help glebius@ with his listen changes.
2017-05-26 16:29:00 +00:00
Andriy Gapon
32ecf81aff MFV r318944: 8265 Reserve send stream flag for large dnode feature
illumos/illumos-gate@bc83969fdb
bc83969fdb

https://www.illumos.org/issues/8265
  Reserve bit 23 in the zfs send stream flags for the large
  dnode feature which has been implemented for Linux.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>

MFC after:	1 week
2017-05-26 12:08:38 +00:00
Andriy Gapon
a51eb0a964 MFV r318942: 8166 zpool scrub thinks it repaired offline device
illumos/illumos-gate@2d2f193a21
2d2f193a21

https://www.illumos.org/issues/8166
  If we do a scrub while a leaf device is offline (via "zpool offline"),
  we will inadvertently clear the DTL (dirty time log) of the offline
  device, even though it is still damaged. When the device comes back
  online, we will incompletely resilver it, thinking that the scrub
  repaired blocks written before the scrub was started. The incomplete
  resilver can lead to data loss if there is a subsequent failure of a
  different leaf device.
  The fix is to never clear the DTL of offline devices. Note that if a
  device is onlined while a scrub is in progress, the scrub will be
  restarted.
  The problem can be worked around by running "zpool scrub" after
  "zpool online".
  See also https://github.com/zfsonlinux/zfs/issues/5806

Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 12:04:21 +00:00
Andriy Gapon
2cd05c2473 MFV r318934: 8070 Add some ZFS comments
illumos/illumos-gate@40713f2b24
40713f2b24

https://www.illumos.org/issues/8070
  Add some ZFS comments left by various developers at different times

Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>

MFC after:	1 week
2017-05-26 11:49:42 +00:00
Andriy Gapon
0a07ea0e2f MFV r318931: 8063 verify that we do not attempt to access inactive txg
illumos/illumos-gate@b7b2590dd9
b7b2590dd9

https://www.illumos.org/issues/8063
  A standard practice in ZFS is to keep track of "per-txg" state. Any of
  the 3 active TXG's (open, quiescing, syncing) can have different values
  for this state. We should assert that we do not attempt to modify other
  (inactive) TXG's.

Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-05-26 11:37:11 +00:00
Andriy Gapon
28c5e43e36 MFV r318929: 7786 zfs`vdev_online() needs better notification about state changes
illumos/illumos-gate@5f368aef86
5f368aef86

https://www.illumos.org/issues/7786
  Currently, vdev_online() will only post sysevent if previous state was
  "offline". It should also post the event when the state changes from "removed"
  or "faulted" to "healthy" or "degraded".
  This will fix the following scenario:
  - pull disk from slot A
  - check that hotspare has taken its place (if available)
  - insert disk into slot B
  - check that hotspare moved back to "avail" state (if spare was used)
  The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail
  to update the vdev FRU.

Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Approved by: Albert Lee <trisk@forkgnu.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>

MFC after:	1 week
2017-05-26 11:33:34 +00:00
Andriy Gapon
9c2a3c861f MFV r318927: 8025 dbuf_read() creates unnecessary zio_root() for bonus buf
illumos/illumos-gate@def4fac588
def4fac588

https://www.illumos.org/issues/8025
  dbuf_read() creates a zio_root() to track and wait for all the zio's
  that may happen as part of this call. However, if the blkptr_t for
  this buffer is NULL or a hole, we will not create any more zio's, so
  this zio_root() is unnecessary. This is always the case when calling
  dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
  the containing dnode). For workloads that read a lot of bonus buffers
  (e.g. file creation and removal), creating and destroying these
  unnecessary zio's can decrease performance by around 3%.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
2017-05-26 11:30:55 +00:00
Andriy Gapon
ebaf416f95 MFV r316929: 6914 kernel virtual memory fragmentation leads to hang
illumos/illumos-gate@af868f46a5
af868f46a5

https://www.illumos.org/issues/6914

FreeBSD note: only a ZFS part of the change is merged, changes to the VM
subsystem are not ported (obviously).  Also, now that FreeBSD has
vmem(9) we don't have to ifdef-out the code that uses it.

MFC after:	2 weeks
2017-05-26 11:23:16 +00:00
Andriy Gapon
8629ec8394 arc_init: make code closer to upstream by introducing 'allmem' variable
All the differences in calculations are kept.
A comment about arc_max being 1/2 of all memory is fixed to reflect the
actual code that uses 5/8 as a factor.

MFC after:	1 week
2017-05-26 11:05:56 +00:00
Andriy Gapon
cf781c9b60 zfs_putpages: assert that sa_bulk_update() must succeed
Same as the upstream does in r316927.

MFC after:	1 week
2017-05-26 10:37:55 +00:00
Andriy Gapon
04b7c6b337 MFV r316928: 7256 low probability race in zfs_get_data
illumos/illumos-gate@0c94e1af67
0c94e1af67

https://www.illumos.org/issues/7256
                         error = dmu_sync(zio, lr->lr_common.lrc_txg,
                              zfs_get_done, zgd);
                         ASSERT(error || lr->lr_length <= zp->z_blksz);
  It's possible, although extremely rare, that the zfs_get_done() callback is
  executed before dmu_sync() returns.
  In that case the znode's range lock is dropped and the znode is unreferenced.
  Thus, the assertion can access some invalid or wrong data via the zp pointer.
  size variable caches the correct value of z_blksz and can be safely used here.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <andriy.gapon@clusterhq.com>

MFC after:	1 week
2017-05-26 10:31:05 +00:00
Andriy Gapon
7a94dd7aee MFC r316924: 8061 sa_find_idx_tab can be declared more type-safely
illumos/illumos-gate@7f0bdb4257
7f0bdb4257

https://www.illumos.org/issues/8061
  sa_find_idx_tab() is declared as taking and returning "void *" parameters.
  These can be declared to be the specific types.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	1 week
2017-05-26 10:27:35 +00:00
Adrian Chadd
7b6899bf2a [ath] fix short-GI wireshark flag.
Yes, HAL_RX_GI means "short guard interval."
2017-05-26 00:48:21 +00:00
Alexander Motin
e3d90506c4 Remove some code, dead from the day one. 2017-05-25 23:19:09 +00:00
Stephen McConnell
327f2e6c56 Fix several problems with mapping code.
Reviewed by:    ken, scottl, asomers, ambrisko, mav
Approved by:	ken, mav
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D10861
2017-05-25 19:20:06 +00:00
Stephen McConnell
635e58c715 Fix several problems with mapping code.
Reviewed by:    ken, scottl, asomers, ambrisko, mav
Approved by:	ken, mav
MFC after:      1 week
Differential Revision: https://reviews.freebsd.org/D10878
2017-05-25 19:14:44 +00:00
Zbigniew Bodek
26872c13ce Unmask legacy interrupts on Marvell PCIE controller
This patch fixes a bug introduced with commit:
r294510  "Remove an extra '!' found by clang 3.8."

'!' was removed without inverting the logic, which
broke PCIe legacy interrupts operation for Marvell
controllers.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Netgate
2017-05-25 14:34:21 +00:00
Zbigniew Bodek
fa5f501d0a Add workaround for CESA MBUS windows with 4GB DRAM
Armada 38x SoC's equipped with 4GB DRAM suffer freeze
during CESA operation, if MBUS window opened at given
DRAM CS reaches end of the address space. Apply a workaround
by setting the window size to the closest possible
value, i.e. divide it by 2 (it has to be power-of-2).

Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10724
2017-05-25 14:25:05 +00:00
Zbigniew Bodek
0c79c0b138 Fix PM recognition on recent Marvell boards
PM status is only supported on Kirkwood and Disvovery.
Cleanup the code to properly report its state on
other platforms.

Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10718
2017-05-25 14:23:49 +00:00
Zbigniew Bodek
92ce47d94e Introduce separate watchdog driver for Armada to fix phony DELAY
DELAY is a problematic routine called all over the kernel.
Armada38x using CA-9 CPUs are using mpcore timer to count events
and measure time but DELAY in the mpcore timer code is a weak
function reference and therefore will be replaced by the platform
implementation if the one is introduced. Since Armada38x uses
on-chip watchdog to which the driver is merged with the on-chip timer
driver there will be a platform DELAY implementation.
The latter however will not use any HW timers as it will not attempt
to configure any. Phony busy loop will be used instead.

To fix that we introduce a separate watchdog driver for Armada platforms,
(currently only A38X) and stop using Marvell timer driver. That
switches DELAY to the desired implementation.

Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10710
2017-05-25 14:22:00 +00:00
Zbigniew Bodek
bb98396b47 Enable SCU Speculative linefills to L2 on Armada 38x
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10709
2017-05-25 14:19:20 +00:00
Zbigniew Bodek
70d163328d Fix memory corruption while configuring CPU windows on Marvell SoCs
Resolving CPU windows from localbus entry caused buffer overflow
and memory corruption. Fix wrong indexing and ensure the index
does not exceed table size.

Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10720
2017-05-25 14:16:43 +00:00
Andriy Gapon
ced98d784b fix vmxnet3 crash when LRO is enabled
The crash can occur when all of the following conditions are true:
- a packet consists of multiple segements (requires LRO enabled)
- there has been a failure to allocate an mbuf for the packet and
  the packet has to be dropped
- a host (vmware) still owned at least one segment of the packet,
  so the driver had to wait for another interrupt to proceed to
  discarding the remaning segment(s)

Reviewed by:	rstone
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D10874
2017-05-25 10:49:56 +00:00
Hans Petter Selasky
3f9dcc588d Declare the "snd_fxdiv_table" once. This shaves around 24Kbytes of
binary data from sound.ko and the kernel.

MFC after:		3 days
2017-05-25 05:23:47 +00:00
Adrian Chadd
f46839b9e3 [ath] [ath_hal] retire AH_SUPPORT_AR5416 changing anything.
Yes, the memory bloat is large, but it's 2017 and I'll fix it later
by making it runtime configurable / per-chip configurable if I ever need to.
2017-05-25 04:26:26 +00:00
Adrian Chadd
41059135ce [ath] [ath_hal] (etc, etc) - begin the task of re-modularising the HAL.
In the deep past, when this code compiled as a binary module, ath_hal
built as a module.  This allowed custom, smaller HAL modules to be built.
This was especially beneficial for small embedded platforms where you
didn't require /everything/ just to run.

However, sometime around the HAL opening fanfare, the HAL landed here
as one big driver+HAL thing, and a lot of the (dirty) infrastructure
(ie, #ifdef AH_SUPPORT_XXX) to build specific subsets of the HAL went away.
This was retained in sys/conf/files as "ath_hal_XXX" but it wasn't
really floated up to the modules themselves.

I'm now in a position where for the reaaaaaly embedded boards (both the
really old and the last couple generation of QCA MIPS boards) having a
cut down HAL module and driver loaded at runtime is /actually/ beneficial.

This reduces the kernel size down by quite a bit.  The MIPS modules look
like this:

adrian@gertrude:~/work/freebsd/head-embedded/src % ls -l ../root/mips_ap/boot/kernel.CARAMBOLA2/ath*ko
-r-xr-xr-x  1 adrian  adrian    5076 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_dfs.ko
-r-xr-xr-x  1 adrian  adrian  100588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal.ko
-r-xr-xr-x  1 adrian  adrian  627324 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_hal_ar9300.ko
-r-xr-xr-x  1 adrian  adrian  314588 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_main.ko
-r-xr-xr-x  1 adrian  adrian   23472 May 23 23:45 ../root/mips_ap/boot/kernel.CARAMBOLA2/ath_rate.ko

And the x86 versions, like this:

root@gertrude:/home/adrian # ls -l /boot/kernel/ath*ko
-r-xr-xr-x  1 root  wheel   36632 May 24 18:32 /boot/kernel/ath_dfs.ko
-r-xr-xr-x  1 root  wheel  134440 May 24 18:32 /boot/kernel/ath_hal.ko
-r-xr-xr-x  1 root  wheel   82320 May 24 18:32 /boot/kernel/ath_hal_ar5210.ko
-r-xr-xr-x  1 root  wheel  104976 May 24 18:32 /boot/kernel/ath_hal_ar5211.ko
-r-xr-xr-x  1 root  wheel  236144 May 24 18:32 /boot/kernel/ath_hal_ar5212.ko
-r-xr-xr-x  1 root  wheel  336104 May 24 18:32 /boot/kernel/ath_hal_ar5416.ko
-r-xr-xr-x  1 root  wheel  598336 May 24 18:32 /boot/kernel/ath_hal_ar9300.ko
-r-xr-xr-x  1 root  wheel  406144 May 24 18:32 /boot/kernel/ath_main.ko
-r-xr-xr-x  1 root  wheel   55352 May 24 18:32 /boot/kernel/ath_rate.ko

.. so you can see, not building the whole HAL can save quite a bit.
For example, if you don't need AR9300 support, you can actually avoid
wasting half a megabyte of RAM.  On embedded routers this is quite a
big deal.

The AR9300 HAL can be later further shrunk because, hilariously,
it indeed supports AH_SUPPORT_<xxx> for optionally adding chipset support.
(I'll chase that down later as it's quite a big savings if you're only
building for a single embedded target.)

So:

* Create a very hackish way to load/unload HAL modules
* Create module metadata for each HAL subtype - ah_osdep_arXXXX.c
* Create module metadata for ath_rate and ath_dfs (bluetooth is
  currently just built as part of it)
* .. yes, this means we could actually build multiple rate control
  modules and pick one at load time, but I'd rather just glue this
  into net80211's rate control code.  Oh well, baby steps.
* Main driver is now "ath_main"
* Create an "if_ath" module that does what the ye olde one did -
  load PCI glue, main driver, HAL and all child modules.
  In this way, if you have "if_ath_load=YES" in /boot/modules.conf
  it will load everything the old way and stuff should still work.
* For module autoloading purposes, I actually /did/ fix up
  the name of the modules in if_ath_pci and if_ath_ahb.

If you want to selectively load things (eg on ye cheape ARM/MIPS platforms
where RAM is at a premium) you should:

* load ath_hal
* load the chip modules in question
* load ath_rate, ath_dfs
* load ath_main
* load if_ath_pci and/or if_ath_ahb depending upon your particular
  bus bind type - this is where probe/attach is done.

TODO:

* AR5312 module and associated pieces - yes, we have the SoC side support
  now so the wifi support would be good to "round things out";
* Just nuke AH_SUPPORT_AR5416 for now and always bloat the packet
  structures; this'll simplify other things.
* Should add a simple refcnt thing to the HAL RF/chip modules so you
  can't unload them whilst you're using them.
* Manpage updates, UPDATING if appropriate, etc.
2017-05-25 04:18:46 +00:00
Andriy Gapon
8816c0bb48 MFV r316925: 6101 attempt to lzc_create() a filesystem under a volume results in a panic
illumos/illumos-gate@b127fe3c05
b127fe3c05

https://www.illumos.org/issues/6101
  lzc_create(), or more correctly, zfs_ioc_create() does not reject an attempt to
  create a filesystem as a child of a volume, instead it proceeds to a crash.
  A crash stack obtained on FreeBSD:
  page fault while in kernel mode

  zap_leaf_lookup()
  fzap_lookup()
  zap_lookup_norm()
  zap_lookup()
  zfs_get_zplprop()
  zfs_fill_zplprops_impl()
  zfs_ioc_create()
  zfsdev_ioctl()
  devfs_ioctl_f()
  kern_ioctl()
  sys_ioctl()
  This crash happened with a kernel without debugging assertions.
  The immediate cause of crash appears to an attempt to interpret a zvol object
  as a zap object.
  For filesystems:
  #define MASTER_NODE_OBJ 1
  For zvols:
  #define ZVOL_OBJ                1ULL
  #define ZVOL_ZAP_OBJ            2ULL
  So, I see two problems here:
     1. an attempt to create a filesystem under a zvol should be rejected as
        early as possible, maybe in zfs_fill_zplprops()
     2. maybe zap_lookup / zap_lockdir should reject objects that are not of one
        of the zap object types

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2017-05-24 22:34:54 +00:00
Andriy Gapon
e73f9f8a49 MFV r316923: 8026 retire zfs_throttle_delay and zfs_throttle_resolution
illumos/illumos-gate@6b03625981
6b03625981

https://www.illumos.org/issues/8026
  zfs_throttle_delay and zfs_throttle_resolution became disused since the new
  write throttling mechanism was introduced.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
2017-05-24 22:32:56 +00:00
Andriy Gapon
9fe5e04dfc MFC r316921: 8027 tighten up dsl_pool_dirty_delta
illumos/illumos-gate@313ae1e182
313ae1e182

https://www.illumos.org/issues/8027
  dsl_pool_dirty_delta() should not wake up waiters when dp->dp_dirty_total ==
  zfs_dirty_data_max, because they wait for dp_dirty_total to fall strictly below
  the threshold.
  It's probably very rare for that condition to occur, but it's better to have
  more accurate code.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
2017-05-24 22:27:48 +00:00
Andriy Gapon
e1b8f10a5e MFV r316920: 8023 Panic destroying a metaslab deferred range tree
illumos/illumos-gate@3991b535a8
3991b535a8

https://www.illumos.org/issues/8023
       $C
  ffffff0011bc0970 vpanic()
  ffffff0011bc0a00 strlog()
  ffffff0011bc0a30 range_tree_destroy+0x72(ffffff043769ad00)
  ffffff0011bc0a70 metaslab_fini+0xd5(ffffff0449acf380)
  ffffff0011bc0ab0 vdev_metaslab_fini+0x56(ffffff0462bae800)
  ffffff0011bc0af0 spa_unload+0x9b(ffffff03e3dac000)
  ffffff0011bc0b70 spa_export_common+0x115(ffffff047f4b4000, 2, 0, 0, 0)
  ffffff0011bc0b90 spa_destroy+0x1d(ffffff047f4b4000)
  ffffff0011bc0bd0 zfs_ioc_pool_destroy+0x20(ffffff047f4b4000)
  ffffff0011bc0c80 zfsdev_ioctl+0x4d7(11400000000, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68)
  ffffff0011bc0cc0 cdev_ioctl+0x39(11400000000, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68)
  ffffff0011bc0d10 spec_ioctl+0x60(ffffff03d9153b00, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68, 0)
  ffffff0011bc0da0 fop_ioctl+0x55(ffffff03d9153b00, 5a01, 8040190, 100003,
  ffffff03e1956b10, ffffff0011bc0e68, 0)
  ffffff0011bc0ec0 ioctl+0x9b(3, 5a01, 8040190)
  ffffff0011bc0f10 _sys_sysenter_post_swapgs+0x149()

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
MFC after:	2 weeks
2017-05-24 22:25:26 +00:00
Andriy Gapon
5386d7295a MFV r316917: 7968 multi-threaded spa_sync()
illumos/illumos-gate@94c2d0eb22
94c2d0eb22

https://www.illumos.org/issues/7968
  spa_sync() iterates over all the dirty dnodes and processes each of them by
  calling dnode_sync(). If there are many dirty dnodes (e.g. because we created
  or removed a lot of files), the single thread of spa_sync() calling
  dnode_sync() can become a bottleneck. Additionally, if many dnodes are dirtied
  concurrently in open context (e.g. due to concurrent file creation), the
  os_lock will experience lock contention via dnode_setdirty().
  The solution is to track dirty dnodes on a multilist_t, and for spa_sync() to
  use separate threads to process each of the sublists in the multilist.
  On the concurrent file creation microbenchmark, the performance improvement
  from dnode_setdirty() is up to 7%. Additionally, the wall clock time spent in
  spa_sync() is reduced to 15%-40% of the single-threaded case. In terms of cost/
  reward, once the other bottlenecks are addressed, fixing this bug will provide
  a medium-large performance gain and require a medium amount of effort to
  implement.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 22:21:24 +00:00
Andriy Gapon
2ba631553c MFV r316916: 7970 zfs_arc_num_sublists_per_state should be common to all multilists
illumos/illumos-gate@10fbdecb05
10fbdecb05

https://www.illumos.org/issues/7970
  The global tunable zfs_arc_num_sublists_per_state is used by the ARC and
  the dbuf cache, and other users are planned. We should change this
  tunable to be common to all multilists.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 22:15:16 +00:00
Andriy Gapon
1d7634429c MFC r316915: 7801 add more by-dnode routines (lint)
illumos/illumos-gate@411be58a6e
411be58a6e
MFC after:	24 days
X-MFC with:	r318823
2017-05-24 21:52:20 +00:00
Andriy Gapon
31fd119cc2 MFC r316914: 7801 add more by-dnode routines
illumos/illumos-gate@b0c42cd470
b0c42cd470

https://www.illumos.org/issues/7801
  Add *_by_dnode() routines for accessing objects given their
  dnode_t *, this is more efficient than accessing the object by
  (objset_t *, uint64_t object). This change converts some but
  not all of the existing consumers. As performance-sensitive
  code paths are discovered they should be converted to use
  these routines.
  Ported from: 0eef1bde31

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: bzzz77 <bzzz.tomas@gmail.com>
MFC after:	24 days
2017-05-24 21:49:21 +00:00
Andriy Gapon
5aab788866 MFC r316913: 7869 panic in bpobj_space(): null pointer dereference
illumos/illumos-gate@a3905a4592
a3905a4592

https://www.illumos.org/issues/7869
  The issue fixed by this patch is a race condition in the deadlist code.
  A thread executing an administrative command that uses
  `dsl_deadlist_space_range()` holds the lock of the whole `deadlist_t` to
  protect the access of all its entries that the deadlist contains in an
  avl tree.
  Sync threads trying to insert a new entry in the deadlist
  (through `dsl_deadlist_insert()` -> `dle_enqueue()`) do not hold the
  deadlist lock at that moment. If the `dle_bpobj` is the empty bpobj (our
  sentinel value), we close and reopen it. Between these two operations,
  it is possible for the `dsl_deadlist_space_range()` thread to dereference
  that bpobj which is `NULL` during that window.
  Threads should hold the a deadlist's `dl_lock` when they manipulate its
  internal data so scenarios like the one above are avoided. In addition,
  threads should also hold the bpobj lock whenever they are allocating the
  subobj list of a bpobj, and not just when they actually insert the subobj
  to the list. This way we can avoid potential memory leaks.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
MFC after:	2 weeks
2017-05-24 21:45:52 +00:00
Andriy Gapon
930f1af491 MFC r316912: 7793 ztest fails assertion in dmu_tx_willuse_space
illumos/illumos-gate@61e255ce72
61e255ce72

https://www.illumos.org/issues/7793
  Background information: This assertion about tx_space_* verifies that we
  are not dirtying more stuff than we thought we would. We “need” to know
  how much we will dirty so that we can check if we should fail this
  transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
  transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
  typically less than a millisecond), we call dbuf_dirty() on the exact
  blocks that will be modified. Once this happens, the temporary
  accounting in tx_space_* is unnecessary, because we know exactly what
  blocks are newly dirtied; we call dnode_willuse_space() to track this
  more exact accounting.
  The fundamental problem causing this bug is that dmu_tx_hold_*() relies
  on the current state in the DMU (e.g. dn_nlevels) to predict how much
  will be dirtied by this transaction, but this state can change before we
  actually perform the transaction (i.e. call dbuf_dirty()).
  This bug will be fixed by removing the assertion that the tx_space_*
  accounting is perfectly accurate (i.e. we never dirty more than was
  predicted by dmu_tx_hold_*()). By removing the requirement that this
  accounting be perfectly accurate, we can also vastly simplify it, e.g.
  removing most of the logic in dmu_tx_count_*().
  The new tx space accounting will be very approximate, and may be more or
  less than what is actually dirtied. It will still be used to determine
  if this transaction will put us over quota. Transactions that are marked
  by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
  an attempt to determine how much space will be freed by the transaction
  — this was rarely accurate enough to determine if a transaction should
  be permitted when we are over quota, which is why dmu_tx_mark_netfree()
  was introduced in 2014.
  We also won’t attempt to give “credit” when overwriting existing blocks,
  if those blocks may be freed. This allows us to remove the
  do_free_accounting logic in dbuf_dirty(), and associated routines. This

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	3 weeks
2017-05-24 21:43:34 +00:00
Hans Petter Selasky
0f86d40bf5 Increase the allowed maximum number of audio channels from 31 to 127
in the PCM feeder mixer. Without this change a value of 32 channels is
treated like zero, due to using a mask of 0x1f, causing a kernel
assert when trying to playback bitperfect 32-channel audio. Also
update the AWK script which is generating the division tables to
handle more than 18 channels. This commit complements r282650.

MFC after:		3 days
2017-05-24 21:42:48 +00:00
Andriy Gapon
3a9c923927 MFC r316907: 1300 filename normalization doesn't work for removes
illumos/illumos-gate@1c17160ac5
1c17160ac5

https://www.illumos.org/issues/1300

FreeBSD note: recent FreeBSD was not affected by the issue fixed as the
name cache is completely bypassed when normalization is enabled.
The change is imported for the sake of ZAP infrastructure modifications.

Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Kevin Crowe <kevin.crowe@nexenta.com>

MFC after:	3 weeks
2017-05-24 21:29:31 +00:00