Commit Graph

120801 Commits

Author SHA1 Message Date
Ian Lepore
3982006ed5 Remove some files that snuck in via cut and paste.
Having these compiled into the module causes the kobj method descriptors
to be resolved incorrectly (by the compile-time linker instead of the
kernel linker), which then leads to hours of frustrating debugging.
2018-02-21 16:34:04 +00:00
Nathan Whitehorn
d9dbc2104f Add definition for the PowerPC A2. 2018-02-21 15:15:58 +00:00
Nathan Whitehorn
dddf28585d Add definitions for the new Radix MMU mode on POWER9+ CPUs. 2018-02-21 15:15:31 +00:00
Andriy Gapon
cfb675a138 MFV r329718: 8520 7198 lzc_rollback_to should support rolling back to origin
illumos/illumos-gate@95643f75d2
95643f75d2

https://www.illumos.org/issues/8520
  lzc_rollback_to() should support rolling back to a clone's origin.
  The current checks in zfs_ioc_rollback() would not allow that because the
  origin snapshot belongs to a different filesystem.
  The overly restrictive check was introduced in 7600, but it was not a
  regression as none of the existing tools provided a way to rollback to the
  origin.

https://www.illumos.org/issues/7198
  EINVAL is returned when a dataset does not have any snapshots, so there is
  nothing to roll back to.
  Although the code in zfs_do_rollback checks for that condition in advance, it's
  still possible that the snapshot(s) gets removed after the check and before the
  rollback sync task is executed.
  At the moment the zfs command would crash when that happens.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2018-02-21 15:12:14 +00:00
Andriy Gapon
754d27df02 MFV r329715: 8997 ztest assertion failure in zil_lwb_write_issue
illumos/illumos-gate@f864f99efe
f864f99efe

https://www.illumos.org/issues/8997
  When dmu_tx_assign is called from zil_lwb_write_issue, it's possible
  for either ERESTART or EIO to be returned.
  If ERESTART is returned, this will cause an assertion to fail directly
  in zil_lwb_write_issue, where the code assumes the return value is
  EIO if dmu_tx_assign returns a non-zero value. This can occur if the
  SPA is suspended when dmu_tx_assign is called, and most often occurs
  when running zloop.
  If EIO is returned, this can cause assertions to fail elsewhere in the
  ZIL code. For example, zil_commit_waiter_timeout contains the
  following logic:
    lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb);
    ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);
  In this case, if dmu_tx_assign returned EIO from within
  zil_lwb_write_issue, the lwb variable passed in will not be issued
  to disk. Thus, its lwb_state field will remain LWB_STATE_OPENED and
  this assertion will fail. zil_commit_waiter_timeout assumes that after
  it calls zil_lwb_write_issue, the lwb will be issued to disk, and
  doesn't handle the case where this is not true; i.e. it doesn't handle
  the case where dmu_tx_assign returns EIO.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after:	3 weeks
2018-02-21 15:07:49 +00:00
Andriy Gapon
9d6810819c MFV r329713: 8731 ASSERT3U(nui64s, <=, UINT16_MAX) fails for large blocks
illumos/illumos-gate@a6c1eb3c08
a6c1eb3c08

https://www.illumos.org/issues/8731
  annotate_ecksum() asserts that nui64s, calculated as nui64s = size / sizeof
  (uint64_t), is not greater than UINT16_MAX.
  This restriction is needed because histograms of incorrectly set and cleared
  bits have 16 bit counters and if the buffer consists of too many 64-bit words,
  then a counter can potentially overflow producing an incorrect result.
  When the largest buffer size was 128KB the greatest value of nui64s was 16K,
  well within the limit.
  But now we have support for large buffers and for buffer sizes of 512KB and
  above the restriction is violated.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	2 weeks
2018-02-21 14:31:48 +00:00
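A minimal sketch of the arithmetic described in 9d6810819c above, assuming only what the message states (nui64s = size / sizeof (uint64_t), checked against UINT16_MAX). The helper names nui64s_for() and fits_16bit_counters() are hypothetical and are not taken from the ZFS sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words in a buffer of `size` bytes. */
static size_t
nui64s_for(size_t size)
{
	return (size / sizeof(uint64_t));
}

/*
 * Nonzero when the word count satisfies the assertion quoted in the
 * message, i.e. when a 16-bit histogram counter cannot overflow.
 */
static int
fits_16bit_counters(size_t size)
{
	return (nui64s_for(size) <= UINT16_MAX);
}
```

With a 128KB buffer the word count is 16384 and the check passes; with a 512KB buffer it is 65536, one more than UINT16_MAX, so the check fails.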
Wojciech Macek
6d13fd638c PowerNV: Put processor to power-save state in idle thread
When a processor enters the power-save state, it releases resources shared with
other CPU threads, which lets the other cores run much faster.

This patch also implements saving and restoring registers that might get
corrupted in power-save state.

Submitted by:          Patryk Duda <pdk@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           jhibbits, nwhitehorn, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
2018-02-21 14:28:40 +00:00
Andriy Gapon
70639f9a5e MFV r329710: 8966 Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable
illumos/illumos-gate@82693e09cc
82693e09cc
https://www.illumos.org/issues/8966

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: WHR <msl0000023508@gmail.com>
PR:		225162
Submitted by:	WHR <msl0000023508@gmail.com>
Reported by:	WHR <msl0000023508@gmail.com>
MFC after:	1 week
2018-02-21 14:17:07 +00:00
Edward Tomasz Napierala
8404ad78db Use proper buffer length (the announce_buf char pointer used to be an array),
broken in r317143. This fixes those weird "cd0: Attempt" messages at boot.

PR:		222103
Reviewed by:	scottl@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14369
2018-02-21 14:05:13 +00:00
Hans Petter Selasky
07c757ec25 Allow LinuxKPI character devices to receive mmap() calls from the Linux
binary mode user-space emulation layer. This is a regression issue after
r328436, when LinuxKPI character devices started to use DTYPE_DEV in
the "f_type" field of the associated file structure(s).

MFC after:	3 days
Found by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-21 10:13:17 +00:00
Wojciech Macek
eb96cc1364 PowerNV: add missing RTC_WRITE support
Add function which can store RTC values to OPAL.

Submitted by:          Wojciech Macek <wma@semihalf.org>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-02-21 08:13:17 +00:00
Wojciech Macek
19a5b68236 CXGBE: implement prefetch on non-Intel architectures
Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           np, pdk@semihalf.com
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14452
2018-02-21 08:05:56 +00:00
Justin Hibbits
fcc491a3fe Split printtrap() into generic and CPU-specific components
Summary:
This compartmentalizes the CPU-specific trap components into their own
function, rather than littering the general printtrap() with various checks.
This will let us replace a series of #ifdef's with a runtime conditional check
in the future.

Reviewed By:	nwhitehorn
Differential Revision:	https://reviews.freebsd.org/D14416
2018-02-21 03:34:33 +00:00
Alexander Motin
d8e89539c8 MFV r324198: 8081 Compiler warnings in zdb
illumos/illumos-gate@3f7978d02b
3f7978d02b

https://www.illumos.org/issues/8081
  zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
  which uses -Werror, the only way to build it is to disable all compiler
  warnings. This makes it much harder to detect newly introduced bugs. We should
  clean up all the warnings.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
2018-02-21 03:08:47 +00:00
Alexander Motin
03618fe74d MFV r319737: 6939 add sysevents to zfs core for commands
illumos/illumos-gate@ce1577b049
ce1577b049

https://www.illumos.org/issues/6939
  Originally created https://smartos.org/bugview/OS-4489
       sysevents should be fired in the kernel from ZFS whenever a command
       is run that is logged in zpool history.
  Example output
  Terminal 1
  root - gz sunos ~ # zfs create zones/foobar
  root - gz sunos ~ # zfs set quota=10g zones/foobar
  root - gz sunos ~ # zfs destroy zones/foobar
  Terminal 2
  root - gz sunos ~ # sysevent EC_zfs
  nvlist version: 0
      date = 2016-04-28T14:50:08.964Z
      vendor = SUNW
      publisher = zfs
      class = EC_zfs
      subclass = ESC_ZFS_history_event
      pid = 0
      data = (embedded nvlist)
      nvlist version: 0
          pool_name = zones
          pool_guid = 0x40c964e8f9a7a694
          history_record = (embedded nvlist)
          nvlist version: 0
              dsname = zones/foobar
              dsid = 0x1525
              history internal str =
              internal_name = create
              history txg = 0x4c4ef3

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Dave Eddy <dave@daveeddy.com>
2018-02-21 02:19:42 +00:00
Alexander Motin
63e739af67 MFV r319736: 6396 remove SVM
illumos/illumos-gate@5f10ef697f
5f10ef697f

https://www.illumos.org/issues/6396
  LVM = SVM = Solaris Volume Manager
  Dead code that is not used on ZFS-based platforms.

Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
2018-02-21 00:24:54 +00:00
Alexander Motin
e5a4a83784 MFV r318941: 7446 zpool create should support efi system partition
illumos/illumos-gate@7855d95b30
7855d95b30

https://www.illumos.org/issues/7446
  Since we support whole-disk configuration for the boot pool, we will also need
  whole-disk support with UEFI boot, and for this zpool create should create an
  EFI system partition.
  I have borrowed the idea from Oracle Solaris and am introducing the zpool
  create -B switch to provide a way to specify that the boot partition should be
  created. However, there is still a question of how big the system partition
  should be. For the time being, I have set the default size to 256MB (that's
  the minimum size for FAT32 with 4k blocks). To support a custom size, the
  set-on-creation "bootsize" property is created, so the custom size can be set
  as: zpool create -B -o bootsize=34MB rpool c0t0d0
  After the pool is created, the "bootsize" property is read-only. When the -B
  switch is not used, bootsize defaults to 0 and is shown in zpool get output
  with value ''. Older zfs/zpool implementations ignore this property.
  https://www.illumos.org/rb/r/219/

Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>

This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
2018-02-21 00:18:57 +00:00
Navdeep Parhar
7cb7c6e37a Catch up with the removal of nktr_slot_flags from upstream netmap. No
functional impact intended.

Submitted by:	Vincenzo Maffione <v.maffione@gmail.com>
2018-02-20 21:42:45 +00:00
Jeff Roberson
683ca3a432 Fix the broken subqueue assignment for the cleanq.
Reported by:	pho
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
2018-02-20 21:27:17 +00:00
Mateusz Guzik
500ca73d43 mtx: add debug assertions to mtx_spin_wait_unlocked 2018-02-20 20:39:34 +00:00
Mateusz Guzik
862db53fb5 Fix reaping on process fd close broken after r329449
The only consumer of proc_reap other than proc_to_reap was not updated
to stop taking PROC_SLOCK.

Reported by:    Juan Ramon Molina Menor <listjm club.fr>
2018-02-20 20:19:38 +00:00
Stephen Hurd
a4e5960730 IFLIB: do not remove dmamap on buffer unload
Dmamap is created only on IFC attach. If we remove it on
buffer release, we won't be able to do ifconfig down&up. Only destroy
when in detach.

Reported by:	wma
Reviewed by:	wma
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14060
2018-02-20 18:33:45 +00:00
Brooks Davis
b81e88d296 Reduce duplication in dynamic syscall registration code.
Remove the unused syscall_(de)register() functions in favor of the
better documented and easier to use syscall_helper_(un)register(9)
functions.

The default and freebsd32 versions differed in which array of struct
sysents they used and a few missing updates to the 32-bit code as
features were added to the main code.

Reviewed by:	cem
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14337
2018-02-20 18:08:57 +00:00
Ian Lepore
7efedde853 Adjust whitespace of things added in the past couple years to match the
original style of the file.  No functional changes.
2018-02-20 14:59:29 +00:00
Mateusz Guzik
681a1b752c Make killpg1 perform process validity checks without proc lock held. 2018-02-20 10:52:07 +00:00
Konstantin Belousov
2c0f13aa59 vm_wait() rework.
Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass.  If there is no object
associated with the wait, use curthread's policy domainset.  The
mechanics of the wait in vm_wait() and vm_wait_domain() are supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait().  vm_domain_clear() handles the same
operations.

Eliminate the VM_WAIT and VM_WAITPFAULT macros; direct function calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Sketched and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation, Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D14384
2018-02-20 10:13:13 +00:00
Wojciech Macek
f32ebdc85c PowerPC: Switch to more accurate unit to avoid division rounding
On the POWER8 architecture there is a timer with a 512 MHz frequency.
Its period is about 1.95 ns, but it is rounded to 1 ns, which is not accurate.

Submitted by:          Patryk Duda <pdk@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14433
2018-02-20 07:30:57 +00:00
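A worked sketch of the rounding problem described in f32ebdc85c above. The message does not name the replacement unit; picoseconds are used here purely as an assumption for illustration, and the names TIMER_HZ, period_ns() and period_ps() are hypothetical.

```c
#include <stdint.h>

#define TIMER_HZ	512000000ULL	/* 512 MHz, as stated in the message */

/* Integer period in nanoseconds: 1e9 / 512e6 truncates 1.953125 to 1. */
static uint64_t
period_ns(uint64_t hz)
{
	return (1000000000ULL / hz);
}

/* Integer period in picoseconds (assumed unit): 1e12 / 512e6 gives 1953. */
static uint64_t
period_ps(uint64_t hz)
{
	return (1000000000000ULL / hz);
}
```

Truncating to whole nanoseconds loses almost half of the real period, while the finer unit keeps the error near 0.006 percent.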
Wojciech Macek
838070d5f4 PowerNV: Send SIGILL on HEA illegal instruction exception
Currently Hypervisor Emulation Assistance interrupt is unhandled.
Executing an undefined instruction in userland triggers kernel panic.
Handle this the same way as Facility Unavailable Interrupt - send
SIGILL signal to userspace.

Submitted by:          Michal Stanek <mst@semihalf.com>
Obtained from:         Semihalf
Reviewed by:           nwhitehorn, pdk@semihalf.com, wma
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14437
2018-02-20 06:38:55 +00:00
Alexander Motin
89dabdb4ea MFC r316910: 7812 Remove gender specific language
illumos/illumos-gate@48bbca8168
48bbca8168

https://www.illumos.org/issues/7812
  This change removes all gendered language that did not refer specifically
  to an individual person or pet. The convention taken was to use
  variations on "they" when referring to users and/or human beings, while
  using "it" when referring to code, functions, and/or libraries.
  Additionally, we took the liberty to fix up any whitespace issues that
  were found in any files that were already being modified.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
2018-02-20 05:07:21 +00:00
Alexander Motin
3542f1bd3a MFV r307315:
7301 zpool export -f should be able to interrupt file freeing

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Author: Alek Pinchuk <alek@nexenta.com>

Closes #175
2018-02-20 04:36:51 +00:00
Alexander Motin
4a0867c8d2 MFV r302649: 7016 arc_available_memory is not 32-bit safe
illumos/illumos-gate@0dd053d7d8
0dd053d7d8

https://www.illumos.org/issues/7016
  upstream DLPX-39446 arc_available_memory is not 32-bit safe
  https://github.com/delphix/delphix-os/commit/
  6b353ea3b8a1610be22e71e657d051743c64190b
  related to this upstream:
  DLPX-38547 delphix engine hang
  https://github.com/delphix/delphix-os/commit/
  3183a567b3e8c62a74a65885ca60c86f3d693783
  DLPX-38547 delphix engine hang (fix static global)
  https://github.com/delphix/delphix-os/commit/
  22ac551d8ef085ad66cc8f65e51ac372b12993b9
  DLPX-38882 system hung waiting on free segment
  https://github.com/delphix/delphix-os/commit/
  cdd6beef7548cd3b12f0fc0328eeb3af540079c2

Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: Prakash Surya <prakash.surya@delphix.com>
2018-02-20 04:14:12 +00:00
Ian Lepore
42c52f36e4 Add missing MODULE_DEPENDS(). 2018-02-20 03:51:09 +00:00
Mateusz Guzik
81d68271d7 Reduce contention on the proctree lock during heavy package build.
There is a proctree -> allproc ordering established.

Most of the time it is either xlock -> xlock or slock -> slock.

On fork however there is a slock -> xlock pair which results in
pathological wait times due to threads keeping proctree held for
reading and all waiting on allproc. Switch this to xlock -> xlock.
Longer term fix would get rid of proctree in this place to begin with.
Right now it is necessary to walk the session/process group lists to
determine which id is free. The walk can be avoided e.g. with bitmaps.

The exit path used to have one place which dealt with allproc and
then with proctree. Move the allproc acquire into the section protected
by proctree. This reduces contention against threads waiting on proctree
in the fork codepath - the fork proctree holder does not have to wait
for allproc as often.

Finally, move tidhash manipulation outside of the area protected by
either of these locks. The removal from the hash was already unprotected.
There is no legitimate reason to look up thread ids for a process still
under construction.

This results in about 50% wait time reduction during -j 128 package build.
2018-02-20 02:18:30 +00:00
Jeff Roberson
06220fa737 Further parallelize the buffer cache.
Provide multiple clean queues partitioned into 'domains'.  Each domain manages
its own bufspace and has its own bufspace daemon.  Each domain has a set of
subqueues indexed by the current cpuid to reduce lock contention on the cleanq.

Refine the sleep/wakeup around the bufspace daemon to use atomics as much as
possible.

Add a B_REUSE flag that is used to requeue bufs during the scan to approximate
LRU rather than locking the queue on every use of a frequently accessed buf.

Implement bufspace_reserve with only atomic_fetchadd to avoid loop restarts.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14274
2018-02-20 00:06:07 +00:00
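One way a reservation can be built from a single atomic fetch-and-add, as 06220fa737 above describes for bufspace_reserve, is sketched below in portable C11. This is an illustration under stated assumptions and not the kernel code: space_reserve(), its parameters, and the roll-back on failure are all hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to reserve `size` units against `limit`.  The counter is bumped
 * unconditionally with one fetch-and-add, so there is no
 * compare-and-swap loop to restart; a caller that overshoots the limit
 * simply gives its reservation back.
 */
static bool
space_reserve(atomic_long *used, long size, long limit)
{
	long old = atomic_fetch_add(used, size);

	if (old + size > limit) {
		atomic_fetch_sub(used, size);	/* undo our own addition */
		return (false);
	}
	return (true);
}
```

The trade-off in this sketch is that the counter can briefly exceed the limit while a failing caller is undoing its addition.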
Bryan Venteicher
88126356cf Add more virtqueue getter methods
MFC after:	2 weeks
2018-02-19 19:31:18 +00:00
Bryan Venteicher
985ed053e3 Add VirtIO bus config_generation method
VirtIO buses (PCI, MMIO) can provide a generation field so a driver
can ensure either a 64-bit or array read was stable.

MFC after:	2 weeks
2018-02-19 19:28:24 +00:00
Konstantin Belousov
9f74642385 Do not free(9) uninitialized pointer.
Reported and tested by:	allanjude
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
2018-02-19 19:08:25 +00:00
Bryan Venteicher
7a16dacdfa Add PCI methods to iterate over the PCI capabilities
VirtIO V1 provides configuration in multiple VENDOR capabilities so this
allows all of the configuration to be discovered.

Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14325
2018-02-19 18:41:56 +00:00
Hans Petter Selasky
e44fa94c09 Implement list_safe_reset_next() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-19 16:31:19 +00:00
Nathan Whitehorn
65184f89b6 Set internal error returns for OF_peer(), OF_child(), and OF_parent() to
zero, matching the IEEE 1275 standard. Since these internal error paths
have never, to my knowledge, been taken, behavior is unchanged.

Reported by:	gonzo
MFC after:	2 weeks
2018-02-19 15:49:14 +00:00
Andrey V. Elsukov
15bf717a93 Remove unused variables and sysctl declaration.
MFC after:	1 week
2018-02-19 12:20:51 +00:00
Andrey V. Elsukov
6ca39da354 Check the packet length to avoid out-of-bounds access. Also save the ah_nxt
value for later use, since the ah pointer can become invalid.

Reported by:	Maxime Villard <max at m00nbsd dot net>
MFC after:	5 days
2018-02-19 11:14:38 +00:00
Andriy Gapon
8cebd0e419 relax an assert in zfsctl_snapdir_lookup to match r323578
Since r323578 we may remove the last reference to a covered vnode with
vrele() instead of vput().  So, v_usecount may be decremented before
the vnode is locked and zfsctl_snapdir_lookup may "catch" the vnode
with v_usecount of zero and v_holdcnt of one.

PR:		225795
Reported by:	asomers
MFC after:	1 week
2018-02-19 08:55:22 +00:00
Hans Petter Selasky
0f839f3a6d When stepping the radix tree in the LinuxKPI make sure we
clear the least significant bits, so that no entries are
skipped.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-19 06:11:58 +00:00
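The following sketch illustrates why clearing the low bits matters when stepping an index to the next subtree, as 0f839f3a6d above describes. It is a hypothetical illustration and not the LinuxKPI code: the function names and the `shift` parameter (log2 of the number of indices covered by one subtree) are assumptions.

```c
/* Step without clearing the low bits: may land inside the next subtree. */
static unsigned long
next_index_naive(unsigned long index, unsigned int shift)
{
	return (index + (1UL << shift));
}

/* Step and clear the low bits: lands on the first index of the next subtree. */
static unsigned long
next_index_aligned(unsigned long index, unsigned int shift)
{
	unsigned long span = 1UL << shift;

	return ((index + span) & ~(span - 1));
}
```

Starting from index 5 with four indices per subtree, the naive step yields 9 and skips entry 8, while the aligned step yields 8.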
Ian Lepore
eb69d1f144 Build at45d and mx25l SPI flash drivers as modules. 2018-02-19 01:49:19 +00:00
Ian Lepore
63cdf4affb Add ofw_bus_if.h to SRCS. 2018-02-19 01:39:02 +00:00
Ian Lepore
2aa5d9c4c8 Add modules/spi as a gathering point for SPI-related modules, analogous to
modules/i2c for i2c/iicbus modules.  Build spibus as a module.
2018-02-19 01:32:27 +00:00
Mateusz Guzik
2ca66c1ef5 Fix process exit vs reap race introduced in r329449
The race manifested itself mostly in terms of crashes with "spin lock
held too long".

Relevant parts of respective code paths:

exit:				reap:
PROC_LOCK(p);
PROC_SLOCK(p);
p->p_state = PRS_ZOMBIE
PROC_UNLOCK(p);
				PROC_LOCK(p);
/* exit work */
				if (p->p_state == PRS_ZOMBIE) /* true */
					proc_reap()
					free proc
/* more exit work */
PROC_SUNLOCK(p);

Thus a still exiting process is reaped.

Prior to the change the zombie check was followed by slock/sunlock trip
which prevented the problem.

Even code prior to this commit has a bug: the proc is still accessed for
statistic collection purposes. However, the severity is rather small and
the bug may be fixed in a future commit.

Reported by:	many
Tested by:	allanjude
2018-02-19 00:54:08 +00:00
Ian Lepore
a7e31772e7 Build ofw_iicbus as a module if OPT_FDT is defined. 2018-02-19 00:47:03 +00:00
Mateusz Guzik
d257698833 mtx: add mtx_spin_wait_unlocked
The primitive can be used to wait for the lock to be released. Intended
usage is for locks in structures which are about to be freed.

The benefit is the avoided interrupt enable/disable trip + atomic op to
grab the lock and shorter wait if the lock is held (since there is no
worry someone will contend on the lock, re-reads can be more aggressive).

Briefly discussed with:	 kib
2018-02-19 00:38:14 +00:00
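The idea in d257698833 above, waiting for a lock to be released without ever acquiring it, can be sketched in portable C11 as follows. This is not the FreeBSD implementation: SKETCH_UNOWNED, the lock-word layout, and the spin counter are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_UNOWNED	((uintptr_t)0)	/* hypothetical "nobody holds it" value */

/*
 * Spin until the lock word reads as unowned.  Only loads are issued,
 * so nothing is written to the lock and no acquire/release pair is
 * needed.  Returns how many times the word had to be re-read.
 */
static unsigned long
spin_wait_unlocked(_Atomic uintptr_t *lockword)
{
	unsigned long spins = 0;

	while (atomic_load_explicit(lockword, memory_order_acquire) !=
	    SKETCH_UNOWNED)
		spins++;
	return (spins);
}
```

As the message notes, this only makes sense when no new contenders can appear, for example for a lock embedded in a structure that is about to be freed.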
Ian Lepore
747de77cd5 Provide public declarations for ofw_spibus_driver and ofw_spibus_devclass
so other drivers can refer to them in DRIVER_MODULE() decls.
2018-02-18 23:35:23 +00:00
Ian Lepore
dc027dc6e2 Provide a public function to acquire a gpio pin by giving the property name
and index.  A private function to do exactly that already existed, so this
renames gpio_pin_get_by_ofw_impl() to gpio_pin_get_by_ofw_propidx() and
provides a declaration for it in a public header.

Previously there were functions to get a pin by property name (assuming
there would only be one pin defined for the name), or by index (assuming
the property has the standard name "gpios").  It turns out there are
devicetree bindings that describe properties with names other than "gpios"
which can describe multiple pins.  Hence the need to retrieve the Nth item
from a named property.
2018-02-18 23:08:43 +00:00
Ian Lepore
5e2d748931 Add the MODULE_DEPEND()s needed so that the kernel linker can resolve all
the symbols at load time when iicbus is not compiled into the kernel.
2018-02-18 23:01:33 +00:00
Ian Lepore
adddeaadc4 Add iic_recover_bus.c, now part of iicbus. This should have been added
as part of r320463.
2018-02-18 22:57:04 +00:00
Ian Lepore
c99321621c Arrange SRCS= as 1 file per line, alphabetical, so it's easier to maintain.
Whitespace only, no functional changes.
2018-02-18 22:54:19 +00:00
Mateusz Guzik
7beb60820f exit: get rid of PROC_SLOCK when checking a process to report, take #2
The suspension counter needs synchronisation through slock, but we don't
need it to check if inspecting the counter is necessary to begin with.
In the common case it is not, thus avoid the lock if possible.

Reviewed by:	kib
Tested by:	pho
2018-02-18 21:07:15 +00:00
Ian Lepore
af85a3d172 Give the imx_i2c driver its own name, set up its relationship to ofw_iicbus.
Previously it called itself 'iichb' to link up with the EARLY_DRIVER_MODULE
declaration in ofw_iicbus.c.
2018-02-18 20:08:35 +00:00
Mariusz Zaborski
965cd21173 Fix broken assertion in r329520.
Reported by:	pho@ lwhsu@
2018-02-18 20:04:39 +00:00
Ian Lepore
c5fe9c7b20 Allow i2c hardware drivers to declare their own relationships to ofw_iicbus
rather than relying on a set of canned EARLY_DRIVER_MODULE() statements in
the ofw_iicbus source.  This means hw drivers will no longer be required to
use one of a few predefined driver names.  They will also now be able to
decide themselves if they want to use DRIVER_MODULE or EARLY_DRIVER_MODULE
and to set which pass to attach on for early modules.

Mainly, this adds extern declarations for the driver and devclass variables.
It also renames ofwiicbus_devclass to ofw_iicbus_devclass to be consistent
with the way we use ofw_ prefixes on this stuff.
2018-02-18 19:33:28 +00:00
Brooks Davis
7a095112b2 Correct/improve the descriptions of kern.ipc.(shmsegs,sema,msqids).
The description of kern.ipc.shmsegs was wrong since 2005.  I updated the
others (which were more correct) to match.

PR:		225933
Reviewed by:	cem
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14391
2018-02-18 19:19:36 +00:00
Hans Petter Selasky
8f294983e9 Optimise xchg() to use atomic_swap_32() and atomic_swap_64().
Suggested by:	kib@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-18 18:46:56 +00:00
Hans Petter Selasky
644680491e Fix implementation of xchg() function macro in the LinuxKPI.
The exchange operation must be atomic.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-18 17:37:23 +00:00
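A minimal sketch of what "the exchange operation must be atomic" in 644680491e above means, written with C11 atomics and not the LinuxKPI macro. The name xchg_u32() is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Store `newval` and return the previous value as one indivisible
 * step.  A separate load followed by a store would let another thread
 * slip in between the two and have its update lost.
 */
static uint32_t
xchg_u32(_Atomic uint32_t *p, uint32_t newval)
{
	return (atomic_exchange(p, newval));
}
```

For example, exchanging 9 into a variable that holds 5 returns 5 and leaves 9 stored.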
Scott Long
f0779b0452 Improve command lifecycle debugging and detection of problems.
Sponsored by:	Netflix
2018-02-18 16:41:34 +00:00
Mark Johnston
2fb9a51077 Don't include DMAR map entry zone items in kernel dumps.
Such items may be allocated in the I/O path used by the dumper,
potentially causing the dump to fail. Since there is some precedent
in the DMAR driver for avoiding this problem using _NODUMP, apply
this workaround to the zone as well.

Reported and tested by:	mmacy
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14422
2018-02-18 16:03:50 +00:00
Mariusz Zaborski
20641651ec Use the fdeget_locked function instead of the fget_locked in the
sys_capability.

Reviewed by:	pjd@ (earlier version)
Discussed with:	mjg@
2018-02-18 15:27:24 +00:00
Hans Petter Selasky
ead15282ae Implement support for radix_tree_for_each_slot() and radix_tree_exception()
in the LinuxKPI and use unsigned long type for the radix tree index.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-18 12:54:21 +00:00
Hans Petter Selasky
78d7441913 Implement the KMEM_CACHE() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 09:52:30 +00:00
Hans Petter Selasky
0628fc903e Make the vm_fault structure in the LinuxKPI compatible with
newer versions of the Linux kernel. No functional change.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 09:31:01 +00:00
Hans Petter Selasky
0597ffb0b5 Implement the rcu_dereference_raw() function macro.
Make sure all RCU dereferencing use the READ_ONCE() function macro.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 09:10:14 +00:00
Hans Petter Selasky
7c86047355 Implement __GFP_BITS_SHIFT and __GFP_BITS_MASK macros in the LinuxKPI.
Add compile time asserts to catch conflicts with native defines.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 08:58:20 +00:00
Hans Petter Selasky
15052dc861 Implement __list_del_entry() helper functions in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 08:47:15 +00:00
Hans Petter Selasky
d51be3591a Implement file_inode() and call_mmap() helper functions in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 08:40:07 +00:00
Hans Petter Selasky
b15a13af6b Refactor dentry structure into its own header file in the LinuxKPI similarly
to Linux. No functional change. Implement d_inode() helper function.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 08:29:25 +00:00
Hans Petter Selasky
0424e413e7 Update the ktime type in the LinuxKPI to be a signed 64-bit integer similarly
to Linux, to avoid compilation issues. Implement ktime_get_real_seconds().

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
Sponsored by:	Limelight Networks
2018-02-18 08:05:40 +00:00
Ian Lepore
f82eace5b3 Build modules specific to imx5/imx6 only when building those kernels.
This adds sys/modules/imx with a SUBDIR makefile to make the whole
collection of modules that are specific to these SoCs.  Initially, that
"whole collection" consists of the if_ffec and imx_i2c drivers.

The if_ffec driver is referenced in its existing home in ../ffec rather
than moving it into the imx directory, because it's used by powerpc too,
but it is no longer built for all armv6/7 systems.

The imx_i2c driver is newly added as a module.
2018-02-18 02:48:54 +00:00
Ian Lepore
b107b904a6 Add a detach method so that this can be a kldunload-friendly module. 2018-02-18 02:01:41 +00:00
Ian Lepore
07fca7ace2 Fix fallout from the import of fresh dts source files from linux 4.15. It
appears that node names no longer include leading zeroes in the @address
qualifiers, so we have to search for the nodes involved in interrupt fixup
using both flavors of name to be compatible with old and new .dtb files.

(You know you're in a bad place when you're applying a workaround to code
that exists only as a workaround for another problem.)
2018-02-18 00:02:09 +00:00
Ian Lepore
02b84ad865 Don't call sdhci_cleanup_slot() if sdhci_init_slot() never got called.
Also, do callout_init() very early in attach, so that callout_drain()
can be called in detach without worrying about whether it ever got init'd.
2018-02-17 23:39:10 +00:00
Ian Lepore
47d67d2451 Do not try to deallocate memory that wasn't allocated (you'd think that
would be safe, but the function also tries to destroy mutexes that never
got created).

I guess this can only happen when imx_ehci_detach() is called on the
error-exit path from imx_ehci_attach(), and that path never got exercised
before today.
2018-02-17 23:23:27 +00:00
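The two detach fixes above (02b84ad865 and 47d67d2451) share one pattern: put state into a known condition early in attach, and have detach release only what was actually set up. A minimal sketch of that pattern follows. The names `demo_softc`, `demo_attach` and `demo_detach` are hypothetical and do not correspond to the real sdhci or imx_ehci code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not the real sdhci or ehci softc. */
struct demo_softc {
	bool	slot_inited;	/* set only after slot setup succeeds */
	int	slot_cleanups;	/* counts cleanup calls, for illustration */
};

/* Tear down only what attach actually managed to set up. */
static void
demo_detach(struct demo_softc *sc)
{
	assert(sc != NULL);
	if (sc->slot_inited) {
		sc->slot_cleanups++;
		sc->slot_inited = false;
	}
}

/* Attach; on failure, reuse detach as the error-exit path. */
static int
demo_attach(struct demo_softc *sc, bool fail_before_slot)
{
	assert(sc != NULL);
	/* Put every field in a known state before anything can fail. */
	sc->slot_inited = false;
	sc->slot_cleanups = 0;
	if (fail_before_slot) {
		demo_detach(sc);
		return (-1);
	}
	sc->slot_inited = true;
	return (0);
}
```

Because detach checks the flag, it is safe to call it from the error path of attach and safe to call it twice.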
Hans Petter Selasky
9a323f25ab Implement spin_trylock_irq() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 22:45:15 +00:00
Hans Petter Selasky
1169b94c7b Stub more lockdep function macros in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 22:41:20 +00:00
Hans Petter Selasky
94b9710bc7 Implement get_task_pid() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 22:33:26 +00:00
Hans Petter Selasky
314d034088 Allow the put_user() function macro to put constant values by using the
existing __put_user() macro.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 21:47:15 +00:00
Hans Petter Selasky
2460cbb4a6 Implement BUILD_BUG_ON_INVALID() function macro in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 21:40:19 +00:00
Hans Petter Selasky
03f8ddedf0 Add support for printk_ratelimit() function macro and improve the existing
printk_ratelimited() function macro to return a boolean: true if there
was a printout, false otherwise.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 21:25:19 +00:00
Justin Hibbits
bce6d88bc1 Merge AIM and Book-E PCPU fields
This is part of a long-term goal of merging Book-E and AIM into a single GENERIC
kernel.  As more work is done, the struct may be optimized further.

Reviewed by:	nwhitehorn
2018-02-17 20:59:12 +00:00
Hans Petter Selasky
e35dc5149d Add support for kref_read() function in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 20:56:35 +00:00
Hans Petter Selasky
13a27c3b43 Add support for mmgrab() function in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 20:52:54 +00:00
Hans Petter Selasky
2060ca654e Add support for __percpu and __weak macros in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 20:50:18 +00:00
Hans Petter Selasky
7353335d1c Move the IRQ_RETVAL() and irqreturn definitions to irqreturn.h in the
LinuxKPI to be compatible with Linux. No functional change.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 20:37:21 +00:00
Hans Petter Selasky
1249c589b6 Add checks for valid IRQ tag before setting up or tearing down an interrupt
handler in the LinuxKPI. This is needed when the interrupt handler is disabled
before freeing the interrupt.

MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-17 20:09:43 +00:00
Emmanuel Vadot
0f7a6420fe aw_mmc: Only change the clock if it has really changed
This also seems to fix a problem when booting Pine64 from the mmc.

Tested On:	Pine64
Tested On:	Pine64-LTS
2018-02-17 18:30:25 +00:00
Mateusz Guzik
8bf6ff2226 Revert r329448.
Turns out it is actually racy, reproducible with stress2/misc/truss.sh

Requested by:	kib
2018-02-17 17:23:43 +00:00
Hans Petter Selasky
426ded2c0b Remove unused bus_autoconf section from usb.ko.
Sponsored by:	Mellanox Technologies
2018-02-17 14:44:03 +00:00
Hans Petter Selasky
171164e50d Revert redundant parts of r329440 after recent devmatch(8) changes.
Sponsored by:	Mellanox Technologies
2018-02-17 12:38:46 +00:00
Mateusz Guzik
e4ccf57fdc Undo LOCK_PROFILING pessimisation after r313454 and r313455
With the option used to compile the kernel both sx and rw shared ops would
always go to the slow path which added avoidable overhead even when the
facility is disabled.

Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.

Restore old behaviour of going there only when profiling is enabled.

This change is a no-op for kernels without LOCK_PROFILING (which is the
default).
2018-02-17 12:07:09 +00:00
Mateusz Guzik
ad58e5e86c exit: stop doing PROC_SLOCK just to call proc_reap
It immediately does PROC_SUNLOCK anyway and the lock plays no role.
2018-02-17 09:03:11 +00:00
Mateusz Guzik
9c0e785c58 exit: get rid of PROC_SLOCK when checking a process to report
All accessed fields are protected with already held process lock.
2018-02-17 08:48:45 +00:00
Hans Petter Selasky
af4010be77 Compile fix for GCC in the LinuxKPI.
Older versions of GCC don't allow flexible array members in a union.
Use a zero length array instead.

MFC after:	1 week
Reported by:	jbeich@
Sponsored by:	Mellanox Technologies
2018-02-17 08:12:35 +00:00
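The GCC workaround described in af4010be77 can be sketched as follows. This is an illustration only, not the LinuxKPI code: the type `msg_hdr` and the helper `msg_alloc` are hypothetical. It relies on zero-length arrays, a GNU C extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical header with a trailing variable-length payload that can be
 * viewed either as bytes or as 32-bit words.  Writing "bytes[]" inside the
 * union would be a flexible array member, which older GCC rejects there;
 * "bytes[0]" is a zero-length array (GNU extension) and is accepted.
 */
struct msg_hdr {
	size_t len;			/* payload length in bytes */
	union {
		unsigned char bytes[0];	/* zero-length instead of bytes[] */
		unsigned int  words[0];
	} payload;
};

/* Allocate a header followed by a copy of len payload bytes. */
static struct msg_hdr *
msg_alloc(const void *data, size_t len)
{
	struct msg_hdr *m;

	assert(data != NULL || len == 0);
	m = malloc(sizeof(*m) + len);
	if (m == NULL)
		return (NULL);
	m->len = len;
	if (len != 0)
		memcpy(m->payload.bytes, data, len);
	return (m);
}
```

The caller owns the returned allocation and releases it with `free()`.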
Warner Losh
1e7e4fd25a Correct the PNP information generated by the usb driver to match the
artificial NOMATCH usb does in lieu of creating a device_t for devices
with no drivers. Also, correct bus to be 'uhub' since that is where USB
devices attach. Even though 'usb' is more logical, we need the
physical bus here.

Submitted by: hps@
2018-02-17 06:57:17 +00:00
Warner Losh
a35ddacab7 Fixup minor nits in the PNP_INFO protocol.
Sponsored by: Netflix
2018-02-17 06:57:03 +00:00
Mateusz Guzik
015cd8dc93 On process exit signal the parent after dropping the proctree lock. 2018-02-17 00:24:50 +00:00
Mateusz Guzik
7e588b9219 Unref the prison after proctree is dropped. 2018-02-17 00:23:56 +00:00
Mateusz Guzik
65f29b9caa Postpone sx_sunlock(&proctree_lock) on fork until after allproc is dropped.
There is significant contention on the lock during a -j 128 package build.
This change drops total wait time on this lock by 60%.
2018-02-17 00:23:28 +00:00
Mateusz Guzik
6776bfeb8f Tidy up kern_wait6
- don't relock curproc in msleep
- don't relock proctree if P_STATCHILD is spotted
- reformat the proc_to_reap call in the main loop
2018-02-17 00:21:50 +00:00
Konstantin Belousov
fc97574bd3 Remove unused symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-16 23:18:42 +00:00
Alan Somers
4571b2776f zfs: fix formatting in a log statement
Submitted by:	Dave Baukus <daveb@spectralogic.com>
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-02-16 21:59:08 +00:00
Roger Pau Monné
c2bddfdc51 xen/pv: remove the attach of the ISA bus from the Xen PV bus
There's no need to attach the ISA bus from the Xen PV one.

Sponsored by:           Citrix Systems R&D
2018-02-16 18:04:27 +00:00
Olivier Houchard
a72c9dc53f Define CK_MD_TSO for the relevant arches (i386, amd64 and sparc64).
Defaulting to CK_MD_RMO has the unfortunate side effect of generating
memory barriers that are useless on those arches, and the even more
unfortunate side effect of generating lfence/sfence/mfence on i386, even
if older CPUs don't support it.
This should fix the panic reported when using IPFW on a Pentium 3.
Note that mfence and sfence might still be used in a few cases, but that
shouldn't happen in FreeBSD right now, and should be fixed upstream first.

MFC after:	1 week
2018-02-16 17:50:06 +00:00
Alan Somers
dfbc272d5d Handle generic pathconf attributes in the .zfs ctldir
MFC instructions: change the value of _PC_LINK_MAX to INT_MAX

Reported by:	jhb
MFC after:	19 days
X-MFC-With:	329265
Sponsored by:	Spectra Logic Corp
2018-02-16 16:56:09 +00:00
Hans Petter Selasky
f4824a028d Implement mutex_trylock_recursive() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 16:01:39 +00:00
Hans Petter Selasky
10ee3d3016 Implement memdup_user_nul() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:52:28 +00:00
Hans Petter Selasky
f1f7e04a29 Implement tasklet_enable() and tasklet_disable() in the LinuxKPI.
MFC after:	1 week
Requested by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:41:16 +00:00
Mark Johnston
16759360d4 Fix a memory leak introduced in r328426.
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.

Reported and tested by:	pho
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D14390
2018-02-16 15:41:03 +00:00
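The leak fixed in 16759360d4 comes from a function that can return a buffer even when it reports failure. A minimal sketch of the caller-side rule is below. `read_block` and `probe_locations` are hypothetical stand-ins and do not reproduce the ffs_sbget() interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical reader that, like the function described above, may hand
 * back an allocated buffer through *bufp even when it returns an error.
 */
static int
read_block(int valid, char **bufp)
{
	assert(bufp != NULL);
	*bufp = malloc(64);
	if (*bufp == NULL)
		return (ENOMEM);
	return (valid ? 0 : EINVAL);
}

/*
 * Try each candidate location in turn, freeing the buffer left over from
 * a failed attempt before the next one.  Returns the index that succeeded,
 * or -1 with *bufp set to NULL.
 */
static int
probe_locations(const int *valid, int n, char **bufp)
{
	int i;

	assert(valid != NULL && bufp != NULL);
	*bufp = NULL;
	for (i = 0; i < n; i++) {
		if (read_block(valid[i], bufp) == 0)
			return (i);
		free(*bufp);	/* buffer may be set despite the error */
		*bufp = NULL;
	}
	return (-1);
}
```

On success the caller owns `*bufp` and must free it; on failure nothing is left allocated.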
Mark Johnston
3f060b60b1 Use the conventional name for an array of pages.
No functional change intended.

Discussed with:	kib
MFC after:	3 days
2018-02-16 15:38:22 +00:00
Ed Maste
b8283138cd Correct module symbol export handling
EXPORT_SYMS can be set to YES, NO, a list of symbols to export from a
module, or to a filename containing such a list.  For the case that it
is set to a symbol list, replace spaces in the list with newlines, so
the created file is in the format expected by kmod_syms.awk.

Reviewed by:	imp, jhb
MFC after:	1 month
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14284
2018-02-16 15:38:02 +00:00
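The transformation described in b8283138cd, turning a space-separated symbol list into one symbol per line, can be illustrated in C. The real change lives in the build system, so this is only a sketch of the idea; `symlist_to_lines` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy a space-separated symbol list into out, one symbol per line.
 * Returns the number of bytes written (excluding the NUL), or 0 if out is
 * too small.
 */
static size_t
symlist_to_lines(const char *list, char *out, size_t outsz)
{
	size_t n = 0;

	assert(list != NULL && out != NULL);
	for (; *list != '\0'; list++) {
		if (n + 1 >= outsz)	/* keep room for the NUL */
			return (0);
		out[n++] = (*list == ' ') ? '\n' : *list;
	}
	if (n >= outsz)
		return (0);
	out[n] = '\0';
	return (n);
}
```

For the input "foo bar baz" the output buffer holds three lines: foo, bar and baz.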
Hans Petter Selasky
219ff59ce2 Implement enable_irq() and disable_irq() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:37:33 +00:00
Hans Petter Selasky
2a7c2b914f Allow the cmpxchg() macro in the LinuxKPI to work on pointers without
generating compiler warnings, -Wint-conversion .

Requested by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:20:21 +00:00
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Konstantin Belousov
13cad9af82 Use local symbol for offset.
Small global symbols confuse ddb which matches them against small
unrelated displacements and makes the disassembly ugly.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-16 13:32:46 +00:00
Andriy Gapon
7b394c1066 move vintr_intercept_enabled under INVARIANTS
The function is not used outside of INVARIANTS since r328622.

MFC after:	1 week
2018-02-16 07:02:14 +00:00
Andriy Gapon
c945107d23 read-behind / read-ahead support for zfs_getpages()
ZFS caches blocks it reads in its ARC, so in general the optional
pages are not as useful as with filesystems that read the data
directly into the target pages.  But still the optional pages
are useful to reduce the number of page faults and associated
VM / VFS / ZFS calls.
Another case that gets optimized (as a side effect) is paging in
from a hole.  ZFS DMU does not currently provide a convenient
API to check for a hole.  Instead it creates a temporary zero-filled
block and allows accessing it as if it were a normal data block.
Getting multiple pages one by one from a hole results in repeated
creation and destruction of the temporary block (and an associated
ARC header).

Tested with fsx using various supported blocks sizes from 512 bytes
to 128 KB and additionally 1 MB.

Please note that in illumos and ZoL they do not do the range-locking in
the page-in path. This is because ZFS has a double-caching problem
between ARC and page cache and that requires zfs_read() and zfs_write()
to consult pages in the page cache. So, in those functions they first
lock a range and then lock pages corresponding to the range. While in
the page-in (and maybe page-out) path they first lock the pages and then
would lock the range. So, they would have a deadlock.

I believe that FreeBSD does not have that problem, because the page-in
deals only with invalid pages while zfs_read() and zfs_write() need to
access only valid pages. They do not wait on a busy page unless it's
already valid.

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D14263
2018-02-16 06:59:35 +00:00
Anish Gupta
0b37d3d90e This change fixes duplicate detection of the same IOMMU/AMD-Vi device for Ryzen with EFR support.
The IVRS can contain legacy and non-legacy entries for the same AMD-Vi device at the same time. The ivhd driver now ignores the legacy entry if a new IVHD type is present, as specified in the AMD-Vi specification. Previously both IVHD entries were used and two ivhd devices were created.
Add support for the new IVHD types 0x11 and 0x40 in ACPI. Create a new struct of type acpi_ivrs_hardware_new for these new IVHD types. Legacy type 0x10 will continue to use acpi_ivrs_hardware.

Reviewed by:	avg
Approved by:	grehan
Differential Revision:	https://reviews.freebsd.org/D13160
2018-02-16 05:17:00 +00:00
Brooks Davis
27b95863a9 Get rid of the requirement to include SysV IPC headers with _KERNEL
defined in ipcrm by introducing _WANT_SYSVxxx_INTERNALS defines.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14271
2018-02-16 01:33:01 +00:00
Brooks Davis
aff4f2d315 Reduce duplication in __acl_*_(file|link).
Add const to new kern_ functions and push down as required.

Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14174
2018-02-15 21:24:43 +00:00
Jung-uk Kim
ea4fe1da62 Change size of padding to reflect reality. No functional change.
Discussed with:		kib
2018-02-15 20:42:38 +00:00
Warner Losh
bc40691e40 Report the number of remaining retries when we have an error that
we're retrying.
2018-02-15 18:57:54 +00:00
Brooks Davis
d88fe103eb Reduce duplication in __mac_*_(file|link)(2) implementation.
Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14175
2018-02-15 18:57:22 +00:00
Brooks Davis
2feb5b8dc9 Regen after r329322. 2018-02-15 18:32:11 +00:00
Brooks Davis
a4dcd0ef22 Remove freebsd32_getdirentries(), it will be unused after the next
commit.
2018-02-15 18:31:43 +00:00
Brooks Davis
e4039d68fb Revert r329323. I missed something in my testing. 2018-02-15 17:58:51 +00:00
Mark Johnston
05f0f0e9ea Fix the test for SET_FOREACH termination.
Unlike the queue(3) _FOREACH macros, the iterator for a SET_FOREACH is
not NULL after the end of the set is reached.
2018-02-15 17:35:40 +00:00
Brooks Davis
a170bb0387 Regen after r329322: Fix getdirentries(2) under 32-bit compat.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14379
2018-02-15 17:27:19 +00:00
Brooks Davis
1f6023cf0b Fix getdirentries(2) under 32-bit compat.
The latest version of getdirentries (syscall 554) takes a pointer
to an off_t as the last argument. The old version, which copies out
an int32_t, was being used instead. Use the standard sys_getdirentries()
implementation instead.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14379
2018-02-15 17:26:30 +00:00
Olivier Houchard
0511fa0efd Rename the ACPI variant of the gicv2m driver from "gicv2m" to "gicv2m_acpi".
The FDT variant is called "gicv2m" too, and as both would try to register
on gic, only one of them would succeed, while we want them both in a
GENERIC kernel.

Reviewed by:	andrew
2018-02-15 15:46:14 +00:00
Andriy Gapon
113ce413ba MFV r329313: 8857 zio_remove_child() panic due to already destroyed parent zio
illumos/illumos-gate@d6e1c446d7
d6e1c446d7

https://www.illumos.org/issues/8857
  I had an OS panic on one of our servers:

  ffffff01809128c0 vpanic()
  ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
  ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
  ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0,
  ffffff3373370908)
  ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
  ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
  ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
  ffffff0180912b40 thread_start+8()

  It panicked here:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/
  zio.c#430

  pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio"
  (parent zio of "cio") has already been destroyed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>

PR:		223803
Tested by:	shiva.bhanujan@quorum.com
MFC after:	2 weeks
2018-02-15 14:46:29 +00:00
Mateusz Guzik
b345111b2b xen: fix smp boot after r328157
mce_stack was left unset leading to early crashes
2018-02-15 07:23:41 +00:00
Ravi Pokala
c756fb6ebb mxge(4) should pass unhandled ioctls to ether_ioctl()
Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when
the NIC is not a member of a lagg. This came as a surprise, because the
SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the
laggX interface, rather than a physical port) or EINVAL (if run against a
non-member physical port). This behavior was not seen with other drivers,
such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl
handlers, I found that they all called ether_ioctl() for the default (i.e.
unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two
specific cases, and returns ENOTTY for the default case.

Remove the two cases which explicitly call ether_ioctl(), and let the
default case call it instead. This matches what the vast majority of the NIC
drivers do.

Reviewed by:	kmacy
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14381
2018-02-15 03:22:04 +00:00
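The shape of the mxge(4) fix in c756fb6ebb, letting the default case of a driver's ioctl switch fall through to a generic handler, can be sketched as below. Everything here is hypothetical: the command codes, `generic_ioctl` and `driver_ioctl` are illustrative stand-ins, and the sketch does not model how the real network stack routes SIOCGLAGGPORT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command codes; real drivers use SIOC* values. */
enum { CMD_SET_MTU = 1, CMD_GET_LAGGPORT = 2 };

/*
 * Stand-in for a generic fallback handler.  It knows about a command the
 * driver itself does not handle and returns ENOTTY for anything else.
 */
static int
generic_ioctl(int cmd)
{
	return (cmd == CMD_GET_LAGGPORT ? EINVAL : ENOTTY);
}

/* Driver handler: handle what is driver-specific, delegate the rest. */
static int
driver_ioctl(int cmd, int *mtu, int arg)
{
	assert(mtu != NULL);
	switch (cmd) {
	case CMD_SET_MTU:
		*mtu = arg;
		return (0);
	default:
		/* Delegate instead of returning ENOTTY directly. */
		return (generic_ioctl(cmd));
	}
}
```

With this structure the driver reports the same error for an unhandled command as every other driver that delegates to the same fallback.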
Conrad Meyer
5bd0149714 x86 pmap: Make memory mapped via pmap_qenter() non-executable
The idea is, the pmap_qenter() API is now defined to not produce executable
mappings.  If you need executable mappings, use another API.

Add pg_nx flag in pmap_qenter on x86 to make kernel pages non-executable.

Other architectures that support execute-specific permissions on page table
entries should subsequently be updated to match.

Submitted by:	Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by:	markj
Discussed with:	alc, jhb, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14062
2018-02-14 23:35:47 +00:00
Eugene Grosbein
8be8c75688 ng_pppoe(8): add support for user-supplied Host-Uniq tag.
A few ISPs filter PADI requests based on such a tag,
to force the use of their own routers.
The custom Host-Uniq tag is passed in the NGM_PPPOE_CONNECT
control message, so it can be used with FreeBSD ppp(8)
and mpd without any other change.

Add support to send and receive PADM messages,
HURL and MOTM, often used by service providers to provide
ACS information and other configuration settings
to the user CPE.

Submitted by:	ale
Approved by:	mav (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9270
2018-02-14 21:17:44 +00:00
Mateusz Guzik
f795032b47 rwlock: diff-reduction of runlock compared to sx sunlock 2018-02-14 20:37:33 +00:00
Alan Somers
834063202a gpart: append partition name to the underlying provider's physical path
If the underlying provider's physical path is null, then the gpart device's
physical path will be, too. Otherwise, it will append the partition name,
such as "/p1" or "/s1/a". This will make gpart work better with zfsd(8).

PR:		224965
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14010
2018-02-14 20:26:09 +00:00
Alan Somers
0bab7fa8a7 geli: append "/eli" to the underlying provider's physical path
If the underlying provider's physical path is null, then the geli device's
physical path will be, too. Otherwise, it will append "/eli".  This will make
geli work better with zfsd(8).

PR:		224962
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13979
2018-02-14 20:15:32 +00:00
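The physical-path rule shared by the gpart and geli changes above (834063202a and 0bab7fa8a7) is: an empty parent path stays empty, otherwise a suffix is appended. A minimal sketch, assuming paths are plain NUL-terminated strings; `physpath_append` is a hypothetical helper and not a GEOM function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Derive a child's physical path from its parent's.  An empty parent path
 * yields an empty result; otherwise suffix is appended.  Returns 0 on
 * success, -1 if out is too small.
 */
static int
physpath_append(const char *parent, const char *suffix, char *out,
    size_t outsz)
{
	size_t plen, slen;

	assert(parent != NULL && suffix != NULL && out != NULL && outsz > 0);
	if (parent[0] == '\0') {
		out[0] = '\0';
		return (0);
	}
	plen = strlen(parent);
	slen = strlen(suffix);
	if (plen + slen + 1 > outsz)
		return (-1);
	memcpy(out, parent, plen);
	memcpy(out + plen, suffix, slen + 1);	/* includes the NUL */
	return (0);
}
```

The same helper covers both cases by passing "/eli" for geli or a partition name such as "/p1" for gpart as the suffix.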
Bryan Drewery
70c144dc78 nanosleep(2): Fix bogus incrementing of rmtp by tc_tick_sbt on [EINTR].
sbt is the time in the future that the tsleep_sbt() is expected to be completed
at.  sbtt is the current time.  Depending on the precision with sysctl
kern.timecounter.alloweddeviation the start time may be incremented by
tc_tick_sbt.  The same increment is needed for the current time of sbtt before
calculating the difference.  The impact of missing this increment is that rmtp
may increase by one tc_tick_sbt on every early [EINTR] return.  If the same
struct is passed in for rqtp as rmtp this can result in rqtp effectively
incrementing by tc_tick_sbt and sleeping longer than originally intended.

This problem was introduced in r247797.

Reviewed by:	kib, markj, vangyzen (all on an older version of the test)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14362
2018-02-14 18:43:50 +00:00
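The arithmetic behind the nanosleep(2) fix in 70c144dc78 can be shown with plain integers. This is a sketch under stated assumptions: `remaining_time` and its `precise` flag are hypothetical, and all values share one arbitrary time unit rather than the kernel's sbintime_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Remaining sleep time after an early wakeup.  The deadline was computed
 * from a start time that may have been advanced by one tick (when precise
 * is 0), so the current time must be advanced by the same tick before
 * subtracting.  Never returns a negative value.
 */
static int64_t
remaining_time(int64_t deadline, int64_t now, int64_t tick, int precise)
{
	int64_t rem;

	assert(tick >= 0);
	if (!precise)
		now += tick;	/* same adjustment the start time received */
	rem = deadline - now;
	return (rem > 0 ? rem : 0);
}
```

For example, with a start time of 1000, a tick of 10 and a request of 500, the deadline is 1510. An interruption at time 1200 leaves 300 units of the request; subtracting without the adjustment would report 310, and feeding that back in as the next request is how the sleep would grow by one tick per interruption.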
Alan Somers
d64bae1f1a Implement .vop_pathconf and .vop_getacl for the .zfs ctldir
zfsctl_common_pathconf will report all the same variables that regular ZFS
volumes report. zfsctl_common_getacl will report an ACL equivalent to 555,
except that you can't read xattrs or edit attributes.

Fixes a bug where "ls .zfs" will occasionally print something like:
ls: .zfs/.: Operation not supported

PR:		225793
Reviewed by:	avg
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D14365
2018-02-14 15:49:31 +00:00
Justin Hibbits
d793587fe2 Fix a panic introduced in r329225
Some GEOM partition tables may be destroyed with incomplete partition
entries.  Guard against this with NULL checks.

Reported by:	pholm, others
Reviewed by:	markj
Tested by:	pholm
2018-02-14 15:12:09 +00:00
Justin Hibbits
a00ce4e854 PPC64: Get the timebase from the proper OF field
Summary:
After revision rS328534 ('PPC64: use hwref instead of cpuid'), FreeBSD on
a powerpc64 virtual machine panics since it is unable to read the
timebase, showing the following error:

     get-property for timebase-frequency on zero phandle

     panic: Unable to determine timebase frequency!

With the change above, cpuref->cr_hwref does not contain the phandle
anymore, so the proper CPU entry in OF is never read.

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14204
2018-02-14 02:51:28 +00:00
Justin Hibbits
26e251b55c powerpc64/pseries: Define new hcalls
Summary:
Define new hcalls as in 'Linux on Power Architecture Platform Reference'
version 1.1 (24 March 2016) downloaded from:

        https://members.openpowerfoundation.org/document/dl/469

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14281
2018-02-14 02:48:27 +00:00
Konstantin Belousov
ada27a3bb8 Cleanup unused page argument for vm_reserv_break().
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14364
2018-02-14 00:34:02 +00:00
Konstantin Belousov
d929ad7f91 Ensure memory consistency on COW.
From the submitter description:
The process is forked transitioning a map entry to COW
Thread A writes to a page on the map entry, faults, updates the pmap to
  writable at a new phys addr, and starts TLB invalidations...
Thread B acquires a lock, writes to a location on the new phys addr, and
  releases the lock
Thread C acquires the lock, reads from the location on the old phys addr...
Thread A ...continues the TLB invalidations which are completed
Thread C ...reads from the location on the new phys addr, and releases
  the lock

In this example Threads B and C [lock, use and unlock] properly and
never own the lock at the same time.  Thread A was writing somewhere
else on the page and so never had or needed the lock. Thread C sees a
location that is only ever read or modified under a lock change beneath
it while it is the lock owner.

To fix this, perform the two-stage update of the copied PTE.  First,
the PTE is updated with the address of the new physical page with
copied content, but in read-only mode.  The pmap locking and the page
busy state during PTE update and TLB invalidation IPIs ensure that any
writer to the page cannot upgrade the PTE to the writable state until
all CPUs have updated their TLBs to not cache the old mapping.  Then, after the
busy state of the page is lifted, the faults for write can proceed and
do not violate the consistency of the reads.

The change is done in vm_fault because most architectures do need IPIs
to invalidate remote TLBs.  Moreover, I think that hardware guarantees of
atomicity of the remote TLB invalidation are not enough to prevent
inconsistent results from non-atomic reads, like multi-word accesses
protected by a lock.  So instead of modifying each pmap invalidation
code, I did it there.

Discovered and analyzed by: Elliott.Rabe@dell.com
Reviewed by:	markj
PR:	225584 (appeared to have the same cause)
Tested by:	Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14347
2018-02-14 00:31:45 +00:00