Commit Graph

120801 Commits

Author SHA1 Message Date
Warner Losh
a35ddacab7 Fixup minor nits in the PNP_INFO protocol.
Sponsored by: Netflix
2018-02-17 06:57:03 +00:00
Mateusz Guzik
015cd8dc93 On process exit signal the parent after dropping the proctree lock. 2018-02-17 00:24:50 +00:00
Mateusz Guzik
7e588b9219 Unref the prison after proctree is dropped. 2018-02-17 00:23:56 +00:00
Mateusz Guzik
65f29b9caa Postpone sx_sunlock(&proctree_lock) on fork until after allproc is dropped.
There is a significant contention on the lock during -j 128 package build.
This change drops total wait time on this lock by 60%.
2018-02-17 00:23:28 +00:00
Mateusz Guzik
6776bfeb8f Tidy up kern_wait6
- don't relock curproc in msleep
- don't relock proctree if P_STATCHILD is spotted
- reformat the proc_to_reap call in the main loop
2018-02-17 00:21:50 +00:00
Konstantin Belousov
fc97574bd3 Remove unused symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-16 23:18:42 +00:00
Alan Somers
4571b2776f zfs: fix formatting in a log statement
Submitted by:	Dave Baukus <daveb@spectralogic.com>
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-02-16 21:59:08 +00:00
Roger Pau Monné
c2bddfdc51 xen/pv: remove the attach of the ISA bus from the Xen PV bus
There's no need to attach the ISA bus from the Xen PV one.

Sponsored by:           Citrix Systems R&D
2018-02-16 18:04:27 +00:00
Olivier Houchard
a72c9dc53f Define CK_MD_TSO for the relevant arches (i386, amd64 and sparc64).
Defaulting to CK_MD_RMO has the unfortunate side effect of generating
memory barriers that are useless on those arches, and the even more
unfortunate side effect of generating lfence/sfence/mfence on i386, even
if older CPUs don't support it.
This should fix the panic reported when using IPFW on a Pentium 3.
Note that mfence and sfence might still be used in a few case, but that
shouldn't happen in FreeBSD right now, and should be fixed upstream first.

MFC after:	1 week
2018-02-16 17:50:06 +00:00
Alan Somers
dfbc272d5d Handle generic pathconf attributes in the .zfs ctldir
MFC instructions: change the value of _PC_LINK_MAX to INT_MAX

Reported by:	jhb
MFC after:	19 days
X-MFC-With:	329265
Sponsored by:	Spectra Logic Corp
2018-02-16 16:56:09 +00:00
Hans Petter Selasky
f4824a028d Implement mutex_trylock_recursive() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 16:01:39 +00:00
Hans Petter Selasky
10ee3d3016 Implement memdup_user_nul() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:52:28 +00:00
Hans Petter Selasky
f1f7e04a29 Implement tasklet_enable() and tasklet_disable() in the LinuxKPI.
MFC after:	1 week
Requested by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:41:16 +00:00
Mark Johnston
16759360d4 Fix a memory leak introduced in r328426.
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.

Reported and tested by:	pho
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D14390
2018-02-16 15:41:03 +00:00
Mark Johnston
3f060b60b1 Use the conventional name for an array of pages.
No functional change intended.

Discussed with:	kib
MFC after:	3 days
2018-02-16 15:38:22 +00:00
Ed Maste
b8283138cd Correct module symbol export handling
EXPORT_SYMS can be set to YES, NO, a list of symbols to export from a
module, or to a filename containing such a list.  For the case that it
is set to a symbol list, replace spaces in the list with newlines, so
the created file is in the format expected by kmod_syms.awk.

Reviewed by:	imp, jhb
MFC after:	1 month
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14284
2018-02-16 15:38:02 +00:00
Hans Petter Selasky
219ff59ce2 Implement enable_irq() and disable_irq() in the LinuxKPI.
MFC after:	1 week
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:37:33 +00:00
Hans Petter Selasky
2a7c2b914f Allow the cmpxchg() macro in the LinuxKPI to work on pointers without
generating compiler warnings, -Wint-conversion .

Requested by:	Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Mellanox Technologies
2018-02-16 15:20:21 +00:00
Ed Maste
0ba1b36553 Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text.  To avoid license proliferation switch to
the standard 2-clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.  Additional files
waiting on permission from others are listed in review D14210.

Approved by:	kan, marcel, sos, rdivacky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-02-16 15:00:14 +00:00
Konstantin Belousov
13cad9af82 Use local symbol for offset.
Small global symbols confuse ddb which matches them against small
unrelated displacements and makes the disassembly ugly.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-16 13:32:46 +00:00
Andriy Gapon
7b394c1066 move vintr_intercept_enabled under INVARIANTS
The function is not used outside of INVARIANTS since r328622.

MFC after:	1 week
2018-02-16 07:02:14 +00:00
Andriy Gapon
c945107d23 read-behind / read-ahead support for zfs_getpages()
ZFS caches blocks it reads in its ARC, so in general the optional
pages are not as useful as with filesystems that read the data
directly into the target pages.  But still the optional pages
are useful to reduce the number of page faults and associated
VM / VFS / ZFS calls.
Another case that gets optimized (as a side effect) is paging in
from a hole.  ZFS DMU does not currently provide a convenient
API to check for a hole.  Instead it creates a temporary zero-filled
block and allows accessing it as if it were a normal data block.
Getting multiple pages one by one from a hole results in repeated
creation and destruction of the temporary block (and an associated
ARC header).

Tested with fsx using various supported blocks sizes from 512 bytes
to 128 KB and additionally 1 MB.

Please note that in illumos and ZoL they do not do the range-locking in
the page-in path. This is because ZFS has a double-caching problem
between ARC and page cache and that requires zfs_read() and zfs_write()
to consult pages in the page cache. So, in those functions they first
lock a range and then lock pages corresponding to the range. While in
the page-in (and maybe page-out) path they first lock the pages and then
would lock the range. So, they would have a deadlock.

I believe that FreeBSD does not have that problem, because the page-in
deals only with invalid pages while zfs_read() and zfs_write() need to
access only valid pages. They do not wait on a busy page unless it's
already valid.

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D14263
2018-02-16 06:59:35 +00:00
Anish Gupta
0b37d3d90e This change fixes duplicate detection of same IOMMU/AMD-Vi device for Ryzen with EFR support.
IVRS can have entry of type legacy and non-legacy present at same time for same AMD-Vi device. ivhd driver will ignore legacy if new IVHD type is present as specified in AMD-Vi specification. Earlier both of IVHD entries used and two ivhd devices were created.
Add support for new IVHD type 0x11 and 0x40 in ACPI. Create new struct of type acpi_ivrs_hardware_new for these new type of IVHDs. Legacy type 0x10 will continue to use acpi_ivrs_hardware.

Reviewed by:	avg
Approved by:	grehan
Differential Revision:https://reviews.freebsd.org/D13160
2018-02-16 05:17:00 +00:00
Brooks Davis
27b95863a9 Get rid of the requirement to include SysV IPC headers with _KERNEL
defined in ipcrm by introducing _WANT_SYSVxxx_INTERNALS defines.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14271
2018-02-16 01:33:01 +00:00
Brooks Davis
aff4f2d315 Reduce duplication in __acl_*_(file|link).
Add const to new kern_ functions and push down as required.

Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14174
2018-02-15 21:24:43 +00:00
Jung-uk Kim
ea4fe1da62 Change size of padding to reflect reality. No functional change.
Discussed with:		kib
2018-02-15 20:42:38 +00:00
Warner Losh
bc40691e40 Report the number of remaining retries when we have an error that
we're retrying.
2018-02-15 18:57:54 +00:00
Brooks Davis
d88fe103eb Reduce duplication in __mac_*_(file|link)(2) implementation.
Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14175
2018-02-15 18:57:22 +00:00
Brooks Davis
2feb5b8dc9 Regen after r329322. 2018-02-15 18:32:11 +00:00
Brooks Davis
a4dcd0ef22 Remove freebsd32_getdirentries(), it will be unused after the next
commit.
2018-02-15 18:31:43 +00:00
Brooks Davis
e4039d68fb Revert r329323. I missed something in my testing. 2018-02-15 17:58:51 +00:00
Mark Johnston
05f0f0e9ea Fix the test for SET_FOREACH termination.
Unlike the queue(3) _FOREACH macros, the iterator for a SET_FOREACH is
not NULL after the end of the set is reached.
2018-02-15 17:35:40 +00:00
Brooks Davis
a170bb0387 Regen after r329322: Fix getdirentries(2) under 32-bit compat.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14379
2018-02-15 17:27:19 +00:00
Brooks Davis
1f6023cf0b Fix getdirentries(2) under 32-bit compat.
The latest version of getdirentries (syscall 554) takes a pointer
an an off_t as the last argument. The old version which copies out
an int32_t was being used instead. Use the standard sys_getdirentries()
implementation instead.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14379
2018-02-15 17:26:30 +00:00
Olivier Houchard
0511fa0efd Rename the ACPI variant of the gicv2m driver from "gicv2m" to "gicv2m_acpi".
The FDT variant is called "gicv2m" too, and as both would try to register
on gic, only one of them would succeed, while we want them both in a
GENERIC kernel.

Reviewed by:	andrew
2018-02-15 15:46:14 +00:00
Andriy Gapon
113ce413ba MFV r329313: 8857 zio_remove_child() panic due to already destroyed parent zio
illumos/illumos-gate@d6e1c446d7
d6e1c446d7

https://www.illumos.org/issues/8857
  I had an OS panic on one of our servers:

  ffffff01809128c0 vpanic()
  ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
  ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
  ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0,
  ffffff3373370908)
  ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
  ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
  ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
  ffffff0180912b40 thread_start+8()

  It panicked here:
  http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/
  zio.c#430

  pio->io_lock is DEAD, thus a panic. Further analysis shows the "pio"
  (parent zio of "cio") has already been destroyed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>

PR:		223803
Tested by:	shiva.bhanujan@quorum.com
MFC after:	2 weeks
2018-02-15 14:46:29 +00:00
Mateusz Guzik
b345111b2b xen: fix smp boot after r328157
mce_stack was left unset leading to early crashes
2018-02-15 07:23:41 +00:00
Ravi Pokala
c756fb6ebb mxge(4) should pass unhandled ioctls to ether_ioctl()
Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when
the NIC is not a member of a lagg. This came as a surprise, because the
SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the
laggX interface, rather than a physical port) or EINVAL (if run against a
non-member physical port). This behavior was not seen with other drivers,
such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl
handlers, I found that they all called ether_ioctl() for the default (i.e.
unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two
specific cases, and returns ENOTTY for the default case.

Remove the two cases which explicitly call ether_ioctl(), and let the
default case call it instead. This matches what the vast majority of the NIC
drivers do.

Reviewed by:	kmacy
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14381
2018-02-15 03:22:04 +00:00
Conrad Meyer
5bd0149714 x86 pmap: Make memory mapped via pmap_qenter() non-executable
The idea is, the pmap_qenter() API is now defined to not produce executable
mappings.  If you need executable mappings, use another API.

Add pg_nx flag in pmap_qenter on x86 to make kernel pages non-executable.

Other architectures that support execute-specific permissons on page table
entries should subsequently be updated to match.

Submitted by:	Darrick Lew <darrick.freebsd AT gmail.com>
Reviewed by:	markj
Discussed with:	alc, jhb, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14062
2018-02-14 23:35:47 +00:00
Eugene Grosbein
8be8c75688 ng_pppoe(8): add support for user-supplied Host-Uniq tag.
A few ISP filter PADI requests based on such tag,
to force the use of their own routers.
The custom Host-Uniq tag is passed in the NGM_PPPOE_CONNECT
control message, so it can be used with FreeBSD ppp(8)
and mpd without any other change.

Add support to send and receive PADM messages,
HURL and MOTM, often used by service providers to provide
ACS information and other configuration settings
to the user CPE.

Submitted by:	ale
Approved by:	mav (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D9270
2018-02-14 21:17:44 +00:00
Mateusz Guzik
f795032b47 rwlock: diff-reduction of runlock compared to sx sunlock 2018-02-14 20:37:33 +00:00
Alan Somers
834063202a gpart: append partition name to the underlying provider's physical path
If the underlying provider's physical path is null, then the gpart device's
physical path will be, too. Otherwise, it will append the partition name,
such as "/p1" or "/s1/a". This will make gpart work better with zfsd(8).

PR:		224965
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D14010
2018-02-14 20:26:09 +00:00
Alan Somers
0bab7fa8a7 geli: append "/eli" to the underlying provider's physical path
If the underlying provider's physical path is null, then the geli device's
physical path will be, too. Otherwise, it will append "/eli".  This will make
geli work better with zfsd(8).

PR:		224962
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D13979
2018-02-14 20:15:32 +00:00
Bryan Drewery
70c144dc78 nanosleep(2): Fix bogus incrementing of rmtp by tc_tick_sbt on [EINTR].
sbt is the time in the future that the tsleep_sbt() is expected to be completed
at.  sbtt is the current time.  Depending on the precision with sysctl
kern.timecounter.alloweddeviation the start time may be incremented by
tc_tick_sbt.  The same increment is needed for the current time of sbtt before
calculating the difference.  The impact of missing this increment is that rmtp
may increase by one tc_tick_sbt on every early [EINTR] return.  If the same
struct is passed in for rqtp as rmtp this can result in rqtp effectively
incrementing by tc_tick_sbt and sleeping longer than originally intended.

This problem was introduced in r247797.

Reviewed by:	kib, markj, vangyzen (all on an older version of the test)
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14362
2018-02-14 18:43:50 +00:00
Alan Somers
d64bae1f1a Implement .vop_pathconf and .vop_getacl for the .zfs ctldir
zfsctl_common_pathconf will report all the same variables that regular ZFS
volumes report. zfsctl_common_getacl will report an ACL equivalent to 555,
except that you can't read xattrs or edit attributes.

Fixes a bug where "ls .zfs" will occasionally print something like:
ls: .zfs/.: Operation not supported

PR:		225793
Reviewed by:	avg
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D14365
2018-02-14 15:49:31 +00:00
Justin Hibbits
d793587fe2 Fix a panic introduced in r329225
Some GEOM partition tables may be destroyed with incomplete partition
entries.  Guard against this with NULL checks.

Reported by:	pholm,others
Reviewed by:	markj
Tested by:	pholm
2018-02-14 15:12:09 +00:00
Justin Hibbits
a00ce4e854 PPC64: Get the timestap from the proper OF field
Summary:
After revision rS328534('PPC64: use hwref instead of cpuid'), FreeBSD on
powerpc64 virtual machine panics since it is unable to read the
timebase, showing the following error:

     get-property for timebase-frequency on zero phandle

     panic: Unable to determine timebase frequency!

With the change above,  cpuref->cr_hwref does not contain the phandle
anymore, thus, it never reads the proper CPU entry in OF.

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14204
2018-02-14 02:51:28 +00:00
Justin Hibbits
26e251b55c powerpc64/pseries: Define new hcalls
Summary:
Define new hcalls as in 'Linux on Power Architecture Platform Reference'
version 1.1 (24 March 2016) downloaded from:

        https://members.openpowerfoundation.org/document/dl/469

Submitted by:	Breno Leitao
Differential Revision:	https://reviews.freebsd.org/D14281
2018-02-14 02:48:27 +00:00
Konstantin Belousov
ada27a3bb8 Cleanup unused page argument for vm_reserv_break().
Reviewed by:	markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14364
2018-02-14 00:34:02 +00:00
Konstantin Belousov
d929ad7f91 Ensure memory consistency on COW.
From the submitter description:
The process is forked transitioning a map entry to COW
Thread A writes to a page on the map entry, faults, updates the pmap to
  writable at a new phys addr, and starts TLB invalidations...
Thread B acquires a lock, writes to a location on the new phys addr, and
  releases the lock
Thread C acquires the lock, reads from the location on the old phys addr...
Thread A ...continues the TLB invalidations which are completed
Thread C ...reads from the location on the new phys addr, and releases
  the lock

In this example Thread B and C [lock, use and unlock] properly and
neither own the lock at the same time.  Thread A was writing somewhere
else on the page and so never had/needed the lock. Thread C sees a
location that is only ever read|modified under a lock change beneath
it while it is the lock owner.

To fix this, perform the two-stage update of the copied PTE.  First,
the PTE is updated with the address of the new physical page with
copied content, but in read-only mode.  The pmap locking and the page
busy state during PTE update and TLB invalidation IPIs ensure that any
writer to the page cannot upgrade the PTE to the writable state until
all CPUs updated their TLB to not cache old mapping.  Then, after the
busy state of the page is lifted, the faults for write can proceed and
do not violate the consistency of the reads.

The change is done in vm_fault because most architectures do need IPIs
to invalidate remote TLBs.  More, I think that hardware guarantees of
atomicity of the remote TLB invalidation are not enough to prevent the
inconsistent reads of non-atomic reads, like multi-word accesses
protected by a lock.  So instead of modifying each pmap invalidation
code, I did it there.

Discovered and analyzed by: Elliott.Rabe@dell.com
Reviewed by:	markj
PR:	225584 (appeared to have the same cause)
Tested by:	Elliott.Rabe@dell.com, emaste, Mike Tancsa <mike@sentex.net>, truckman
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14347
2018-02-14 00:31:45 +00:00
Konstantin Belousov
607970bc8e Do not call pmap_enter() with invalid protection mode.
If the map entry elookup was performed due to the mapping changes, we
need to ensure that there is still some access permission bit
requested which is compatible with the current vm_map_entry mode.  If
not, restart the handler from scratch instead of trying to save the
current progress.

Also adjust fault_type to not include cleared permission bits.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14347
2018-02-14 00:25:18 +00:00
Conrad Meyer
a4f2dfa6fa kgssapi: Remove trivial deadcode
CID:		1385956
Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-02-14 00:12:03 +00:00
Landon J. Fuller
0f18e6f6d6 bwn(4): Conditionalize "RX decryption attempted" message on a new
BWN_DEBUG_HWCRYPTO debug flag.

The MAC will attempt decryption (and set BWN_RX_MAC_DEC) even if a key has
not been supplied to the hardware; this is expected behavior, and there's
no need to spam users' console with this debugging printf.
2018-02-13 20:07:40 +00:00
Mark Johnston
6026dcd7ca Add support for zstd-compressed user and kernel core dumps.
This works similarly to the existing gzip compression support, but
zstd is typically faster and gives better compression ratios.

Support for this functionality must be configured by adding ZSTDIO to
one's kernel configuration file. dumpon(8)'s new -Z option is used to
configure zstd compression for kernel dumps. savecore(8) now recognizes
and saves zstd-compressed kernel dumps with a .zst extension.

Submitted by:	cem (original version)
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13101,
			https://reviews.freebsd.org/D13633
2018-02-13 19:28:02 +00:00
Ed Maste
83ab0f9f33 amd64/pmap: Move Foundation copyright to the 2-clause section
Sponsored by:	The FreeBSD Foundation
2018-02-13 19:19:26 +00:00
Mark Johnston
b9625b0d3b Move zstd malloc()/free()/calloc() macros to stdlib.h.
The definitions otherwise leak into anything that includes zstd.h,
which is not desirable for native FreeBSD code.

Reviewed by:	allanjude, cem, imp
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14352
2018-02-13 19:18:00 +00:00
Ed Maste
85b648927a libkern: use nul for terminating char rather than 0
Akin to the change made in r188080 for lib/libc/string/.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
2018-02-13 19:17:48 +00:00
Bryan Drewery
501608626d ports modules: Don't leak AUTO_OBJ changes into the port builds.
This came about when r328489 made ports modules builds no longer use the
in-tree share/mk files, but didn't cleanup MAKEOBJDIR from the
environment.

This fixes "Variable OBJTOP is recursive".

Sponsored by:	Dell EMC
2018-02-13 17:51:16 +00:00
Landon J. Fuller
ded5af888d bwn(4): txpid2g/txpid5g[lh] are not defined after sromrev 7; the default
indices into the TX power gain table should be used instead.

This enables use of bwn(4) with later BCM4321 revisions.

Reported by:	Trev Roydhouse
2018-02-13 17:43:54 +00:00
Justin Hibbits
08a3b42fdb Narrow a race, and fix a leak, in g_part_wither
A race in g_part_wither() can lead to I/O being performed with a freed GEOM
when the device disappears.  Close the race as best as we can for now,
following the code patterns from g_part_ctl_destroy() and g_part_ctl_undo().
This also fixes a leak, as g_wither_geom() does not wither providers, it
only orphans them, so the partition entries would never get destroyed in
g_wither_washer().

Note, this is not a complete fix, it can still race with g_part_start(), the
race has merely been narrowed.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
2018-02-13 17:40:09 +00:00
Ian Lepore
157f3d7649 Fix bad indentation. Whitespace only, no functional changes.
Reported by:	bde@
2018-02-13 17:38:08 +00:00
Hans Petter Selasky
33ec1ccbae Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by:	Mellanox Technologies
2018-02-13 17:04:34 +00:00
Li-Wen Hsu
92ddc7b86d Fix non-64-bit platform build by printing bus_addr_t values using %#jx
Reviewed by:	slm
Differential Revision:	https://reviews.freebsd.org/D14344
2018-02-13 16:26:06 +00:00
Konstantin Belousov
67dcd64ab8 linuxkpi: Do not leak pages on put.
When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire).  Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.

Reported and tested by:	Slava Shwartsman
Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-02-13 15:44:35 +00:00
Konstantin Belousov
c4be9169c0 Do not leak rv->psind in some specific situations.
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver).  Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting.  Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld.  In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.

Clean psind on vm_reserv_break() to avoid the situation.

Reported and tested by:	Slava Shwartsman
Reviewed by:	markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14335
2018-02-13 15:36:28 +00:00
Konstantin Belousov
c688c9051b Fix build with gas.
Do not use C constant suffixes.  Bit values are small enough to not
require typing, despite they are used for 64bit MSR writes.  The added
cast in hw_ibrs_recalculate() is redundand but I prefer to add it for
clarity.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-13 15:30:31 +00:00
Hans Petter Selasky
9ce3abcf79 Fix for incorrect PnP information used by devmatch(8).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-13 11:43:57 +00:00
Hans Petter Selasky
abf52ec001 Add new USB quirk.
PR:		225844
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-13 08:13:20 +00:00
Jeff Roberson
e958ad4cf3 Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc().  Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by:	markj
Discussed with:	kib, glebius
Tested by:	pho (earlier version)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14273
2018-02-12 22:53:00 +00:00
Landon J. Fuller
e8a81142ab bwn(4): Fix outstanding bug in PHY-G tssi2dbm table generation caught by
-Wconstant-conversion, and remove now unnecessary warning suppression
flags.
2018-02-12 22:21:11 +00:00
Eric van Gyzen
0bbfb20fe5 Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU.  Update the routes so they do not provide
stale MTUs.

This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.

Reported and tested by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	dab, melifaro
Discussed with:	ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14257
2018-02-12 19:49:20 +00:00
Landon J. Fuller
7c7c726bca siba(4): Ignore disabled per-core address match entries.
Previously, the address regions described by disabled admatch entries would
be treated as being mapped to the given core; while incorrect, this was
essentially harmless given that the entries describe unused address space
on the few affected devices.

We now perform parsing of per-core admatch registers and interrupt flags in
siba_erom, correctly skip any disabled admatch entries, and use the
siba_erom API in siba_add_children() to perform enumeration of attached
cores.
2018-02-12 19:36:26 +00:00
Alan Somers
21b8a7a6ca Fix a comment. No functional change.
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2018-02-12 17:42:28 +00:00
Ian Lepore
bd54c5acba Add a new sysctl, debug.clock_do_io, to allow manully triggering a one-shot
read or write of all registered realtime clocks.  In the read case, the
values read are simply discarded.  For writes, there's no alternative but
to actually write the current system time to the device.
2018-02-12 17:41:11 +00:00
Ian Lepore
45eee6db6f Add a set of convenience routines for RTC drivers to use for debug output,
and a debug.clock_show_io sysctl to control debugging output.
2018-02-12 17:33:14 +00:00
Jonathan T. Looney
48fca66157 Mark the pages used for the initial page-table entries as wired. This
makes them consistent with the way other page-table pages are allocated.
It also provides the rest of the VM system a good clue that these pages
are used.

Reviewed by:	alc, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14269
2018-02-12 17:27:50 +00:00
Ian Lepore
aedc51f11a Replace the existing print_ct() private debugging function with a set of
three public functions to format and print the three major data structures
used by realtime clock drivers (clocktime, bcd_clocktime, and timespec).
2018-02-12 16:25:56 +00:00
Warner Losh
7cafeaa1fd Add Lua as a scripting langauge to /boot/loader
liblua glues the lua run time into the boot loader. It implements all
the runtime routines that lua expects. In addition, it has a few
standard 'C' headers that nueter various aspects of the LUA build that
are too specific to lua to be in libsa. Many refinements from the
original code to improve implementation and the number of included lua
libraries. Use int64_t for lua_Number. Have "/boot/lua" be the default
module path. Numerous cleanups from the original GSoC project,
including hacking libsa to allow lua to be built with only one change
outside luaconf.h.

Add the final bit of lua glue to bring in liblua and plug into the
multiple interpreter framework, previously committed.

Add LOADER_LUA option, currently off by default.

Presently, this is an experimental option. One must opt-in to using
this by defining WITH_LOADER_LUA and WITHOUT_FORTH. It's been
lightly tested, so keep a backup copy of your old loader handy.
The menu code, coming in the next commit, hasn't been exhaustively
tested. A LUA boot loader is 60k larger than a FORTH one, which is
80k larger than a no-interpreter one. Subtle changes in size
may tip things past some subtle limit (the binary is ~430k now
when built with LUA). A future version may offer coexistance.

Bump FreeBSD version to 1200058 to mark the milestone.

Pedro Souza's 2014 Summer of Code project. Rui Paulo, Pedro Arthur,
Zakary Nafziger and Wojciech A. Koszek also contributed. Warner Losh
reworked it extensively into its current form.

Obtained from: https://wiki.freebsd.org/SummerOfCode2014/LuaLoader
Sponsored by: Google Summer of Code
Relnotes: Yes
MFC After: 1 month
Differential Review: https://reviews.freebsd.org/D14295
2018-02-12 15:31:53 +00:00
Warner Losh
62bca77843 Move __va_list and related defines to sys/sys/_types.h
__va_list and related defines are identical in all the
ARCH/include/_types.h files. Move them to sys/sys/_types.h

Sponsored by: Netflix
2018-02-12 14:48:20 +00:00
Warner Losh
982e7bdafc We don't support gcc < 4.2.1, so varargs.h now is just #error
always. Unifdef for versions prior to 4.2.1 and remove now-unused
header files.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
2018-02-12 14:48:14 +00:00
Warner Losh
33e959abab Use standard pattern for stdargs.h
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
2018-02-12 14:48:05 +00:00
Tycho Nightingale
58a6aaf7ec Provide further mitigation against CVE-2017-5715 by flushing the
return stack buffer (RSB) upon returning from the guest.

This was inspired by this linux commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=117cc7a908c83697b0b737d15ae1eb5943afe35b

Reviewed by:	grehan
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14272
2018-02-12 14:45:27 +00:00
Scott Long
63b1a33514 Print out the shared memory queues during initialization
Sponsored by:	Netflix
2018-02-11 20:15:47 +00:00
Brooks Davis
ad704a34bc Use syscall_helper_register(9) rather than syscall_register().
The usage is simpler, documented, and more common.

Reviewed by:	cem
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14227
2018-02-11 18:37:08 +00:00
Warner Losh
38c35fec7a Consistent macro indentation is the hobgoblin of small minds
Line up the macro definitions and names here like the machine/stdarg.h
files that this replaced. This is easier to read and also makes it
easier to match up with other includes. Also two space indent va_end
to match the rest of the surrounding if block.
2018-02-11 17:45:38 +00:00
Ian Lepore
4a8b6f54d1 Add a device ID to uftdi for TIAO USB Multi Protocol Adapter (TUMPA).
PR:		225810
2018-02-11 16:35:23 +00:00
Conrad Meyer
b42712a8b7 Add GUID and alias for Apple APFS partition
PR:		225813
Submitted by:	James Wright <james.wright AT jigsawdezign.com>
2018-02-11 06:57:20 +00:00
Emmanuel Vadot
dc9b494e2b dts: Update our device tree sources files from Linux 4.15 2018-02-10 15:29:46 +00:00
Andrey V. Elsukov
b2fe54bc8b Reinitialize IP header length after checksum calculation. It is used
later by TCP-MD5 code.

This fixes the problem with broken TCP-MD5 over IPv4 when NIC has
disabled TCP checksum offloading.

PR:		223835
MFC after:	1 week
2018-02-10 10:13:17 +00:00
Brooks Davis
946c028ec1 Use syscall_helper_register() to register syscalls and initialize though
the module interface.

This is the more common approach and the syscall_helper interface is
easier to understand.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14251
2018-02-10 01:09:22 +00:00
Alexander Motin
4f5d657343 Teach mps(4) and mpr(4) drivers to autotune chain frames.
This is a first part of the change.  It makes the drivers to calculate
the required number of chain frames to satisfy worst case scenarios, but
it does not change existing overly strict limits on them.  The next step
will be to rewrite the allocator to not require megabytes of physically
contiguous address space, that may be problematic if done after boot,
after doing which the limits can be removed.  Until that this code can
just correct user set limits, if they are set too high.

Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14261
2018-02-10 00:55:46 +00:00
Alexander Motin
bc55f7298e Add sysctls for dnode block and indirect block shifts.
MFC after:	2 weeks
2018-02-09 23:29:50 +00:00
Jung-uk Kim
ff879b0799 MFV: r329072
Merge ACPICA 20180209.
2018-02-09 21:49:38 +00:00
Nathan Whitehorn
778c8dac96 Fix PowerMac G5 thermal management, plus likely other bugs, introduced in
r328113 and affecting SMP systems.

The way the time is set on PowerMacs is racy and relies on all the
CPUs in the system setting a register simultaneously in a rendezvous. A
few-cycle delay can result in out-of-sync times, which can break the
scheduler and result in calls like mtx_sleep() and pause() never timing out
if the thread is migrated while sleeping. r328113 added a call to a no-op
function between the beginning of the rendezvous and setting the time that
was only called on APs and added enough cycles to cause a problematic offset.
For some reason, the fan-management code was the first place this appeared.

Clue from:	andreast
Reported by:	many
2018-02-09 20:09:32 +00:00
Kirk McKusick
13a025d5d8 Merge biodone_finish() back into biodone(). The primary purpose is
to make the order of operations clearer to avoid the race condition
that was fixed in r328914. In particular, this commit corrects a
similar race that existed in the soft updates callback.

Doing some sleuthing through the SVN repository, it appears that
bufdone_finish() was added to support XFS:

------------------------------------------------------------------------
r153192 | rodrigc | 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) | 13 lines

Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
    - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
    - b_pin_count, count of pinned buffer

- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions

Patches provided by:    kan
Reviewed by:            phk
Silence on:             arch@

------------------------------------------------------------------------

It does not appear to ever have been used for anything else.  XFS was
disconnected in r241607:

------------------------------------------------------------------------
r241607 | attilio | 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) | 5 lines

Disconnect non-MPSAFE XFS from the build in preparation for dropping
GIANT from VFS.

This is not targeted for MFC.

------------------------------------------------------------------------

and removed entirely in r247631:

------------------------------------------------------------------------
r247631 | attilio | 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) | 5 lines

Garbage collect XFS bits which are now already completely disconnected
from the tree since few months.

This is not targeted for MFC.

------------------------------------------------------------------------

Since XFS support is gone, there is no reason to retain biodone_finish().

Suggested by: Warner Losh (imp)
Discussed with: cem, kib
Tested by: Peter Holm (pho)
2018-02-09 19:50:47 +00:00
Jonathan T. Looney
31ba4c7b5b On bootup, the amd64 pmap initialization code creates page-table
mappings for the pages used for the kernel and some initial allocations
used for the page table. It maps the kernel and the blocks used for
these initial allocations using 2MB pages.

However, if the kernel does not end on a 2MB boundary, it still maps the
last portion using a 2MB page, but reports that the unused 4K blocks
within this 2MB allocation are free physical blocks. This means that
these same physical blocks could also be mapped elsewhere - for example,
into a user process. Given the proximity to the kernel text and data
area, it seems wise to avoid allowing someone to write data to physical
blocks also mapped into these virtual addresses.

(Note that this isn't a security vulnerability: the direct map makes
most/all memory on the system mapped into kernel space. And, nothing
in the kernel should be trying to access these pages, as the virtual
addresses are unused. It simply seems wise to avoid reusing these
physical blocks while they are mapped to virtual addresses so close
to the kernel text and data area.)

Consequently, let's reserve the physical blocks covered by the
page-table mappings for these initial allocations.

Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D14268
2018-02-09 17:46:33 +00:00
Gleb Smirnoff
f7d3578564 Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we  need to postpone switch
off of this zone from startup_alloc() until full launch of VM.

o Always supply number of VM zones to uma_startup_count(). On machines
  with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
  a page. In the latter case account VM zones for number of allocations
  from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
  itself any zone that is already capable of running real alloc.
  In worst case scenario we may leak a single page here. See comment
  in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
  extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
  pages calculation, we are guaranteed to use all of the boot_pages
  in the early single threaded stage.

Reported & tested by:	mav
2018-02-09 04:45:39 +00:00
Eric van Gyzen
43105e589a Fix ICMPv6 redirects
icmp6_redirect_input() validates that a redirect packet came from the
current gateway for the respective destination.  To do this, it compares
the source address, which has an embedded scope zone id, to the next-hop
address, which does not.  If the address is link-local, which should be
the case, the comparison fails and the redirect is ignored.

Insert the scope zone id into the next-hop address so the comparison
is accurate.

Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases.

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com> (initial revision)
Reviewed by:	ae melifaro dab
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14254
2018-02-09 00:13:05 +00:00
Kirk McKusick
068beacf21 The goal of this change is to prevent accidental foot shooting by
folks running filesystems created on check-hash enabled kernels
(which I will call "new") on a non-check-hash enabled kernels (which
I will call "old). The idea here is to detect when a filesystem is
run on an old kernel and flag the filesystem so that when it gets
moved back to a new kernel, it will not start getting a slew of
check-hash errors.

Back when the UFS version 2 filesystem was created, it added a file
flag FS_INDEXDIRS that was to be set on any filesystem that kept
some sort of on-disk indexing for directories. The idea was precisely
to solve the issue we have today. Specifically that a newer kernel
that supported indexing would be able to tell that the filesystem
had been run on an older non-indexing kernel and that the indexes
should not be used until they had been rebuilt. Since we have never
implemented on-disk directory indicies, the FS_INDEXDIRS flag is
cleared every time any UFS version 2 filesystem ever created is
mounted for writing.

This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH
flag. Thus, the FS_METACKHASH is definitively known to have always
been cleared. The FS_INDEXDIRS flag has been moved to a new block
of flags that will always be cleared starting with this commit
(until they get used to implement some future feature which needs
to detect that the filesystem was mounted on a kernel that predates
the new feature).

If a filesystem with check-hashes enabled is mounted on an old
kernel the FS_METACKHASH flag is cleared. When that filesystem is
mounted on a new kernel it will see that the FS_METACKHASH has been
cleared and clears all of the fs_metackhash flags. To get them
re-enabled the user must run fsck (in interactive mode without the
-y flag) which will ask for each supported check hash whether it
should be rebuilt and enabled. When fsck is run in its default preen
mode, it will just ignore the check hashes so they will remain
disabled.

The kernel has always disabled any check hash functions that it
does not support, so as more types of check hashes are added, we
will get a non-surprising result. Specifically if filesystems get
moved to kernels supporting fewer of the check hashes, those that
are not supported will be disabled. If the filesystem is moved back
to a kernel with more of the check-hashes available and fsck is run
interactively to rebuild them, then their checking will resume.
Otherwise just the smaller subset will be checked.

A side effect of this commit is that filesystems running with
cylinder-group check hashes will stop having them checked until
fsck is run to re-enable them (since none of them currently have
the FS_METACKHASH flag set). So, if you want check hashes enabled
on your filesystems after booting a kernel with these changes, you
need to run fsck to enable them. Any newly created filesystems will
have check hashes enabled. If in doubt as to whether you have check
hashes emabled, run dumpfs and look at the list of enabled flags
at the end of the superblock details.
2018-02-08 23:06:58 +00:00
Dimitry Andric
b1562cfa89 Pull in r324594 from upstream clang trunk (by Alexander Ivchenko):
Fix for #31362 - ms_abi is implemented incorrectly for values >=16
  bytes.

  Summary:
  This patch is a fix for following issue:
  https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by
  front end lowering C calling conventions without taking into account
  calling conventions enforced by attribute. In this case win64cc was
  no correctly lowered on targets other than Windows.

  Reviewed By: rnk (Reid Kleckner)

  Differential Revision: https://reviews.llvm.org/D43016

  Author: belickim <mateusz.belicki@intel.com>

This fixes clang 6.0.0 assertions when building the emulators/wine and
emulators/wine-devel ports, and should also make it use the correct
Windows calling conventions.  Bump __FreeBSD_version to make the fix
easy to detect.

PR:		224863
MFC after:	3 months
X-MFC-With:	r327952
2018-02-08 21:11:48 +00:00
Brooks Davis
de55bbfc5b Modernize nfssvc(2) registartion.
Use syscall_helper_register() to register syscalls and do it through the
module interface rather than sysinit.

This pattern is more common and easier to understand.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14232
2018-02-08 20:09:42 +00:00
Mark Johnston
ab7c09f121 Use vm_page_unwire_noq() instead of directly modifying page wire counts.
No functional change intended.

Reviewed by:	alc, kib (previous revision)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14266
2018-02-08 19:28:51 +00:00
Andriy Gapon
f11548e1da remove a duplicate assignment
There should be no functional change.

MFC after:	1 week
2018-02-08 13:22:40 +00:00
Brooks Davis
f849badfe4 style(9): use a type for each member in struct defintions.
Shorten a comment to fit in 80 columns.
2018-02-08 00:42:28 +00:00
Brooks Davis
dff2e0e4de Remove part of a comment reverting to nonexistant struct members. 2018-02-07 23:45:13 +00:00
Andriy Gapon
b2387aa652 exec_map_first_page: fix an inverse condition introduced in r254138
While the bug itself was serious, as we could either pass a non-busied
page to vm_pager_get_pages() or leak a busy page, it could only be
triggered under a very rare condition where the page is already inserted
into the object, but it is not valid yet.

Reviewed by:	kib
MFC after:	2 weeks
2018-02-07 21:51:59 +00:00
Navdeep Parhar
919da4ceff iw_cxgbe: Remove declaration of a function that no longer exists. 2018-02-07 20:13:08 +00:00
Andrey V. Elsukov
99493f5a4a Remove duplicate #include <netinet/ip_var.h>. 2018-02-07 19:12:05 +00:00
Andrey V. Elsukov
b99a682320 Rework ipfw dynamic states implementation to be lockless on fast path.
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
  for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
  need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
  information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
  dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
  lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
  and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
  O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
  prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
  separate arrays. 65535 limit to maximum number of hash buckets now
  removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
  and for parent states.
o By default now is used Jenkinks hash function.

Obtained from:	Yandex LLC
MFC after:	42 days
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D12685
2018-02-07 18:59:54 +00:00
Hans Petter Selasky
c70e38e40c Give USB template SYSUNINIT()'s a uniq name to avoid symbol name collision
when building stand/usb. Regression after r328194.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-07 18:46:08 +00:00
Warner Losh
802baf0ba6 Cull Atmel board configs no longer relevant.
Remove most of the Atmel at91 boards. Most of them are no longer
relevant or used by people. Kept ATMEL since it should work on all the
boards that still work (I've not confirmed this, since I don't have
all these boards). Also kept SAM9G20EK, since I have several boards
that it is used on. If I've deleted a kernel in error, please let me
know.
2018-02-07 18:33:53 +00:00
Warner Losh
9d602e4e3d Fix cut and pasted comments to reflect differences in code from the
original source.

Sponsored by: Netflix
2018-02-07 18:33:46 +00:00
Gleb Smirnoff
5073a08328 Fix three miscalculations in amount of boot pages:
o Most of startup zones have struct uma_slab embedded into the slab,
  so provide macro UMA_SLAB_SPACE and use it instead of UMA_SLAB_SIZE,
  when calculating how many pages would certain kind of allocations
  require. Some zones are offpage, so we might have a positive inaccuracy.
o The keg for the zone of zones is allocated "dynamically", so we
  need +1 when calculating amount of pages for kegs. [1]
o The zones of zones and zones of kegs have arbitrary alignment of 32,
  and this also needs to be accounted for. [2]

While here, spread more comments and improve diagnostic messages.

Reported by:	pho [1], jtl [2]
2018-02-07 18:32:51 +00:00
Alex Richardson
f7925608a5 Fix compilation of mips_postboot_fixup() with a C11 compiler
The _Alignas specifier must come before the declaration and not after. It
works if _Alignas() expands to __attribute__(aligned(x)) which was the only
case I tested before.

Approved By:	jhb (mentor)
2018-02-07 16:58:01 +00:00
Mark Johnston
1d3a1bcfac Dequeue wired pages lazily.
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.

The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.

The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.

Reviewed by:	jeff
Discussed with:	alc, kib
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D11943
2018-02-07 16:57:10 +00:00
Hans Petter Selasky
5b7cc89266 Fix implementation of ktime_add_ns() and ktime_sub_ns() in the LinuxKPI to
actually return the computed result instead of the input value.

This is a regression issue after r289572.

Found by:	gcc6
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-02-07 12:12:06 +00:00
Adrian Chadd
f6ede6300f [ath] Use the BSSID address logic for STA VAPs too.
For DWDS VAPs on ath(4) we need to ensure that the STA vap and hostap VAP
have different MAC addresses.  If the STA code path doesn't utilise the
address assign / reclaim path then it doesn't update the bitmap with which
address was allocated.

This should fix a bunch of corner issues I've been seeing with DWDS STA + AP
VAPs that I was working around with manual MAC address assignment.
2018-02-07 09:37:22 +00:00
Adrian Chadd
037fb51a2e [ar71xx] Fix the TL-wdr3600/tl-wdr4300 hints in the new world order.
Tested:

* tl-wdr4300
2018-02-07 09:35:47 +00:00
Kyle Evans
87fb7f5b58 if_awg: Skip emac reset if configured for internal PHY
On the OrangePi One at least, emac reset when an ethernet cable is not
plugged in seems to break ethernet. Soft reset will fail, even with
increasing the delay and retries to wait for up to 20 seconds. This can be
reproduced across at least two different OrangePi One's by simply leaving
ethernet cable unplugged when awg attaches. Whether it's plugged in or not
through u-boot process makes no difference.

Skipping the reset in this configuration doesn't seem to cause any problems,
tried across many many reboots with and without ethernet cable plugged in.

Tested on:	OrangePi One
Tested on:	Other boards (manu)
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D13974
2018-02-07 01:54:13 +00:00
Warner Losh
c4b72d8b37 Keep a counter for number of requests completed with an error.
Sponsored by: Netflix
2018-02-06 23:21:08 +00:00
Pedro F. Giffuni
7cbd6d338e {ext2|ufs}_readdir: Avoid setting negative ncookies.
ncookies cannot be negative or the allocator will fail. This should only
happen if a caller is very broken but we can still try to survive the
event.

We should probably also verify for uio_resid > MAXPHYS but in that case
it is not clear that just clipping the ncookies value is an adequate
response.

MFC after:	2 weeks
2018-02-06 22:38:19 +00:00
Ian Lepore
ce75945d3c Use const pointers for input data not modified by clock utility functions. 2018-02-06 22:17:01 +00:00
Gleb Smirnoff
d2be4a1e4f Use correct arithmetic to calculate how many pages we need for kegs
and hashes.  There is no functional change with current sizes.
2018-02-06 22:13:40 +00:00
Jeff Roberson
e2068d0bcd Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state.  Protect reservations with the free lock
from the domain that they belong to.  Refactor to make vm domains more
of a first class object.

Reviewed by:    markj, kib, gallatin
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14000
2018-02-06 22:10:07 +00:00
Gleb Smirnoff
1616767dfc Improve DIAGNOSTIC printf. Report using a boot page every time regardless
of booted status.
2018-02-06 22:08:43 +00:00
Gleb Smirnoff
ae941b1b4e Fix boot_pages calculation for machines that don't have UMA_MD_SMALL_ALLOC.
o Call uma_startup1() after initializing kmem, vmem and domains.
o Include 8 eight VM startup pages into uma_startup_count() calculation.
o Account for vmem_startup() and vm_map_startup() preallocating pages.
o Account for extra two allocations done by kmem_init() and vmem_create().
o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT
  allowed several other SYSINITs to sneak in before it, thus bumping
  requirement for amount of boot pages.
2018-02-06 22:06:59 +00:00
Scott Long
964107031b Cache the value of the request and reply frame size since it's used quite
a bit in the normal operation of the driver.  Covert it to represent bytes
instead of 32bit words.  Fix what I believe to be is a bug in this respect
with the Tri-mode cards.

Sponsored by:	Netflix
2018-02-06 21:01:38 +00:00
Bjoern A. Zeeb
6b9159b96b Remove a trailing whitspace. 2018-02-06 19:14:15 +00:00
Mark Johnston
4d2653522d Delete a declaration for a variable removed in r305362. 2018-02-06 17:26:11 +00:00
Mark Johnston
0d02f6c201 Simplify synchronization read error handling.
Since synchronization reads are performed by submitting a request to
the external mirror provider, we know that the request returns with an
error only when gmirror was unable to read a copy of the block from any
mirror. Thus, there is no need to retry the request from the
synchronization error handler.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-02-06 16:02:33 +00:00
Alexander Motin
62a09ee976 Fix queue length reporting in mps(4) and mpr(4).
Both drivers were found to report CAM bigger queue depth then they really
can handle.  It made them later under high load with many disks return
some of submitted requests back with CAM_REQUEUE_REQ status for later
resubmission.

Reviewed by:	scottl
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D14215
2018-02-06 16:02:25 +00:00
Kenneth D. Merry
e2997a03b7 Diagnostic buffer fixes for the mps(4) and mpr(4) drivers.
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.

There were several issues here:
 o No checking of the bus_dmamap_load() return value.  If the load
   failed or got deferred, mp{r,s}_diag_register() continued on as if
   nothing had happened.  We now check the return value and bail
   out if it fails.

 o No waiting for a deferred load callback.  bus_dmamap_load()
   calls a supplied callback when the mapping is done.  This is
   generally done immediately, but it can be deferred.
   mp{r,s}_diag_register() did not check to see whether the callback
   was already done before proceeding on.  We now sleep until the
   callback is done if it is deferred.

 o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
   memory is allocated and loaded.  This is necessary on some
   platforms to synchronize host memory that is going to be updated
   by a device.

Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress.  This fixes that problem
as well.  (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)

mp{r,s}var.h:
	Add a new structure, struct mpr_busdma_context, that is
	used for deferred busdma load callbacks.

	Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
	Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
	This provides synchronization for callers that want to
	wait on a deferred bus_dmamap_load() callback.

mp{r,s}_user.c:
	In bus_dmamap_register(), add a call to bus_dmamap_sync()
	with the BUS_DMASYNC_PREREAD flag set after an allocation
	is loaded.

	Also, check the return value of bus_dmamap_load().  If it
	fails, bail out.  If it is EINPROGRESS, wait for the
	callback to happen.  We use an interruptible sleep (msleep
	with PCATCH) and let the callback clean things up if we get
	interrupted.

	In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
	bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
	the data out to make sure the data is in stable storage.

	In mp{r,s}_post_fw_diag_buffer() and
	mp{r,s}_release_fw_diag_buffer(), check the reply to see
	whether it is NULL.  It can be NULL (and the command non-NULL)
	if the controller gets reinitialized while we're waiting for
	the command to complete but the driver structures aren't
	reallocated.  The driver structures generally won't be
	reallocated unless there is a firmware upgrade that changes
	one of the IOCFacts.

	When freeing diagnostic buffers in mp{r,s}_diag_register()
	and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
	freeing it.  This will prevent a duplicate free in some
	situations.

Sponsored by:	Spectra Logic
Reviewed by:	mav, scottl
MFC after:	1 week
Differential Revision:	D13453
2018-02-06 15:58:22 +00:00
Alex Richardson
e911aac76a Make mips_postboot_fixup work when building the kernel with clang+lld
The compiler/linker can align fake_preload anyway it would like. When
building the kernel with gcc+bfd this always happened to be a multiple of 8.
When I built the kernel with clang and linked with lld fake_preload
happened to only be aligned to 4 bytes which caused a an ADDRS trap because
the compiler will emit sd instructions to store to this buffer.

Reviewed By:	jhb, imp
Approved By:	jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D14018
2018-02-06 15:41:15 +00:00
Kyle Evans
a123333f3a dtb/allwinner: Add sun7i-a20-lamobo-r1.dts (Banana Pi R1)
FreeBSD boots on this board, but the ethernet switch is not currently
supported, resulting in no ethernet.

A U-Boot port will be added once the ethernet switch is at least basically
supported, but we add its DTS to the build here to lower the barrier-to-boot
while work is underway.
2018-02-06 14:57:03 +00:00
Adrian Chadd
2ba4bf8fa5 [arswitch] Implement the switch MAC address fetch API.
The placeholders are here for some future "set" MAC address API.

Tested:

* AR9340 switch
* AR8327 switch
2018-02-06 08:35:49 +00:00
Adrian Chadd
15bd1a867e [etherswitch] add initial support for potentially configuring and fetching the switch MAC address.
Switches that originate their own frames (eg obvious ones like Pause frames)
need a MAC address to use to send those frames from.

This API will hopefully begin to allow that to be configurable.
2018-02-06 08:34:50 +00:00
Scott Long
4b07a5606c Fix a case where a request frame can be composed that requires 2 or more
SGList elements, but there's only enough space in the request frame for
either 1 element or a chain frame pointer.  Previously, the code would
hit the wrong case, add the SGList element, but then fail to add the
chain frame due to lack of space.  Re-arrange the code to catch this case
earlier and handle it.

Sponsored by:	Netflix
2018-02-06 06:55:55 +00:00
Scott Long
99e7a4ad9e Return a C errno for cam_periph_acquire().
There's no compelling reason to return a cam_status type for this
function and doing so only creates confusion with normal C
coding practices. It's technically an API change, but the periph API
isn't widely used. No efffective change to operation.

Reviewed by:	imp, mav, ken
Sponsored by:	Netflix
Differential Revision:	D14063
2018-02-06 06:42:25 +00:00
Gleb Smirnoff
f4bef67c9c Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.

o Introduce another stage of UMA startup, which is entered after
  vm_page_startup() finishes. After this stage we don't yet enable buckets,
  but we can ask VM for pages. Rename stages to meaningful names while here.
  New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
  BOOT_RUNNING.
  Enabling page alloc earlier allows us to dramatically reduce number of
  boot pages required. What is more important number of zones becomes
  consistent across different machines, as no MD allocations are done before
  the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
  startup_alloc(), however that may change, so vm_page_startup() provides
  its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
  functions calculates sizes of zones zone and kegs zone, and calculates how
  many pages UMA will need to bootstrap.
  It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
  mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
  but also as args.size.

Reviewed by:		imp, gallatin (earlier version)
Differential Revision:	https://reviews.freebsd.org/D14054
2018-02-06 04:16:00 +00:00
Kirk McKusick
47806d1b93 Occasional cylinder-group check-hash errors were being reported on
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.

The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:

- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
  check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
  never mark it dirty, thus do not write it out, and hence never
  update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
  check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
  the old check hash is still in the VM cache, but some other pages
  are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
  from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
  error to be printed.

The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.

The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.

- When a read completes we had the following sequence:

  - bufdone()
  -- b_ckhashcalc (calculates check hash)
  -- bufdone_finish()
  --- vfs_vmio_iodone() (replaces bogus pages with the cached ones)

- When we are reading a buffer where one or more pages are already
  in memory (but not all pages, or we wouldn't be doing the read),
  the I/O is done with bogus_page mapped in for the pages that exist
  in the VM cache. This mapping is done to avoid corrupting the
  cached pages if there is any I/O overrun. The vfs_vmio_iodone()
  function is responsible for replacing the bogus_page(s) with the
  cached ones. But we were calculating the check hash before the
  bogus_page(s) were replaced. Hence, when we were calculating the
  check hash, we were partly reading from bogus_page, which means
  we calculated a bad check hash (e.g., because multiple pages have
  been mapped to bogus_page, so its contents are indeterminate).

The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.

With these two changes, the occasional cylinder-group check-hash
errors are gone.

Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
2018-02-06 00:19:46 +00:00
Konstantin Belousov
5370a081a8 Move signal trampolines out of locore.s into separate source file.
Similar to other arches, the move makes the subject of locore.s only
the kernel startup.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-02-06 00:02:30 +00:00
Landon J. Fuller
d177c19903 bwn(4): migrate bwn(4) to the native bhnd(9) interface, and drop siba_bwn.
- Remove the shim interface that allowed bwn(4) to use either siba_bwn or
  bhnd(4), replacing all siba_bwn calls with their bhnd(4) bus equivalents.
- Drop the legay, now-unused siba_bwn bus driver.
- Clean up bhnd(4) board flag defines referenced by bwn(4).

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D13518
2018-02-05 23:38:15 +00:00
John Baldwin
15746ef43a Ignore relocation tables for non-memory-resident sections.
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident.  For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero).  For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.

Reported by:	Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on:	mips, mips64
MFC after:	1 month
Sponsored by:	DARPA / AFRL
2018-02-05 23:35:33 +00:00
John Baldwin
d722231bca Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13945
2018-02-05 23:27:42 +00:00
Adrian Chadd
f6f7bec6f8 [arswitch] disable ARP copy-to-CPU port for AR9340 for now.
I'll have to go double check to see if it does indeed pass ARP frames between
switch ports with this disabled, but it seems required for the CPU port to see
ARP traffic.

I'll dig into this some more.
2018-02-05 20:37:29 +00:00
Adrian Chadd
45a7272a5b [arswitch] fix build breakage.
Apparently the last time I checked building this it didn't pick up that the
header had changed.
2018-02-05 20:30:53 +00:00
Brooks Davis
0a2c60c371 Reduce duplication in extattr_*_(file|link) syscalls.
Reviewed by:	rwatson
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14173
2018-02-05 19:06:34 +00:00
Brooks Davis
02bc058f79 ANSIfy syscall implementations.
Reviewed by:	rwatson
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14172
2018-02-05 18:58:55 +00:00
Ed Maste
8a3b44cfc2 Additional linuxolator whitespace cleanup, missed in r328890 2018-02-05 18:39:06 +00:00
Brooks Davis
7dea788b91 Garbage collect trailing whitespace.
Sponsored by:	DARPA, AFRL
2018-02-05 18:06:54 +00:00
Ed Maste
132f90c660 Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator.  Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 17:29:12 +00:00
Pedro F. Giffuni
fdc154e44a ext2fs: remove EXT4F_RO_INCOMPAT_SUPP
This was a hack to be able to mount ext4 filesystems read-only while not
supporting all the features. We now support all those features so it
doesn't make sense to keep the undocumented hack.

Discussed with:	fsu
2018-02-05 15:14:01 +00:00
Ed Maste
85059bc4ad Move assym.s to DPSRCS in linux modules
assym.s exists only to be included by other .s files, and should not
actually be assembled by itself.

Sponsored by:	Turing Robotic Industries Inc.
2018-02-05 14:53:18 +00:00
Pedro F. Giffuni
f86f5cd406 ext2fs: Cleanup variable assignments for extents.
Delay the initialization of variables until the are needed.

In the case of ext4_ext_rm_leaf(), make sure 'error' value is not
undefined.

Reported by:		Clang's static analyzer
Differential Revision:	https://reviews.freebsd.org/D14193
2018-02-05 14:30:27 +00:00
Andriy Gapon
be4be0e5ca zfs: move a utility function, ioflags, closer to its consumers
No functional change.

MFC after:	1 week
2018-02-05 14:19:36 +00:00
Konstantin Belousov
20e4afbfbb On munlock(), unwire correct page.
It is possible, for complex fork()/collapse situations, to have
sibling address spaces to partially share shadow chains. If one
sibling performs wiring, it can happen that a transient page, invalid
and busy, is installed into a shadow object which is visible to other
sibling for the duration of vm_fault_hold().  When the backing object
contains the valid page, and the wiring is performed on read-only
entry, the transient page is eventually removed.

But the sibling which observed the transient page might perform the
unwire, executing vm_object_unwire().  There, the first page found in
the shadow chain is considered as the page that was wired for the
mapping.  It is really the page below it which is wired.  So we unwire
the wrong page, either triggering the asserts of breaking the page'
wire counter.

As the fix, wait for the busy state to finish if we find such page
during unwire, and restart the shadow chain walk after the sleep.

Reported and tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14184
2018-02-05 12:49:20 +00:00
Andrey V. Elsukov
68e0e5a673 Modify ip6_get_prevhdr() to be able use it safely.
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.

In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.

Reported by:	Maxime Villard <max at m00nbsd dot net>
Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14158
2018-02-05 09:22:07 +00:00
Adrian Chadd
f76883d64c [arswitch] Enable ATU dump support for the AR9340.
This indeed uses the same registers as the AR8216 and later chips.

There seems to be an issue with ARP requests being sent out from the CPU
through this switch here, so figuring that out is next.  Learning works fine on
the AR8327 ethernet switch on the /other/ gigabit ethernet port, so I don't
think it's the network stack or ethernet driver.

Tested:

* DB120 - AR9340 SOC + ethernet switch (and other bits.)
2018-02-05 07:05:28 +00:00
Adrian Chadd
7f81a20479 [arswitch] fix mac address field definition.
Whilst here, add some further fields for future experimenting.

Tested:

* AR9340 switch
* AR9330 switch
* AR7240 switch
2018-02-05 07:03:45 +00:00
Adrian Chadd
7ed0831923 [arswitch] Break out of the loop upon any error, not just -1.
This fixes the AR9340 "unimplemented" thingy for now.
2018-02-05 05:51:37 +00:00
Adrian Chadd
286a5a1c7e [ar71xx] Fix DB120 AHB device hints in the new world order.
This allows the on-chip (AHB bus) device to attach correctly as a module.

Tested:

* DB120, AR9344 (SoC + 2x2 2G wifi) + QCA9580 PCI 3x3 5G wifi
2018-02-05 04:48:41 +00:00
Adrian Chadd
431017d066 [ar71xx] AR934x is a MIPS74k board - use the right hwpmc module 2018-02-05 04:47:13 +00:00
Adrian Chadd
ea3c60a1e5 [ar71xx] New world order - don't reference ath_pci here, it's a module now 2018-02-05 04:46:36 +00:00
Vladimir Kondratyev
74a53bd131 psm(4): Fix panic occuring soon after PS/2 packet has been rejected by
synaptics or elantech sanity checker.

After packet has been rejected contents of packet buffer is not cleared
with setting of inputbytes counter to 0. So when this packet buffer is
filled again being an element of circular queue, new data appends to old
data rather than overwrites it. This leads to packet buffer overflow
after 10 rounds.

Fix it with setting of packet's inputbytes counter to 0 after rejection.

While here add extra logging of rejected packets.

PR:		222667 (for reference)
Reported by:	Neel Chauhan <neel@neelc.org>
Tested by:	Neel Chauhan <neel@neelc.org>
MFC after:	1 week
2018-02-04 23:01:48 +00:00
Justin Hibbits
b0d3bb2613 Only look for L2 cache controllers for mpc85xx_cache
The L3 cache controller (Corenet Platform Cache) is listed with one of its
compatible strings as "cache", which this driver can't attach to.  Restrict
to a known list of primary cache controller strings, as found in the l2cache
devicetree binding.
2018-02-04 20:07:08 +00:00
Justin Hibbits
ce2d51972f Start building modules for MPC85XX and MPC85XXSPE
These kernels aren't restricted to development boards anymore, they are
closer in behavior to GENERIC, so build modules.
2018-02-04 15:40:48 +00:00
Justin Hibbits
2c26c98c89 Add sdhci to MPC85XX build 2018-02-04 15:39:15 +00:00
Justin Hibbits
77baa2256d Minimal changes for MPR to build on architectures with physical addresses larger than virtual
Summary:
Some architectures use large (36-bit) physical addresses, with smaller
virtual addresses.  Casting between vm_paddr_t (or bus_addr_t) and void * is
considered illegal, so cast through uintptr_t.  No functional change on existing
platforms.

Reviewed By:	scottl
Differential Revision:	https://reviews.freebsd.org/D14042
2018-02-04 15:37:58 +00:00
Alan Somers
f5b4099e6b geom: don't write stack garbage in disk labels
Most consumers of g_metadata_store were passing in partially unallocated
memory, resulting in stack garbage being written to disk labels. Fix them by
zeroing the memory first.

gvirstor repeated the same mistake, but in the kernel.

Also, glabel's label contained a fixed-size string that wasn't
initialized to zero.

PR:		222077
Reported by:	Maxim Khitrov <max@mxcrypt.com>
Reviewed by:	cem
MFC after:	3 weeks
X-MFC-With:	323314
X-MFC-With:	323338
Differential Revision:	https://reviews.freebsd.org/D14164
2018-02-04 14:49:55 +00:00
Steve Wills
aa3c83c3c6 Create GENERIC64-NODEBUG for powerpc64
Approved by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D14192
2018-02-04 14:27:12 +00:00
Adrian Chadd
c9f70b7b88 [arswitch] fix up issues on the AR8327.
This correctly dumps the ethernet bridge contents on an AR8327 switch.

Tested:

* AP135 - QCA9550 + AR8327 ethernet switch:

# etherswitchcfg atu dump
 [0] c0:3f:d5:7e:6f:45: portmask 0x00000004
 [1] f6:b6:03:96:1e:ba: portmask 0x00000004
 [2] 00:03:7f:11:38:4f: portmask 0x00000040
# arp -na
? (192.168.3.170) at 00:03:7f:11:38:4f on arge0 permanent [ethernet]
? (192.168.3.12) at c0:3f:d5:7e:6f:45 on arge0 expires in 1188 seconds [ethernet]
? (192.168.3.1) at f6:b6:03:96:1e:ba on arge0 expires in 1186 seconds [ethernet]
2018-02-04 08:22:11 +00:00
Hans Petter Selasky
c32d1cce9d Add new USB ID.
PR:		225641
Submitted by:	Ryan <ryanwinter@outlook.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-03 09:43:32 +00:00
Xin LI
90a48fba23 After r328426, g_label depends on UFS (option FFS) code to read UFS
superblock, and the kernel will fail to link when UFS is not built
in.  This commit makes it depend on a small portion of FFS bits and
thereby fixes build for this situation.

This is intended as an interim bandaid, and the actual superblock
reading code should probably be made independent of UFS, so we do
not need to depend on it (see kib@'s comment in the review for
details), and we will revisit this once the superblock check hashes
are all in place.

Differential Revision:	https://reviews.freebsd.org/D14092
2018-02-03 09:15:13 +00:00
Adrian Chadd
65d59686e1 [arswitch] add initial functionality for AR8327 ATU management.
* Add the bulk of the ATU table read function
* Correct how the ATU function and WAIT bits work

TODO:

* more testing, figure out how the multi-vlan table stuff works and push that
  up to userspace
2018-02-03 00:59:08 +00:00
Adrian Chadd
84a5558c38 [arswitch] Stub out the ATU table dump in AR9340 switches until I implement
this.
2018-02-02 22:08:03 +00:00
Adrian Chadd
62042c979d [arswitch] begin tidying up the learning and ATU management, introduce ATU APIs.
* Refactor the initial learning configuration (port learning, address expiry,
  handling address moving between ports, etc, etc) into a separate HAL routine
* and ensure that it's consistent between switch chips - the AR8216,8316,724x,9331
  SoCs all share the same switch code.
* .. the AR8327 needs doing - the defaults seem OK for now
* .. the AR9340 is different but it's also programmed now.

* Add support for flushing a single port worth of ATU entries
* Add support for fetching the ATU table from AR8216 and derived chips

Tested:

* AR9344, Carambola 2

TODO:

* Further testing on other chips
* Add AR9340 support
* Add AR8327 support
2018-02-02 22:05:36 +00:00
Brooks Davis
0fd25723bc Add kern.ipc.{msqids,semsegs,sema} sysctls for FreeBSD32.
Stop leaking kernel pointers though theses sysctls and make sure that the
padding in the structures is zeroed on allocation to avoid other leaks.

Reviewed by:	gordon, kib
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D13459
2018-02-02 18:03:12 +00:00
Andriy Gapon
6c1e03251e ZFS ARC: restore illumos uses of 'needfree' that were removed in r325851
This is purely a cosmetic change to have a more complete copy of
ifdef-ed out illumos code.

MFC after:	1 week
2018-02-02 12:57:33 +00:00
Hans Petter Selasky
64282a1274 Slightly bump the maximum OID path for loading tunable SYSCTLs.
Coming updates to the mlx5en(4) driver will require this.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-02 12:42:46 +00:00
Konstantin Belousov
938cdc4264 On pageout, in vnode generic pager, for partially dirty page, only
clear dirty bits for completely invalid blocks.

Otherwise we might not write out the last chunk that is shorter than
512 bytes, if the file end is not aligned on disk block boundary.
This become important after the r324794.

PR:	225586
Reported by:	tris_vern@hotmail.com
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-02-02 11:56:30 +00:00
Andrey V. Elsukov
883cd89b05 Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
  not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
  on an mbuf that was already freed.

Reported by:	Maxime Villard <max at m00nbsd dot net>
MFC after:	3 days
2018-02-02 07:39:34 +00:00
Steve Wills
e1782bae5f Correct longjmp
Reviewed by:	nwhitehorn
Differential Revision:	https://reviews.freebsd.org/D14159
2018-02-02 02:28:25 +00:00
Adrian Chadd
877d73ecb4 [etherswitch] add the first pass of a simple API to flush and fetch the L2 address table from the ethernet switch.
This stuff may be a bit fluid during this -HEAD cycle as various other
switch features are added, but the current stuff is enough to drive
initial development and features on the atheros range of integrated
and external switches.

* add a method to flush the whole address table;
* add a method to flush all addresses on a given port;
* add a method to download the address table;
* .. and then a method to fetch entries from the address table.

The table fetch/read methods pass through to the drivers for now since
the drivers may implement different ways of fetching/caching the address
table data.  The atheros devices for example fetch the table by
iterating over the table through a set of registers and so you need
to keep that locked whilst you iterate otherwise you may have the table
flushed half way by a port status change.

This is a no-op until the userland and arswitch code shows up.
2018-02-02 02:05:14 +00:00
Adrian Chadd
a597af9415 [atheros] Update QCA953x support to use the new hints. 2018-02-01 22:01:53 +00:00
Adrian Chadd
1246f4fe1c [atheros] Fix DIR-825C1 to use the new hints.
Tested:

* DIR-825C1
2018-02-01 22:01:11 +00:00
Adrian Chadd
2786bc9951 [atheros] teach these two boards about the new hints location as well. 2018-02-01 22:00:38 +00:00
Adrian Chadd
118c9d516e [atheros] Teach the QCA955x SoC code about the new hints stuff. 2018-02-01 22:00:05 +00:00
Adrian Chadd
7f1a46e2e8 [atheros] Fix-up the base address stuff after I did a drive-by with the calibration data location.
The old way required the data to be present really early and copied it from
memory mapped NOR flash; this only worked during kernel boot but not for
ath/ath_hal modules.

Tested:

* AR9331, Carambola2, ath/hal modules.
2018-02-01 21:58:52 +00:00
Hans Petter Selasky
f71d0b0da7 Fix some recent regressions after r328436 in the LinuxKPI:
1) The OPW() function macro should have the same return type like the
function it executes.
2) The DEVFS I/O-limit should be enforced for all character device reads
and writes.
3) The character device file handle should be passable, same as for
DEVFS based file handles.

Reported by:	jbeich @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-01 19:57:21 +00:00
Hans Petter Selasky
3f3735db30 Make sure the LinuxKPI's internal ERESTARTSYS error code gets translated
into ERESTART for mmap and page fault calls aswell.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-02-01 17:32:45 +00:00
Andrew Turner
c81131d9ec Disable EARLY_PRINTF from the Armada 3700 uart, it breaks when we want
to use EARLY_PRINTF on other SoCs.

Sponsored by:	DARPA, AFRL
2018-02-01 15:05:17 +00:00
Andrew Turner
faa3fd222a Only promote userspace mappings to superpages. This was dropped in r328510,
however due to the break-before-make requirement on arm64 is is currently
unsafe to promote kernel pages.

Sponsored by:	DARPA, AFRL
2018-02-01 14:26:26 +00:00
Kristof Provost
c201b5644d pf: Avoid warning without INVARIANTS
When INVARIANTS is not set the 'last' variable is not used, which can generate
compiler warnings.
If this invariant is ever violated it'd result in a KASSERT failure in
refcount_release(), so this one is not strictly required.
2018-02-01 07:52:06 +00:00
Nathan Whitehorn
619282986d Change the default MSR values used when starting userland and kernel
threads from compile-time defines to global variables. This removes a
significant amount of duplicated runtime patches to the compile-time
defines, centralizing the conditional logic in the early startup code.

Reviewed by:	jhibbits
2018-02-01 05:31:24 +00:00
Nathan Whitehorn
564ac41556 Fix build on 32-bit PowerPC, broken in r328537. 2018-02-01 05:28:02 +00:00
Kirk McKusick
5d84ae8b49 Null out journal softc pointer earlier to avoid a segment fault
that can otherwise occur.

PR:           221804
Submitted by: Andreas Longwitz <longwitz at incore.de>
MFC after:    1 week
2018-01-31 23:30:49 +00:00
Kirk McKusick
0d37a428f0 When reading a cylinder group, break out reporting of check hash errors
from other types of errors so that the error is correctly reported.
2018-01-31 23:13:37 +00:00
Kirk McKusick
5a35a04255 One of the vnode fields listed by vn_printf is the union of pointers
whose type depends on the type of vnode. Correct vn_printf so that
it correctly identifies the name of the pointer that it is printing.

Submitted by: Andreas Longwitz <longwitz at incore.de>
MFC after: 1 week
2018-01-31 22:49:50 +00:00
Vladimir Kondratyev
f20ad0fa5b psm: Add a kludge to support 0x46 identity middle byte Synaptics touchpads
Most synaptics touchpads return 0x47 in middle byte in responce to identify
command as stated in p.4.4 of "Synaptics PS/2 TouchPad Interfacing Guide".
But some devices e.g. found on HP EliteBook 9470m return 0x46 here.
Allow them to be identified as Synaptics as well as 0x47.
ExtendedQueries return incorrect data on such a touchpads so we ignore
their result and set conservative defaults.

PR:		222667
Reported by:	Neel Chauhan <neel@neelc.org>
Tested by:	Neel Chauhan <neel@neelc.org>
Approved by:	gonzo
2018-01-31 22:17:52 +00:00
Vladimir Kondratyev
f451e00544 psm(4): Reduce psm watchdog verbosity
Modern touchpads do not issue interrupts on inactivity so "lost interrupt"
message became annoying spam nowadays. This change quiets the message
if debug.psm.loglevel=5 (or less) is set in /boot/loader.conf

Approved by:	gonzo
2018-01-31 21:46:37 +00:00
Vladimir Kondratyev
7d1460a4b1 psm(4): Add support for HP EliteBook 1040 ForcePads.
ForcePads do not have any physical buttons, instead they detect click
based on finger pressure. Forcepads erroneously report button click
if there are 2 or more fingers on the touchpad breaking multifinger
gestures. To workaround this start reporting a click only after
4 consecutive single touch packets has been received. Skip these packets
in case more contacts appear.

PR:		223369
Reported by:	Neel Chauhan <neel@neelc.org>
Tested by:	Neel Chauhan <neel@neelc.org>
Reviewed by:	gonzo
Approved by:	gonzo
2018-01-31 21:14:59 +00:00
John Baldwin
ec56d65061 Consistently use 16-byte alignment for MIPS N32 and N64.
- Add a new <machine/abi.h> header to hold constants shared between C
  and assembly such as CALLFRAME_SZ.
- Add a new STACK_ALIGN constant to <machine/abi.h> and use it to
  replace hardcoded constants in the kernel and makecontext().  As a
  result of this, ensure the stack pointer on N32 and N64 is 16-byte
  aligned for N32 and N64 after exec(), after pthread_create(), and
  when sending signals rather than 8-byte aligned.

Reviewed by:	jmallett
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D13875
2018-01-31 17:36:39 +00:00
Konstantin Belousov
f7f14d9dea When switching IBRS on, also enable STIBP (Single Thread Indirect
Branch Predictors) mitigation.

DOcument 336996-001 promises that CPUs which implement IBRS but not
STIBP silently ignore setting of the bit instead of trapping.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-31 16:56:02 +00:00
Konstantin Belousov
b31b965e7c Expand IBRS TLA in sysctl help lines.
Requested by:	bz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-31 16:54:05 +00:00
Andriy Gapon
1fc46a9ba0 zfs_rezget: drop cached pages before doing anything else
We did that in the case of success to prevent the use of stale cached
data, but it makes even less sense to keep the cached data when we fail.

Ideally, we should call vgone() on the vnode in the case of zfs_rezget
failure, but the current lock order prevents us from doing that.

The change also rearranges the order of unlinked check and the size
change check.

While there, add missing SET_ERROR in one of the error paths.

MFC after:	2 weeks
2018-01-31 14:44:51 +00:00
Konstantin Belousov
319117fd57 IBRS support, AKA Spectre hardware mitigation.
It is coded according to the Intel document 336996-001, reading of the
patches posted on lkml, and some additional consultations with Intel.

For existing processors, you need a microcode update which adds IBRS
CPU features, and to manually enable it by setting the tunable/sysctl
hw.ibrs_disable to 0.  Current status can be checked in sysctl
hw.ibrs_active.  The mitigation might be inactive if the CPU feature
is not patched in, or if CPU reports that IBRS use is not required, by
IA32_ARCH_CAP_IBRS_ALL bit.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14029
2018-01-31 14:36:27 +00:00
Konstantin Belousov
3b5319325e Do not enable PTI when IA32_ARCH_CAP_RDCL_NO bit is set.
Intel document 336996-001 claims that this will be the way to inform
about Meltdown correction.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-31 14:25:42 +00:00
Hans Petter Selasky
cb57d1dd30 Properly implement the cond_resched() function macro in the LinuxKPI.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-31 13:40:36 +00:00
Andriy Gapon
6a8b7aa424 vmm/svm: post LAPIC interrupts using event injection, not virtual interrupts
The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR
fields of VMCB to inject a virtual interrupt into a guest VM.  This
method has many advantages over the direct event injection as it
offloads all decisions of whether and when the interrupt can be
delivered to the guest.  But with a purely software emulated vAPIC the
advantage is also a problem.  The problem is that the hypervisor does
not have any precise control over when the interrupt is actually
delivered to the guest (or a notification about that).  Because of that
the hypervisor cannot update the interrupt vector in IRR and ISR in the
same way as real hardware would.  The hypervisor becomes aware that the
interrupt is being serviced only upon the first VMEXIT after the
interrupt is delivered.  This creates a window between the actual
interrupt delivery and the update of IRR and ISR.  That means that IRR
and ISR might not be correctly set up to the point of the
end-of-interrupt signal.

The described deviation has been observed to cause an interrupt loss in
the following scenario.  vCPU0 posts an inter-processor interrupt to
vCPU1.  The interrupt is injected as a virtual interrupt by the
hypervisor.  The interrupt is delivered to a guest and an interrupt
handler is invoked.  The handler performs a requested action and
acknowledges the request by modifying a global variable.  So far, there
is no VMEXIT and the hypervisor is unaware of the events.  Then, vCPU0
notices the acknowledgment and sends another IPI with the same vector.
The IPI gets collapsed into the previous IPI in the IRR of vCPU1.  Only
after that a VMEXIT of vCPU1 occurs.  At that time the vector is cleared
in the IRR and is set in the ISR.  vCPU1 has vAPIC state as if the
second IPI has never been sent.
The scenario is impossible on the real hardware because IRR and ISR are
updated just before the interrupt handler gets started.

I saw several possibilities of fixing the problem.  One is to intercept
the virtual interrupt delivery to update IRR and ISR at the right
moment.  The other is to deliver the LAPIC interrupts using the event
injection, same as legacy interrupts.  I opted to use the latter
approach for several reasons.  It's equivalent to what VMM/Intel does
(in !VMX case).  It appears to be what VirtualBox and KVM do.  The code
is already there (to support legacy interrupts).

Another possibility was to use a special intermediate state for a vector
after it is injected using a virtual interrupt and before it is known
whether it was accepted or is still pending.
That approach was implemented in https://reviews.freebsd.org/D13828
That method is more complex and does not have any clear advantage.

Please see sections 15.20 and 15.21.4 of "AMD64 Architecture
Programmer's Manual Volume 2: System Programming" (publication 24593,
revision 3.29) for comparison between event injection and virtual
interrupt injection.

PR:		215972
Reported by:	ajschot@hotmail.com, grehan
Tested by:	anish, grehan,  Nils Beyer <nbe@renzel.net>
Reviewed by:	anish, grehan
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D13780
2018-01-31 11:14:26 +00:00
Adrian Chadd
cadf7a004a [arswitch] Fix ATU programming on the AR8327 switch.
Doing a flush actually requires setting this bit.
2018-01-31 07:37:33 +00:00
Adrian Chadd
2c6ceccade [arswitch] Fix ATU flushing on AR8216/AR8316 and most of the later chips.
The switch hardware requires this bit to be set in order to kick start the
actual ATU update.  This was being masked on some chips by the learning
programming (what to do when a MAC address moves, hash table collision, etc)
which is currently inconsistent between chips.

Tested:

* AR9344 SoC (AR7240 style switch internal)
2018-01-31 07:36:51 +00:00
Adrian Chadd
676e92f22d [arswitch] add a new debug section for upcoming address table management. 2018-01-31 07:20:34 +00:00
Wojciech Macek
d32802f0c3 PowerNV: fix compilation on non-NV platforms
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-01-31 06:42:01 +00:00
Alexander Motin
6e76bc32d4 Try to preallocate receive memory early.
We may not have enough contiguous memory later, when NTB connection get
established.  It is quite likely that NTB windows are symmetric and this
allocation remain, but even if not, we will just reallocate it later.

MFC after:	2 weeks
2018-01-31 01:04:36 +00:00
John Baldwin
05d56d83b6 Ensure 'name' is not NULL before passing to strcmp().
This avoids a nested page fault when obtaining a stack trace in DDB if
the address from the first frame does not resolve to a known symbol.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2018-01-30 23:29:27 +00:00
John Baldwin
f17985319d Export tcp_always_keepalive for use by the Chelsio TOM module.
This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Reviewed by:	bz, emaste
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14129
2018-01-30 23:01:37 +00:00
Ed Maste
37880089ac makesyscalls: permit a range of syscall numbers for UNIMPL
Some ABIs have large gaps in syscall numbers.  Allow gaps to be filled
as ranges of UNIMPL, with an entry like:

248-1023	AUE_NULL	UNIMPL	unimplemented

Reviewed by:	jhb, gnn
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14122
2018-01-30 18:29:38 +00:00
Hans Petter Selasky
1cbc85fd04 Move the mlx5 core device pointer first in the mlx5en priv. This help simplify
checks to recognize own network devices when using mlx5ib. This patch fixes
an issues where mlx5ib fails to recognize mceX network devices for use with
RoCE.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-01-30 12:38:06 +00:00
Edward Tomasz Napierala
f66c3cfc58 Make the handler routine for the hw.usb.template sysctl trigger the USB
host to reprobe the bus by switching the USB pull up resistors off and
back on.  In other words - when FreeBSD is configured as a USB device,
changing the sysctl will be immediately noticed by the machine it's
connected to.

Reviewed by:	hselasky@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-01-30 10:08:11 +00:00
Michal Meloun
f1824e85ef Use more verbose panic messages.
MFC after: 2 weeks
2018-01-30 04:06:30 +00:00
Michal Meloun
962eb1c03f Revert r328511, it was committed with <patch>.diff instead of <patch>.txt as
commit log.
2018-01-30 04:05:03 +00:00
Fedor Uporov
7c4fa61e6f Fix mistake in case of zeroed inode check.
Reported by:	pho
MFC after:	6 months
2018-01-29 22:15:46 +00:00
Fedor Uporov
c0f16c65cd Add flex_bg/meta_bg features RW support.
Reviewed by:    pfg
MFC after:      6 months

Differential Revision:    https://reviews.freebsd.org/D13964
2018-01-29 21:54:13 +00:00
Bryan Drewery
595109196a Don't use an .OBJDIR for 'make sysent'.
Reported by:	emaste, jhb
Sponsored by:	Dell EMC
2018-01-29 19:14:15 +00:00
Warner Losh
de4f4237bf Do the book-keeping on release before we release the reference. The
periph was going away on final release, and then returning and we
started dancing in free memory.

Sponsored by: Netflix
2018-01-29 18:07:14 +00:00
Benno Rice
68f18f30d1 Remove some duplicated sys/conf/files* entries.
net80211/ieee80211_ageq.c was present twice in sys/conf/files so leave the
correctly sorted one. dev/wpi/if_wpi.c was present in sys/conf/files as well
as sys/conf/files.amd64 and sys/conf/files.i386 so prefer the sys/conf/files
entry.

Reviewed by:	allanjude, rstone
2018-01-29 17:32:30 +00:00
Eric van Gyzen
f8116f391a ND6: Set the correct state for new neighbor cache entries
Restore state 6.  Many of the UNH tests end up exercising this
state, where we have a new neighbor cache entry and a new link-layer
entry is being created for it.  The link-layer address is currently
unknown so the initial state of the "llentry" should remain initialized
to ND6_LLINFO_NOSTATE so that the ND code will send a solicitation.
Setting this to ND6_LLINFO_STALE implies that the link-level entry
is valid and can be used (but needs to be refreshed via the Neighbor
Unreachability state machine).

https://forums.freebsd.org/threads/64287/

Submitted by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	mjoras, dab, ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14059
2018-01-29 16:12:26 +00:00
Andrey V. Elsukov
2164def67c Do not skip scope zone violation check, when mbuf has M_FASTFWD_OURS flag.
When mbuf has M_FASTFWD_OURS flag, this means that a destination address
is our local, but we still need to pass scope zone violation check,
because protocol level expects that IPv6 link-local addresses have
embedded scope zone indexes. This should fix the problem, when ipfw is
used to forward packets to local address and source address of a packet
is IPv6 LLA.

Reported by:	sbruno
MFC after:	3 weeks
2018-01-29 11:03:29 +00:00
Andrey V. Elsukov
efc284cb12 Assign IPv6 link-local address to loopback interfaces whith unit > 0.
When an interface has IFF_LOOPBACK flag in6_ifattach() tries to assing
IPv6 loopback address to this interface. It uses in6ifa_ifpwithaddr()
to check, that interface doesn't already have given address and then
uses in6_ifattach_loopback(). If in6_ifattach_loopback() fails, it just
exits and thus skips assignment of IPv6 LLA.
Fix this using in6ifa_ifwithaddr() function. If IPv6 loopback address is
already assigned in the system, do not call in6_ifattach_loopback().

PR:		138678
MFC after:	3 weeks
2018-01-29 10:33:55 +00:00
Wojciech Macek
70bb600a0a PowerNV: move LPCR and LPID altering to cpudep_ap_early_bootstrap
It turns out that under some circumstances we can get DSI or DSE before we set
LPCR and LPID so we should set it as early as possible.

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
2018-01-29 09:27:02 +00:00
Wojciech Macek
f0393bbf34 PPC64: use hwref instead of cpuid
On CHRP and PowerNV, use the interrupt server number in the cpuref and pcpu
hwref field instead of the device-tree phandle and make the CPU IDs reported
to the scheduler dense and with the BSP at 0.

Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14011
2018-01-29 09:15:38 +00:00
Wojciech Macek
b74fb1e713 PPC64: cleanup APs startup routines
Cleaning up AP startup routines. This is a mix of changes
required to make PowerNV running and to modify the code
to be more robust. Previously, some races were seen if more
than 90CPUs were online.

Authored by:           Patryk Duda <pdk@semihalf.com>
Submitted by:          Wojciech Macek <wma@semihalf.com>
Obtained from:         Semihalf
Sponsored by:          IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14026
2018-01-29 08:10:03 +00:00
Nathan Whitehorn
eb1baf72ae Remove hard-coded trap-handling logic involving the segmented memory model
used with hashed page tables on AIM and place it into a new, modular pmap
function called pmap_decode_kernel_ptr(). This function is the inverse
of pmap_map_user_ptr(). With POWER9 radix tables, which mapping to use
becomes more complex than just AIM/BOOKE and it is best to have it in
the same place as pmap_map_user_ptr().

Reviewed by:	jhibbits
2018-01-29 04:33:41 +00:00
Li-Wen Hsu
156531d65a Fix kernel build after r328523, correct variable names
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D14105
2018-01-29 04:04:52 +00:00
Ian Lepore
2480071fbc Add a NO_GETMAXLUN quirk for the JMicron JMS567 USB to SATA bridge, to
prevent lengthy timeout pauses while probing/attaching drives.
2018-01-29 03:24:02 +00:00
Li-Wen Hsu
99178c2f09 Fix LINT build after r328508, add forgotten part in format string
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D14089
2018-01-29 02:29:08 +00:00
Ed Maste
48bc159f28 Correct MD patch in linux64 module Makefile
Reviewed by:	imp
Sponsored by:	Turing Robotic Industries Inc.
Differential Revision:	https://reviews.freebsd.org/D14061
2018-01-29 01:59:04 +00:00
Warner Losh
d6b6639713 Add ISA PNP tables to ISA drivers. Fix a few incidental comments.
ACPI ISA PBP tables not tagged, there's bigger issues with them.
2018-01-29 00:22:30 +00:00
Warner Losh
2ced565834 Tag the current round of deprecated drivers.
Differential Revision: https://reviews.freebsd.org/D13818
2018-01-29 00:14:46 +00:00
Warner Losh
7faed6e3e9 Create deprecation management functions.
gone_in(majar, msg);	If we're running in FreeBSD major, tell
			the user this code may be deleted soon.
			If we're running in FreeBSD major - 1,
			the the user is deprecated and will
			be gone in major.
			Otherwise say nothing.

gone_in_dev(dev, major, msg) Just like gone_in, except use device_printf.

New tunable / sysctl debug.oboslete_panic: 0 - don't panic,
	1 - panic in major or newer , 2 - panic in major - 1 or newer
	default: 0

if NO_OBSOLETE_CODE is defined, then both of these turn into compile
time errors when building for major. Add options NO_OBSOLETE_CODE to
kernel build system.

This lets us tag code that's going away so users know it will be gone,
as well as automatically manage things.

Differential Review: https://reviews.freebsd.org/D13818
2018-01-29 00:14:39 +00:00
Warner Losh
29077eb456 Use atomic load and stores to ensure that the compiler doesn't
optimize away these loops. Change boolean to int to match what atomic
API supplies. Remove wmb() since the atomic_store_rel() on status.done
ensure the prior writes to status. It also fixes the fact that there
wasn't a rmb() before reading done. This should also be more efficient
since wmb() is fairly heavy weight.

Sponsored by: Netflix
Reviewed by: kib@, jim harris
Differential Revision: https://reviews.freebsd.org/D14053
2018-01-29 00:00:52 +00:00
Warner Losh
8cc3cfa86d Out of an abundance of caution, NUL out the first byte in the PNP
info.
2018-01-28 23:58:22 +00:00
Nathan Whitehorn
21776ff850 Remove some unused AIM register declarations that existed to support some
CPUs we have never run on. As a side-effect, removes some #ifdef AIM/#else.
2018-01-28 21:30:57 +00:00
Justin Hibbits
0a3ef103a3 Start building modules for QORIQ64
There's no reason not to build modules for 64-bit QorIQ devices.  This
config has evolved to be analogous to the AIM GENERIC64 kernel, so will grow
to match it in more ways as well.
2018-01-28 20:35:48 +00:00
Justin Hibbits
a72b951348 Consolidate trap instruction checks to a single function
Summary:
Rather than duplicating the checks for programmatic traps all over the code, put
it all in one function.  This helps to remove some of the #ifdefs between AIM
and Book-E.

Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D14082
2018-01-28 19:18:40 +00:00
Alexander Motin
a5232cc4fb Assume Always Running APIC Timer for AMD CPU families >= 0x12.
Fallback to HPET may cause locks congestions on many-core systems.
This change replicates Linux behavior.

MFC after:	1 month
2018-01-28 18:18:03 +00:00
Michal Meloun
2b4c1a7ffc Remove #endif forgotten in r328510.
Pointy hat: mmel
2018-01-28 15:33:32 +00:00
Michal Meloun
6403138990 diff --git a/sys/dev/extres/clk/clk.c b/sys/dev/extres/clk/clk.c
index c6a1f466ceb..c3708a0ce27 100644
--- a/sys/dev/extres/clk/clk.c
+++ b/sys/dev/extres/clk/clk.c
@@ -642,10 +642,11 @@ clknode_adjust_parent(struct clknode *clknode, int idx)
 	if (clknode->parent_cnt == 0)
 		return;
 	if ((idx == CLKNODE_IDX_NONE) || (idx >= clknode->parent_cnt))
-		panic("Invalid clock parent index\n");
+		panic("%s: Invalid parent index %d for clock %s",
+		    __func__, idx, clknode->name);

 	if (clknode->parents[idx] == NULL)
-		panic("%s: Attempt to set invalid parent %d for clock %s",
+		panic("%s: Invalid parent index %d for clock %s",
 		    __func__, idx, clknode->name);

 	/* Remove me from old children list. */
@@ -674,8 +675,8 @@ clknode_init_parent_idx(struct clknode *clknode, int idx)
 	if ((idx == CLKNODE_IDX_NONE) ||
 	    (idx >= clknode->parent_cnt) ||
 	    (clknode->parent_names[idx] == NULL))
-		panic("%s: Invalid clock parent index: %d\n", __func__, idx);
-
+		panic("%s: Invalid parent index %d for clock %s",
+		    __func__, idx, clknode->name);
 	clknode->parent_idx = idx;
 }
2018-01-28 15:20:45 +00:00
Michal Meloun
89b090f1e6 Fix handling of I-cache sync operations
- pmap_enter_object() can be used for mapping of executable pages, so it's
  necessary to handle I-cache synchronization within it.

- Fix race in I-cache synchronization in pmap_enter(). The current code firstly
  maps given page to target VA and then do I-cache sync on it. This causes
  race, because this mapping become visible to other threads, before I-cache
  is synced.
  Do sync I-cache firstly (by using DMAP VA) and then map it to target VA.

- ARM64 ARM permits implementation of aliased (AIVIVT, VIPT) I-cache, but we
  can use different that final VA for flushing it. So we should use full
  I-cache flush on affected platforms. For now, and as temporary solution,
  use full flush always.
2018-01-28 15:02:49 +00:00
Warner Losh
55cf33a584 Add the DF_SUSPENDED flag to flags that are printed. 2018-01-28 05:13:17 +00:00