Commit Graph

127451 Commits

Author SHA1 Message Date
jhb
43e4187c13 Reliably enable debug exceptions on all CPUs.
Previously, debug exceptions were only enabled on the boot CPU if
DDB was enabled in the dbg_monitor_init() function.  APs also called
this function, but since mp_machdep.c doesn't include opt_ddb.h, the
APs ended up calling an empty stub defined in <machine/debug_monitor.h>
instead of the real function.  Also, if DDB was not enabled in the kernel,
the boot CPU would not enable debug exceptions.

Fix this by adding a new dbg_init() function that always clears the OS
lock to enable debug exceptions which the boot CPU and the APs call.
This function also calls dbg_monitor_init() to enable hardware breakpoints
from DDB on all CPUs if DDB is enabled.  Eventually base support for
hardware breakpoints/watchpoints will need to move out of the DDB-only
debug_monitor.c for use by userland debuggers.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D12001
2017-08-12 18:42:54 +00:00
jhb
6e837ed7a2 Don't panic for PT_GETFPREGS.
Only fetch the VFP state from the CPU if the thread whose registers are
being requested is the current thread.  If a stopped thread's registers
are being fetched by a debugger, the saved state in the PCB is already
valid.

Reviewed by:	andrew
MFC after:	1 week
2017-08-12 18:38:18 +00:00
ian
3c3613b042 Bid for the device with BUS_PROBE_GENERIC, because this is very much a
generic driver with minimal feature support for a large number of chips.
More featureful per-chip drivers might exist (especially out-of-tree) and
those should win the bidding even if they use BUS_PROBE_DEFAULT.
2017-08-12 17:39:32 +00:00
np
2506c0989c cxgbe(4): Save the last reported link parameters and compare them with
the current state to determine whether to generate a link-state change
notification.  This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.

Reported by:	Jason Eggleston @ LLNW
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-12 14:02:19 +00:00
jhb
1cab2038a4 Fix a typo. 2017-08-11 22:47:32 +00:00
emaste
903ddb525d arm: enable ARM_MANY_BOARD in NOTES for LINT build
Added in r238189, ARM_MANY_BOARD adds support for multiple ARM boards in
a single kernel. Include it for LINT builds to avoid duplicate symbol
errors when linking with lld.

Sponsored by:	The FreeBSD Foundation
2017-08-11 19:49:29 +00:00
markj
b1551b3108 Bump KERNELDUMP_BUFFER_SIZE to 4096.
The encrypted kernel dump code writes data in blocks of this size. A buffer
size of 4096 allows encrypted dumps to work with 4Kn drives.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11870
2017-08-11 19:24:08 +00:00
ian
7ba2756fac Stop calling atrtc_set() from the xen timer clock_settime() method. That
removes the only reference to atrtc_set() from outside of atrtc.c, so make
it static.

The xen timer driver registers as a realtime clock with 1us resolution.  In
the past that resulted in only the xen timer's clock_settime() getting
called, so it would call atrtc_set() to set the hardware clock as well.  As
of r32090, the clock_settime() method of all registered realtime clocks gets
called, so the xen driver no longer needs to chain-call the lower-resolution
driver.

Thanks to royger@ for talking me through the xen stuff, and for testing.
2017-08-11 19:02:11 +00:00
emaste
237b352c0a Rename at91_pmc's M_PMC malloc type to avoid duplicate definition
M_PMC is defined in sys/dev/hwpmc/hwpmc_mod.c, and the LINT kernel build
fails when linking with lld due to a duplicate symbol error.

Sponsored by:	The FreeBSD Foundation
2017-08-11 18:09:26 +00:00
davidcs
516e23c3c5 Performance enhancements to reduce CPU utililization for large number of
TCP connections (order of tens of thousands), with predominantly Transmits.

Choice to perform receive operations either in IThread or Taskqueue Thread.

Submitted by:Vaishali.Kulkarni@cavium.com
MFC after:5 days
2017-08-11 17:43:25 +00:00
rlibby
585859838d x86/crc32_sse42.c: quiet unused function warning
Reviewed by:	cem
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11980
2017-08-11 17:05:31 +00:00
markj
bce4478d7d Have sendfile_swapin() use vm_page_grab_pages().
Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:32:24 +00:00
markj
6f4724899b Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT.
This will allow its use in sendfile_swapin().

Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:29:22 +00:00
alc
c85a3e68de An invalid page can't be dirty.
Reviewed by:	kib
MFC after:	1 week
2017-08-11 16:27:54 +00:00
royger
17b1663c87 acpi/srat: fix build without DMAP
Use pmap_mapbios to map memory used to store the cpus array.

Reported by:	lwhsu
X-MFC-with:	r322348
2017-08-11 14:19:55 +00:00
andrew
78ce51bf4f Only return the current cpu if it's in the cpumask. When we restrict the
cpumask it probably means we are unable to sent interrupts to CPUs outside
the map. As such only return the current CPU when it's within the mask
otherwise return the first valid CPU.

This is needed on ThunderX as, in a dual socket configuration, we are
unable to send MSI/MSI-X interrupts between sockets.

Reviewed by:	mmel
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11957
2017-08-11 12:45:58 +00:00
hselasky
1dd6601de7 Make sure the "vm_flags" and "vm_page_prot" fields get set correctly
in the VM area structure in the LinuxKPI when doing mmap() and that
unsupported bits are masked away.

While at it fix some redundant use of parenthesing inside some related
macros.

Found by:	KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-11 10:44:40 +00:00
markj
5c221064ba Add a specialized function for DRM drivers to register themselves.
Such drivers attach to a vgapci bus rather than directly to a pci bus. For
the rest of the LinuxKPI to work correctly in this case, we override the
vgapci bus' ivars with those of the grandparent.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11932
2017-08-11 03:59:48 +00:00
markj
865cb1c60b Micro-optimize kmem_unback().
We can remove some unnecessary object radix tree lookups by using the
object memq to iterate over pages in the specified range. This does not,
however, eliminate the lookup needed in vm_page_free_toq() to remove each
tree entry.

Reviewed by:	alc, kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11945
2017-08-11 03:09:11 +00:00
markj
a65dc108f4 Make vm_page_sunbusy() assert that the page is unlocked.
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11946
2017-08-10 22:43:38 +00:00
ian
ad5484d5d0 Ensure the clocks driver is attached before any drivers that need to enable
clocks in their attach().
2017-08-10 19:42:30 +00:00
royger
89ad37fb84 mptable: fix i386 build failure
Reported by:	emaste
X-MFC-with:	r322347
2017-08-10 17:46:57 +00:00
ken
4a13a1ae21 Changes to make mps(4) and mpr(4) handle reinit with reallocation.
When the mps(4) and mpr(4) drivers need to reinitialize the
firmware, they sometimes need to reallocate all of the memory
allocated by the driver.  The reallocation happens whenever the IOC
Facts change.  That should only happen after a firmware upgrade.

If the reinitialization happens as a result of a timed out command
sent to the card, the command that timed out and triggered the
reinit may have been freed if iocfacts_allocate() reallocated all
memory.  If the caller attempts to access the command after that,
the kernel will panic because the caller will be dereferencing
freed memory.

The solution is to set a flag in the softc when we reallocate,
and avoid dereferencing the command strucure if we've reallocated.

The changes are largely the same in both drivers, since mpr(4) is a
derivative of mps(4).

 o In iocfacts_allocate(), if the IOC Facts have changed and we
   need to reallocate, set the REALLOCATED flag in the softc.

 o Change wait_command() to take a struct mps_command ** instead of
   a struct mps_command *.  This allows us to NULL out the caller's
   command pointer if we have to reinit the controller and the data
   structures get reallocated.  (The REALLOCATED flag will be set
   in the softc if that has happened.)

 o In every place that calls wait_command(), make sure we handle
   the case where the command is NULL after the call.

 o The mpr(4) driver has mpr_request_polled() which can also
   reinitialize the card.  Also check for reallocation there.

Reviewed by:	scottl, slm
MFC after:	1 week
Sponsored by:	Spectra Logic
2017-08-10 14:59:17 +00:00
sbruno
df0fbb5166 Purge deprecated locking macros.
Submitted by:	Matt Macy <matt@mattmacy.io>
Sponsored by:	Limelight Networks
2017-08-10 14:54:36 +00:00
br
b002bfbade Support for v1.10 (latest) of RISC-V privilege specification.
New version is not compatible on supervisor mode with v1.9.1
(previous version).

Highlights:
    o BBL (Berkeley Boot Loader) provides no initial page tables
      anymore allowing us to choose VM, to build page tables manually
      and enable MMU in S-mode.
    o SBI interface changed.
    o GENERIC kernel.
      FDT is now chosen standard for RISC-V hardware description.
      DTB is now provided by Spike (golden model simulator). This
      allows us to introduce GENERIC kernel. However, description
      for console and timer devices is not provided in DTB, so move
      these devices temporary to nexus bus.
    o Supervisor can't access userspace by default. Solution is to
      set SUM (permit Supervisor User Memory access) bit in sstatus
      register.
    o Compressed extension is now turned on by default.
    o External GCC 7.1 compiler used.
    o _gp renamed to __global_pointer$
    o Compiler -march= string is now in use allowing us to choose
      required extensions (compressed, FPU, atomic, etc).

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11800
2017-08-10 14:18:09 +00:00
mw
2fa55a61fb Enable OF_setprop API function to add property in FDT
This patch modifies function ofw_fdt_setprop (called by OF_setprop),
so that it can add property, when replacing is not possible.
Adding property is needed to fixup FDT's that have missing
properties.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11879
2017-08-10 13:45:56 +00:00
hselasky
853c517175 Use integer type to pass around jiffies and/or ticks values in the
LinuxKPI because in FreeBSD ticks are 32-bit.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-10 13:05:40 +00:00
hselasky
1ea0a5a734 Fixes for wait event in the LinuxKPI. These are regression issues
after r319757.

1) Correct the return value from __wait_event_common() from 1 to 0 in
case the timeout is specified as MAX_SCHEDULE_TIMEOUT. In the other
case __ret is zero and will be substituted in the last part of the
macro with the appropriate value before return.

2) Make sure the "timeout" argument is casted to "int" before
evaluating negativity. Else the signedness of a "long" might be
checked instead of the signedness of an integer.

3) The wait_event() function should not have a return value.

Found by:	KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-10 13:00:10 +00:00
hselasky
d495afeb1e Make sure the linux_wait_event_common() function in the LinuxKPI properly
handles a timeout value of MAX_SCHEDULE_TIMEOUT which basically means there
is no timeout. This is a regression issue after r319757.

While at it change the type of returned variable from "long" to "int" to
match the actual return type.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-10 12:51:04 +00:00
royger
55936e520f x86: bump MAX_APIC_ID to 512
Introduce a new define to take int account the xAPIC ID limit, for
systems where x2APIC is not available/reliable.

Also change some of the usages of the APIC ID to use an unsigned int
(which is the correct storage type to deal with x2APIC IDs as found in
x2APIC MADT entries).

This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to
295:

FreeBSD/SMP: Multiprocessor System Detected: 256 CPUs
FreeBSD/SMP: 1 package(s) x 64 core(s) x 4 hardware threads
Package HW ID = 0
	Core HW ID = 0
		CPU0 (BSP): APIC ID: 0
		CPU1 (AP/HT): APIC ID: 1
		CPU2 (AP/HT): APIC ID: 2
		CPU3 (AP/HT): APIC ID: 3
[...]
	Core HW ID = 73
		CPU252 (AP): APIC ID: 292
		CPU253 (AP/HT): APIC ID: 293
		CPU254 (AP/HT): APIC ID: 294
		CPU255 (AP/HT): APIC ID: 295

Submitted by:		kib (previous version)
Relnotes:		yes
MFC after:		1 month
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D11913
2017-08-10 09:16:40 +00:00
royger
e45f85c53f x86: make the arrays that depend on MAX_APIC_ID dynamic
So that MAX_APIC_ID can be bumped without wasting memory.

Note that the usage of MAX_APIC_ID in the SRAT parsing forces the
parser to allocate memory directly from the phys_avail physical memory
array, which is not the best approach probably, but I haven't found
any other way to allocate memory so early in boot. This memory is not
returned to the system afterwards, but at least it's sized according
to the maximum APIC ID found in the MADT table.

Sponsored by:		Citrix Systems R&D
MFC after:		1 month
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D11912
2017-08-10 09:16:03 +00:00
royger
345fb32684 apic_enumerator: only set mp_ncpus and mp_maxid at probe cpus phase
Populate the lapics arrays and call cpu_add/lapic_create in the setup
phase instead. Also store the max APIC ID found in the newly
introduced max_apic_id global variable.

This is a requirement in order to make the static arrays currently
using MAX_LAPIC_ID dynamic.

Sponsored by:		Citrix Systems R&D
MFC after:		1 month
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D11911
2017-08-10 09:15:18 +00:00
sbruno
fa16cef53f Don't leak mbufs if clusers exceeds the number of segments. This would
leak mbufs over time causing crashes.

PR:		221202
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	gergely.czuczy@harmless.hu
Sponsored by:	Limelight Networks
2017-08-10 03:43:23 +00:00
sbruno
3be0299b70 Export IFCAP_HWSTATS so that we don't experience double stats counting
on iflib enabled devices.

PR:		220198
Submitted by:	Matt Macy <matt@mattmacy.io>
Reported by:	Ben Woods <woodsb02@freebsd.org>
Sponsored by:	Limelight Networks
2017-08-10 03:11:05 +00:00
davidcs
3f256bc89a Provide compile to choose receive processing in either Ithread or Taskqueue Thread. 2017-08-09 22:18:49 +00:00
rlibby
214aa8f943 i386/boot2: -fno-asynchronous-unwind-tables for gcc
The amd64 build of boot2 was failing with gcc 6.3.0 due to being more
than 1 kB too large. It was apparently generating a .eh_frame section
which was not being removed by objcopy -S. The .eh_frame section seems
to be mandatory per the amd64 ABI, but boot2 is compiled for i386 (uses
-m32), and therefore should be optional in this context. Suppress
generation of .eh_frame with the -fno-asynchronous-unwind-tables flag to
gcc. This saves 1348 bytes (the limit is 7680 bytes).

Reviewed by:	dim, imp
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11928
2017-08-09 20:13:49 +00:00
ae
f7286cb332 Make user supplied data checks a bit stricter.
key_msg2sp() is used for parsing data from setsockopt(IP[V6]_IPSEC_POLICY)
call. This socket option is usually used to configure IPsec bypass for
socket. Only privileged user can set this socket option.
The message syntax is described here
	http://www.kame.net/newsletter/20021210/

and our libipsec is usually used to create the correct request.
Add additional checks:
* that sadb_x_ipsecrequest_len is not out of bounds of user supplied buffer
* that src/dst's sa_len is the same
* that 2*sa_len is not out of bounds of user supplied buffer
* that 2*sa_len fits into bounds of sadb_x_ipsecrequest

Reported by:	Ilja van Sprundel
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11796
2017-08-09 19:58:38 +00:00
jkim
d871b34dbc Split identify_cpu() into two functions for amd64 as we do for i386. This
reduces diff between amd64 and i386.  Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.

Reported by:	dexuan
Tested by:	dexuan
2017-08-09 18:09:09 +00:00
glebius
3eeca31b85 Plug uninitialized stack variable leak in sendfile(2).
Reported by:	Ilja Van Sprundel <ivansprundel ioactive.com>
Submitted by:	Domagoj Stolfa <domagoj.stolfa gmail.com>
MFC after:	1 week
Security:	uninitialized stack variable leak
2017-08-09 17:48:38 +00:00
imp
baef6c410b Also provide a warning for geom_fox.
Differential Review: https://reviews.freebsd.org/D11935
Requested by: jhb@
MFC After: 3 days
2017-08-09 16:37:37 +00:00
imp
158640d13a Mark geom classes as deprecated.
geom_bsd, geom_mbr and geom_sunlabel have been obsolete since Marcel
Moolenaar's geom_part was in FreeBSD 7. They haven't been in GENERIC
since FreeBSD 8. Add warning when used.

geom_vol_ffs has been obsolete since ufs support to geom_label was
committed in FreeBSD 5. It hasn't been in GENERIC since FreeBSD 5.
Add warning when used.

geom_fox has been obsolete since gmultipath was committed in FreeBSD 7.
(no warning added, since this is a very obscure class).

These will all be removed in FreeBSD 12.

MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D11935

Note: Classes will be removed after MFC
2017-08-09 16:15:24 +00:00
mav
a253b5b179 Missing remanant of 322309.
MFC after:	1 week
2017-08-09 13:46:16 +00:00
ae
6dd69d60f6 Add to if_enc(4) ability to capture packets via BPF after pfil processing.
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.

MFC after:	2 weeks
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D11804
2017-08-09 12:24:07 +00:00
mav
6dbc92fb36 Use "Ibex Peak" codename for "5 Series/3400 Series" chipsets.
This is shorter and unifies naming with later chipsets.

MFC after:	1 week
2017-08-09 12:21:17 +00:00
mav
ce15048959 Add new Intel Lewisburg and Union Point chipset PCI IDs.
While there, polish some old AHCI ones, since they are still reused.

MFC after:	1 week
2017-08-09 12:03:12 +00:00
oleg
b81e7a7f7a Fix comment typo. 2017-08-09 10:46:34 +00:00
hselasky
884a24aa18 Print maximum MTU when trying to set invalid MTU in the mlx4en(4) driver.
Useful for debugging.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:32:51 +00:00
hselasky
1786668d0d Increment queue drops in the network statistics when transmitted packets
are dropped by the mlx4en(4) driver.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:30:55 +00:00
hselasky
92b2a382f1 Add support for RX and TX statistics when the mlx4en(4) PCI device
is in VF or SRIOV mode typically in a virtual machine environment.

Submitted by:		Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-08-09 10:27:21 +00:00
mav
1d2ace71bf Do not loose CCB flags after r320493.
There is at least CAM_UNLOCKED that should be kept.

MFC after:	3 days
2017-08-09 09:13:15 +00:00
des
8930206be0 Correct sysctl names. 2017-08-09 07:24:58 +00:00
sephe
4b42c6e8a4 hyperv/hn: Implement transparent mode network VF.
How network VF works with hn(4) on Hyper-V in transparent mode:

- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
  address.
- Once the network VF is attached, the cooresponding hn(4) waits several
  seconds to make sure that the network VF attach routing completes, then:
  o  Set the intersection of the network VF's if_capabilities and the
     cooresponding hn(4)'s if_capabilities to the cooresponding hn(4)'s
     if_capabilities.  And adjust the cooresponding hn(4) if_capable and
     if_hwassist accordingly. (*)
  o  Make sure that the cooresponding hn(4)'s TSO parameters meet the
     constraints posed by both the network VF and the cooresponding hn(4).
     (*)
  o  The network VF's if_input is overridden.  The overriding if_input
     changes the input packet's rcvif to the cooreponding hn(4).  The
     network layers are tricked into thinking that all packets are
     neceived by the cooresponding hn(4).
  o  If the cooresponding hn(4) was brought up, bring up the network VF.
     The transmission dispatched to the cooresponding hn(4) are
     redispatched to the network VF.
  o  Bringing down the cooresponding hn(4) also brings down the network
     VF.
  o  All IOCTLs issued to the cooresponding hn(4) are pass-through'ed to
     the network VF; the cooresponding hn(4) changes its internal state
     if necessary.
  o  The media status of the cooresponding hn(4) solely relies on the
     network VF.
  o  If there are multicast filters on the cooresponding hn(4), allmulti
     will be enabled on the network VF. (**)
- Once the network VF is detached.  Undo all damages did to the
  cooresponding hn(4) in the above item.

NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled.  The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1.  The network
VF transparent mode is _not_ enabled by default, as of this commit.

The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.

The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.

(*)
These decisions were made so that things will not be messed up too much
during the transition period.

(**)
This does _not_ need to go through the fancy multicast filter management
stuffs like what vlan(4) has, at least currently:
- As of this write, multicast does not work in Azure.
- As of this write, multicast packets go through the cooresponding hn(4).

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11803
2017-08-09 05:59:45 +00:00
mckusick
7898ca2150 Since the switch to GPT disk labels, fsck for UFS/FFS has been
unable to automatically find alternate superblocks. This checkin
places the information needed to find alternate superblocks to the
end of the area reserved for the boot block.

Filesystems created with a newfs of this vintage or later will
create the recovery information. If you have a filesystem created
prior to this change and wish to have a recovery block created for
your filesystem, you can do so by running fsck in forground mode
(i.e., do not use the -p or -y options). As it starts, fsck will
ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should
answer yes.

Discussed with: kib, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11589
2017-08-09 05:17:21 +00:00
alc
318304a5b7 Introduce vm_page_grab_pages(), which is intended to replace loops calling
vm_page_grab() on consecutive page indices.  Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().

Reviewed by:	kib, markj
Tested by:	pho (an earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11926
2017-08-09 04:23:04 +00:00
mw
4bd0fb56b8 Update pl310 node in Armada 38x DTS to match the one used in Linux
Since the cache controller nodes fixup is added to the platform code,
this patch aligns it to the Linux device tree representation.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11884
2017-08-09 01:31:05 +00:00
mw
af179cce73 Enable pl310 coherent operation in platform init for Armada 38x
Updating PL310 sotfware context sc_io_coherent field in
platform_pl310_init() routine for Armada 38x helps to avoid
using 'arm,io-coherent' property, which is by default not present
in the device tree node in Linux.

This way another step for DT unification between two operating
systems is done. The improvemnt will also work after enabling
PLATFORM for Marvell ARMv7 SoCs.

Reviewed by: andrew, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11883
2017-08-09 01:25:47 +00:00
mw
bce1dbee51 Remove clock-frequency properties from Armada 38x timer nodes
Since the timers' base frequency setting is added to the platform code,
this patch removes clock-frequency properties from global
and twd timers, aligning both to the Linux device tree.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11882
2017-08-09 01:20:53 +00:00
mw
d29c07b06f Dynamically configure timers' base frequency for Armada 38x
Instead of using 'clock-frequency' device tree property for global/twd
mpcore timers of Armada 38x SoCs, set it in platform_late_init stage
with arm_tmr_change_frequency() function.

Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11881
2017-08-09 01:14:29 +00:00
mw
60a68b05b3 Enable using ofw_bus_find_compatible in early platform code
Before this patch function ofw_bus_find_compatible was using
memory allocations in order to find compatible node and the property's
length. This way there was always a suited buffer for property,
however this approach had also disadvantages - ofw_bus_find_compatible
couldn't be used when malloc is not available, e.g. during fdt fixup stage.

In order to remove the usage limitation of ofw_bus_find_compatible(),
this patch modifies the function to use ofw_bus_node_is_compatible()
(instead of the one without _int suffix), which uses a fixed
buffer on stack instead of dynamic allocations.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11880
2017-08-09 01:06:40 +00:00
mw
c689953ee8 Add support for "compatible" parameter in ofw_fdt_fixup
Sometimes it's convenient to provide fixup to many boards
that use the same SoC family (eg. Marvell Armada 38x).
Instead of putting multiple entries in fdt_fixup_table,
use one entry which refers to all boards with given SoC.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11878
2017-08-09 00:56:29 +00:00
mw
e77679c61c Restore original /soc ranges on Marvell Armada 38x boards
Because fdt_get_ranges can process now multiple 'ranges' entries,
restoring the ranges from original Linux device trees is possible.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11877
2017-08-09 00:51:45 +00:00
mw
6ab1e86ae3 Enable parsing simple-bus 'ranges' with multiple entries
This patch makes possible to boot with up to 8 ranges in soc.
Dynamic allocation cannot be used, because ftd_get_ranges
function is called early, when malloc is not available.

Change is required for the alignment of Marvell Armada 38x
device trees present in sys/gnu/dts/arm - originally
the platform has 6 entries in simple-bus 'ranges'.

Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: manu, nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11876
2017-08-09 00:45:25 +00:00
ian
3db9a43722 Remove the ds133x and s35390a i2c RTC drivers for now. They both do i2c
transfers in their probe() or attach() routines, and that doesn't work
when the low-level controller requires interrupts to be functional.

The DS133x family of chips is nearly identical to the DS1307 and support
for them should be added to that driver, then the ds133x driver can be
deleted.  The s35390a driver just needs a non-trivial workover.  In both
cases that work will be done and committed separately.
2017-08-08 22:58:34 +00:00
kp
65bf28f5ad pf_get_sport(): Prevent possible endless loop when searching for an unused nat port
This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.

Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2

OpenBSD commit message:
 Use a 32 bit variable to detect integer overflow when searching for
 an unused nat port.  Prevents a possible endless loop if high port
 is 65535 or low port is 0.
 report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c

PR:		221201
Submitted by:	Fabian Keil <fk@fabiankeil.de>
Obtained from:  OpenBSD via ElectroBSD
MFC after:	1 week
2017-08-08 21:09:26 +00:00
imp
16f6a3a09f Turns out to be even simpler to just not create /dev/efi if we don't
have a efi runtime.
2017-08-08 21:01:11 +00:00
imp
3c9b264a76 Fail to open efirt device when no EFI on system.
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
2017-08-08 20:44:16 +00:00
mav
ffae9ac0c3 Fix few issues of LinuxKPI workqueue.
LinuxKPI workqueue wrappers reported "successful" cancellation for works
already completed in normal way.  This change brings reported status and
real cancellation fact into sync.  This required for drm-next operation.

Reviewed by:	hselasky (earlier version)
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11904
2017-08-08 19:36:34 +00:00
jhb
e101a5eb93 Fix a NULL pointer dereference in mly_user_command().
If mly_user_command fails to allocate a command slot it jumps to an 'out'
label used for error handling.  The error handling code checks for a data
buffer in 'mc->mc_data' to free before checking if 'mc' is NULL.  Fix by
just returning directly if we fail to allocate a command and only using
the 'out' label for subsequent errors when there is actual cleanup to
perform.

PR:		217747
Reported by:	PVS-Studio
Reviewed by:	emaste
MFC after:	1 week
2017-08-08 17:49:57 +00:00
asomers
94c85a3167 Make p1003_1b.aio_listio_max a tunable
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.

Reviewed by:	jhb, kib
MFC after:	3 weeks
Relnotes:	yes
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D11601
2017-08-08 16:14:31 +00:00
imp
a00e00af97 Use the correct queue depth for nda devices.
Submitted by: Matt Williams
2017-08-08 16:06:16 +00:00
kib
65e37fca9f Fix logic error in the the assert, causing the condition to be always true.
Also improve the formatting of the corresponding KASSERT message.

Based on the submission by:	Svyatoslav <razmyslov@viva64.com>
Found by:	PVS-Studio
PR:	217741
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
2017-08-08 15:46:29 +00:00
grembo
32fc9c2371 Fix typo in cyapa out of bounds check.
PR:		217783
Submitted by:	razmyslov@viva64.com
MFC after:	1 week
2017-08-08 13:27:32 +00:00
hselasky
32b0d28145 Make sure the received IP header gets 32-bit aligned for short packets
in the mlx5en(4) driver.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:49:36 +00:00
hselasky
5215ede46a Count drop events due to lack of PCI bandwidth as queue drops and not as
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.

Submitted by:		gallatin@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:36:57 +00:00
hselasky
ca4d53322a Fix for mlx4en(4) to properly call m_defrag().
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.

This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.

Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.

The counters are per-ring to avoid excessive cache misses in the
TX path.

Submitted by:		mjoras@
Differential Revision:	https://reviews.freebsd.org/D11683
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2017-08-08 11:35:02 +00:00
avg
a1b991e397 MFV r322242: 8373 TXG_WAIT in ZIL commit path
illumos/illumos-gate@d28671a3b0
d28671a3b0

https://www.illumos.org/issues/8373
  The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign
  a transaction to a transaction group.  That seems to be logically
  incorrect as writing of the ZIL block does not introduce any new dirty
  data.  Also, when there is a lot of dirty data, the call can introduce
  significant delays into the ZIL commit path, thus affecting all
  synchronous writes. Additionally, ARC throttling may affect the ZIL
  writing.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 11:26:03 +00:00
avg
15bc146b2f MFV r322240: 8491 uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS
illumos/illumos-gate@79c2b812ee
79c2b812ee

https://www.illumos.org/issues/8491
  The zpool checkpoint feature in DxOS added a new field in the uberblock.
  The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the
  uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
  As these two changes come from two different sources and once upstreamed and
  deployed will introduce an incompatibility with each other we want
  to upstream a change that will reserve the padding for both of them so
  integration goes smoothly and everyone gets both features.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>

MFC after:	3 weeks
2017-08-08 11:21:58 +00:00
avg
7dcd5167aa MFV r322238: 7915 checks in l2arc_evict could use some cleaning up
illumos/illumos-gate@267ae6c3a8
267ae6c3a8

https://www.illumos.org/issues/7915
  l2arc_evict() is strictly serialized with respect to
  l2arc_write_buffers() and l2arc_write_done().  Normally, l2arc_evict()
  and l2arc_write_buffers() are called from the same thread, so they can
  not be concurrent.  Also, l2arc_write_buffers() uses zio_wait() on the
  parent zio of all cache zio-s.  That ensures that l2arc_write_done()
  is completed before l2arc_write_buffers() returns.  Finally, if a
  cache device is removed, then l2arc_evict() is called under SCL_ALL in
  the exclusive mode.  That ensures that it can not be concurrent with
  the normal L2ARC accesses to the device (including writing and
  evicting buffers).  Given the above, some checks and actions in
  l2arc_evict() do not make sense.  For instance, it must never
  encounter the write head header let alone remove it from the buffer
  list.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 11:19:14 +00:00
avg
171a48d6dd MFV r322236: 8126 ztest assertion failed in dbuf_dirty due to dn_nlevels changing
illumos/illumos-gate@dcb6872c56
dcb6872c56

https://www.illumos.org/issues/8126
  The sync thread is concurrently modifying dn_phys->dn_nlevels
  while dbuf_dirty() is trying to assert something about it, without
  holding the necessary lock. We need to move this assertion further down
  in the function, after we have acquired the dn_struct_rwlock.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 11:14:40 +00:00
avg
304a35f785 zfs: no need for __DECONST after abd constification in r322233
Note that vdev_label_write_pad2() is FreeBSD specific.

MFC after:	2 weeks
X-MFC after:	r322233
2017-08-08 11:07:34 +00:00
avg
4d6701ef1f MFV r322232: 8426 mark immutable buffer arguments as such in abd.h
illumos/illumos-gate@9b195260e2
9b195260e2

https://www.illumos.org/issues/8426
  abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so
  qualify them with const.
  abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the
  corresponding arguments.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 10:59:18 +00:00
avg
8d28842dec MFV r322229: 7600 zfs rollback should pass target snapshot to kernel
illumos/illumos-gate@77b171372e
77b171372e

https://www.illumos.org/issues/7600
  At present, the kernel side code seems to blindly rollback to whatever happens
  to be the latest snapshot at the time when the rollback task is processed.
  The expected target's name should be passed to the kernel driver and the sync
  task should validate that the target exists and that it is the latest snapshot
  indeed.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	3 weeks
2017-08-08 10:52:01 +00:00
avg
dab99e05e3 MFV r322227: 8377 Panic in bookmark deletion
illumos/illumos-gate@42418f9e73
42418f9e73

https://www.illumos.org/issues/8377
  The problem is that when dsl_bookmark_destroy_check() is executed from open
  context (the pre-check), it fills in dbda_success based on the existence of the
  bookmark.
  But the bookmark (or containing filesystem as in this case) can be destroyed
  before we get to syncing context. When we re-run dsl_bookmark_destroy_check()
  in syncing
  context, it will not add the deleted bookmark to dbda_success, intending for
  dsl_bookmark_destroy_sync() to not process it. But because the bookmark is
  still in dbda_success
  from the open-context call, we do try to destroy it.
  The fix is that dsl_bookmark_destroy_check() should not modify dbda_success
  when called from open context.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:48:52 +00:00
avg
b039960135 MFV r322223: 8378 crash due to bp in-memory modification of nopwrite block
illumos/illumos-gate@b7edcb9408
b7edcb9408

https://www.illumos.org/issues/8378
  The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
  we then nopwrite against.
  zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
  to zgd_bp, dbuf_write_ready()
  could change db_blkptr, and dbuf_write_done() could remove the dirty record.
  dmu_sync() then sees the stale
  BP and that the dbuf it not dirty, so it is eligible for nop-writing.
  The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
  db_mtx. We could still see a stale
  db_blkptr, but if it is stale then the dirty record will still exist and thus
  we won't attempt to nopwrite.

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>

MFC after:	2 weeks
2017-08-08 10:46:51 +00:00
avg
122a984039 MFV r322221: 7910 l2arc_write_buffers() may write beyond target_sz
FreeBD note: the essence of this change was committed to FreeBSD in
r314274.  This commit catches up with differences between what was
committed to FreeBSD and what was committed to OpenZFS, mainly more
logical variable names.

illumos/illumos-gate@16a7e5ac11
16a7e5ac11

https://www.illumos.org/issues/7910
  It seems that the change in issue #6950 resurrected the problem that was
  earlier fixed by the change in issue #5219.
  Please also see the following FreeBSD bug report:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	2 weeks
2017-08-08 10:43:41 +00:00
markj
f71ca66889 Add round_jiffies_up(), local_clock() and __setup_timer() to the LinuxKPI.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11871
2017-08-08 04:34:02 +00:00
markj
b457025073 Add macros for defining attribute groups and for WO and RW attributes.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11872
2017-08-08 04:30:22 +00:00
marius
19310c915c - If available, use TRIM instead of ERASE for implementing BIO_DELETE.
This also involves adding a quirk table as TRIM is broken for some
  Kingston eMMC devices, though. Compared to ERASE (declared "legacy"
  in the eMMC specification v5.1), TRIM has the advantage of operating
  on write sectors rather than on erase sectors, which typically are
  of a much larger size. Thus, employing TRIM, we don't need to fiddle
  with coalescing BIO_DELETE requests that are also of (write) sector
  units into erase sectors, which might not even add up in all cases.
- For some SanDisk iNAND devices, the CMD38 argument, e. g. ERASE,
  TRIM etc., has to be specified via EXT_CSD[113], which now is also
  handled via a quirk.
- My initial understanding was that for eMMC partitions, the granularity
  should be used as erase sector size, e. g. 128 KB for boot partitions.
  However, rereading the relevant parts of the eMMC specification v5.1,
  this isn't actually correct. So drop the code which used partition
  granularities for delmaxsize and stripesize. For the most part, this
  change is a NOP, though, because a) for ERASE, mmcsd_delete() used
  the erase sector size unconditionally for all partitions anyway and
  b) g_disk_limit() doesn't actually take the stripesize into account.
- Take some more advantage of mmcsd_errmsg() in mmcsd(4) for making
  error codes human readable.
2017-08-07 23:33:05 +00:00
imp
be20038c69 Eliminate useless adjustments of aliased device.
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.

Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
2017-08-07 22:42:46 +00:00
imp
0fb69df140 Add nvd alias to nda ndoes.
All ndaX and ndaXpY nodes will appear as nvdX and nvdXpY as well
(through symlinks in devfs via the normal disk aliasing mechanism in
GEOM).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:43 +00:00
imp
9164f4bf81 Expose API to allow disks to ask for alias names in devfs.
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:38 +00:00
imp
044dbe2765 Add alias support to gpart.
When we're creating new providers for each of the partitions, add
aliases to the geom before we create the provider so when geom_dev
tastes the provider, the aliases are in place so the proper /dev
entries are created. So foo5p6 gets created as an alias for bar5p6
when foo is an alias for bar in the geom we're partitioning with
g_part. This also copies aliases from the container geom (eg disk) to
the label geom (the disk with GPT partitioning) so that aliases nest
properly.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:33 +00:00
imp
db8afdbab3 Add aliasing concept to geom.
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.

Differential Revision: https://reviews.freebsd.org/D11873
2017-08-07 21:12:28 +00:00
mckusick
5d1a60aa44 gjournal is broken in handling its flush_queue. If we have 10 bio's
in the flush_queue:
         1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
         6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
         6 11 12 13 14  15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.

The fix is to place all new blocks at the end of the queue.

Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
2017-08-07 19:40:03 +00:00
mckusick
75f1d384e9 sysctl kern.geom.journal.cache.limit shows negative value for FreeBSD/amd64
system having over 4GB RAM. That's due to:

1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
   half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.

The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.

PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
2017-08-07 19:18:27 +00:00
kib
e8a9dbdea0 Avoid DI recursion when reclaim_pv_chunk() is called from
pmap_advise() or pmap_remove().

Reported and tested by:	pho (previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-07 17:29:54 +00:00
kib
fdfd6e6d57 Explain why delayed invalidation is not required in pmap_protect() and
pmap_remove_pages().

Submitted by:	alc
MFC after:	1 week
2017-08-07 17:23:10 +00:00
mav
04a64ac1b2 Fix hrtimer_active() in case of cancellation.
While there, switch to FreeBSD internal callout active status.

Reviewed by:	markj, hselasky
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11900
2017-08-07 14:34:05 +00:00
br
3364e8aea9 o Replace __riscv__ with __riscv
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)

This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.

RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):

__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen

Reviewed by:	ngie
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11901
2017-08-07 14:09:57 +00:00
np
3d45a770a6 cxgbe(4): Add the T6 and T5 Unified Wire configuration files to the
kernel, just like for T4, when the driver is compiled into the kernel.

Reported by:	mav@
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-07 14:04:19 +00:00
np
159a7130e0 cxgbe(4): Avoid a NULL dereference that would occur during module unload
if there were problems earlier during attach.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-06 19:45:59 +00:00
kib
3df4ae2137 Remove trivial comments. Remove and-ing with UINT_MAX for minor(),
cast to int already does the required truncation of significant bits.

Requested and reviewed by:	bde
Sponsored by:	The FreeBSD Foundation
2017-08-06 12:27:20 +00:00
andrew
ba47a8c8f3 Mark each cpu in the appropriate cpuset_domain set. This allows devices to
handle cases where they can only run on a single domain.

To allow all devices access to this set we need to move reading the domain
earlier in the boot as it was previously handled in the CPU driver, however
this is too late for the GICv3 ITS driver.

Sponsored by:	DARPA, AFRL
2017-08-05 20:57:34 +00:00
jkim
0ac013123c Detect hypervisors early. We used to set lower hz on hypervisors by default
but it was broken since r273800 (and r278522, its MFC to stable/10) because
identify_cpu() is called too late, i.e., after init_param1().

MFC after:	3 days
2017-08-05 06:56:46 +00:00
tsoome
5dee8edc1d libefi/time.c cstyle cleanup
libefi/time.c is mix of different styles, this update does cleanup.
Also fix 0 versus NULL, and zero the tv structure for case we get error
from UEFI firmware.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D11861
2017-08-05 05:20:03 +00:00
cy
0e6a9a6cef Fix matchcing of NATed ICMP queries (resolving NATed MTU discovery).
MFC after:	1 month
2017-08-05 00:28:42 +00:00
imp
93d910f645 Move EFI fmtdev functionality to libefi
This patch moves code necessary for the fmtdev functionality from
loader to libefi, allowing other applications to make use of it

Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11862
2017-08-04 16:33:36 +00:00
np
6add3f72fd cxgbe(4): Allow the TOE timer tunables to be set with microsecond
precision.  These timers are already displayed in microseconds in the
sysctl MIB.  Add variables to track these tunables while here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-04 15:57:10 +00:00
andrew
00495d218a Start to teach the GICv3 driver about NUMA. On ThunderX we may have
multiple ITS devices, however we only want a single ITS device to be
configured on each CPU. To fix this only enable ITS when the node matches
the CPUs node.

Sponsored by:	DARPA, AFRL
2017-08-04 13:08:45 +00:00
andrew
0e7986f6d8 Read the numa-node-id property from each CPU node. This will initially be
used to support the dual package ThunderX where we need to send MSI/MSI-X
interrupts to the same package as the device the interrupt came from.

Sponsored by:	DARPA, AFRL
2017-08-04 10:33:22 +00:00
kib
c0e1805d87 Relax visibility for some termios symbols.
They are defined by XSI or newer SUS.
This is a follow-up to r318780.

Reported by:	jbeich
Obtained from:	DragonflyBSD commit e08b3836c962
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-04 09:45:40 +00:00
alc
590490023c In case readers are misled by expressions that combine multiplication and
division, add parentheses to make the precedence explicit.

Submitted by:	Doug Moore <dougm@rice.edu>
Requested by:	imp
Reviewed by:	imp
MFC after:	1 week
X-MFC after:	r321840
Differential Revision:	https://reviews.freebsd.org/D11815
2017-08-04 04:23:23 +00:00
imp
5a0a56d9eb Add EFI utility functions to libefi
This patch adds additional EFI utility functions to convert errno
values to EFI_STATUS errors, as well as EFI times to UNIX times.

Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11858
2017-08-04 04:20:11 +00:00
imp
50dc7e2335 Move EFI ZFS functions to libefi
This patch moves some EFI ZFS functions from loader to libefi,
allowing them to be used by anything that links against libefi.

Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11855
2017-08-04 04:20:06 +00:00
imp
878bd2bb62 Add definitions and utilities for EFI drivers
This patch adds definitions and utility code for creating EFI drivers
using the EFI_DRIVER_BINDING_PROTOCOL.

Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11852
2017-08-04 04:16:41 +00:00
imp
b001b92dc8 Make nvd vs nda choice boot-time rather than build-time
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.

Differential Revision: https://reviews.freebsd.org/D11825
2017-08-04 03:40:01 +00:00
np
1ddd451273 cxgbe(4): Always use the first and not the last virtual interface
associated with a port in begin_synchronized_op.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-08-04 01:28:06 +00:00
cem
d4a7946bbd x86: Tag some intrinsics with __pure2
Some C wrappers for x86 instructions do not touch global memory and only act
on their arguments; they can be marked __pure2, aka __const__.  Without this
annotation, Clang 3.9.1 is not intelligent enough on its own to grok that
these functions are __const__.

Submitted by:	Anton Rang <anton.rang AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-08-03 22:28:30 +00:00
markj
128fe21bdc Bump the maximum file name length in pseudofs filesystems to 48.
The previous limit of 24 was somewhat restrictive, and with this change
ceil(log2(sizeof(struct pfs_node))) is the same as before in both the ILP32
and LP64 models, so the malloc zone used for allocations of struct pfs_node
is the same as before.

Approved by:	des
2017-08-03 21:35:53 +00:00
markj
8431970965 Add subsystem vendor and device ID fields to struct pci_dev.
MFC after:	1 week
2017-08-03 21:14:46 +00:00
manu
27c9f9628b arm: Add a GENERIC-NODEBUG kernel config
Like amd64 or arm64 provide a GENERIC-NODEBUG configuration file that
remove WITNESS and INVARIANTS etc ...
2017-08-03 19:01:46 +00:00
ian
257ab82394 Add missing header file to SRCS.
Reported by:	manu@
2017-08-03 18:49:15 +00:00
ian
54f2ed2588 Switch to iicdev_readfrom/writeto() to do xfers with proper bus ownership.
Tested by:	manu@
2017-08-03 18:43:54 +00:00
ian
c845c5dba0 Add an ahci driver for imx6.
This was submitted by Rogiel Sulzbach (thank you!) but has a few last-minute
changes by me, mostly where the code interfaces to my still-utterly-deficient
imx6_ccm clocks implementation.  So blame me for any mistakes.

Submitted by:	Rogiel Sulzbach <rogiel@rogiel.com>
Differential Revision:	https://reviews.freebsd.org/D11177
2017-08-03 14:43:41 +00:00
np
ccd45df4e2 cxgbe(4): Initial import of the "collect" component of Chelsio unified
debug (cudbg) code, hooked up to the main driver via an ioctl.

The ioctl can be used to collect the chip's internal state in a
compressed dump file.  These dumps can be decoded with the "view"
component of cudbg.

Obtained from:	Chelsio Communications
MFC after:	2 months
Sponsored by:	Chelsio Communications
2017-08-03 14:43:30 +00:00
ngie
7d39d99ea1 Revert r321969
My change had good intentions, but the implementation was incorrect:
- printf was returning the number of characters in the format string
  plus the NUL, but failed in two regards implementation wise:
-- the pathological case, printf(""), wasn't being handled properly since
   the pointer is always incremented, so the value returned would be
   off-by-one.
-- printf(3) reports the number of characters printed post-conversion via
   vfprintf, etc.
- putchar(3) should return the character printed or EOF, not the number
  of characters output to the screen.

My goal in making the change (again) was to increase parity, but as bde
pointed out these are freestanding functions, so they don't have to
conform to libc/POSIX. I argued that the functions should be named
differently since the implementation is different enough to warrant it
and to allow boot2 code to be usable when linked against sys/boot and
libstand and other libraries in base. I have no interest in pushing
this change forward more though, as the original concern I had behind
the change with zfsboottest was resolved in r321849 and r321852. The
next person that updates the toolchain gets to deal with the
inconsistency if it's flagged by a newer compiler.

MFC after:	1 month
Reported by:	ed, markj
2017-08-03 13:50:46 +00:00
hselasky
8b14081d49 Change reject message type when destroying cm_id in ibore.
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.

Obtained from:	Linux
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:31:10 +00:00
hselasky
67b8c14d85 Ticks are 32-bit in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:18:25 +00:00
hselasky
f54e71e0aa Resolve locking issue for non-sleepable context in the mlx5core.
Code inspection reveals the busdma unload and free functions
do not write to the belonging dma tag and does not need to be
serialized. This allows mlx5_fwp_free() to be called from
software interrupt context.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:14:43 +00:00
hselasky
687ade08e2 Using GFP_ATOMIC with firmware commands is not supported after busdma was
introduced in the mlx5core, because busdma might sleep when loading memory
into DMA.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-08-03 09:11:51 +00:00
markj
3ebf6d52bf Remove D_TRACKCLOSE now that ksyms no longer has a close method.
Reported by:	jhb
X-MFC with:	r321963
2017-08-03 05:55:01 +00:00
ngie
86d915eb15 Fix the return types for printf and putchar to match their libc and
POSIX equivalents

Both printf and putchar return int, not void.

This will allow code that leverages the libcalls and checks/rely on the
return type to interchangeably between loader code and non-loader
code.

MFC after:	1 month
2017-08-03 05:27:05 +00:00
sephe
5863c3132c hyperv/kvp: Use proper size macro for adapter id.
Submitted by:	Christopher Ertl <Christopher.Ertl microsoft com>
MFC after:	3 days
Sponsored by:	Microsoft
2017-08-03 01:44:40 +00:00
markj
09b1c6f030 Rework and simplify the ksyms(4) implementation.
- Store the symbol table contents in an anonymous swap-backed object. Have
  mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
  the calling process in ksyms_open(). Previously we would cache a pointer
  to the pmap of the opening process, and mmap(/dev/ksyms) would create a
  mapping using the physical address found by a pmap lookup at the initial
  mapping address. However, this assumes that the cached pmap is valid,
  which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
  around a limitation in libelf that no longer exists; see r321842.
  Moreover, the interface is difficult to support and isn't present in
  illumos. Since ksyms was added specifically to support lockstat(1), it
  is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
  of the kernel's symbol table. devfs will do that for us.

Reported by:	Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by:	kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11789
2017-08-03 00:38:13 +00:00
marius
a97fc401cf - Correct the remainder of confusing and error prone mix-ups between
"br" or "bridge" where - according to the terminology outlined in
  comments of bridge.h and mmcbr_if.m  around since their addition in
  r163516 - the bus is meant and used instead. Some of these instances
  are also rather old, while those in e. g. mmc_subr.c are as new as
  r315430 and were caused by choosing mmc_wait_for_request(), i. e. the
  one pre-r315430 outliner existing in mmc.c, as template for function
  parameters in mmc_subr.c inadvertently. This correction translates to
  renaming "brdev" to "busdev" and "mmcbr" to "mmcbus" respectively as
  appropriate.
  While at it, also rename "reqdev" to just "dev" in mmc_subr.[c,h]
  for consistency with was already used in mmm.c pre-r315430, again
  modulo mmc_wait_for_request() that is.
- Remove comment lines from bridge.h incorrectly suggesting that there
  would be a MMC bridge base class driver.
- Update comments in bridge.h regarding the star topology of SD and SDIO;
  since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
  topologies are actually possible in form of so called "shared buses"
  (in some subcontext later on renamed to "embedded" buses).
2017-08-02 21:11:51 +00:00
manu
c319a803c2 arm64: Add Allwinner H5 SoC
Allwinner H5 is an H3 (arm32) with Cortex A53 cores.
Add support for it and enable it in GENERIC kernel config

Tested on: OrangePi PC2
2017-08-02 20:19:19 +00:00
manu
1c0c749e31 allwiner: modclk: Do not try to enable parent clock if it doesn't exist 2017-08-02 20:17:04 +00:00
ian
a92d8ed8a5 Fix the interface to imx_iomux_gpr_get/set(). The functions were defined
as taking a register number, and that would get multiplied by 4 to make
a register address.  But the header file that consumers have to reference
this stuff publishes register addresses, not numbers.  So now everything
works in terms of register addresses.

Note that the HDMI init code was writing into the wrong register before
this change.  Apparently whatever it wrote to was harmless, and apparently
HDMI was working because uboot had set up the right bits.
2017-08-02 18:28:06 +00:00
ian
6a0aef8772 Add missing ofw_bus_if.h src file. 2017-08-02 15:16:40 +00:00
ian
9eb2bdb43a The imx6_snvs driver is not strictly required for the system to run, so
change it from standard to optional and add a device statement for it so
that it's included unless someone uses nodevice to eliminate it.
2017-08-02 15:15:18 +00:00
kib
71aed9ee07 For makedev(), cast the minor argument to unsigned type explicitely,
avoiding possible sign propagation.

Submitted by:	hselasky
2017-08-02 14:54:54 +00:00
hselasky
4218cff66f Fix LinuxKPI regression after r321920. The mda_unit and si_drv0 fields are not
wide enough to hold the full 64-bit dev_t. Instead use the "dev" field in
the "linux_cdev" structure to store and lookup this value.

While at it remove superfluous use of parenthesis inside the
MAJOR(), MINOR() and MKDEV() macros in the LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-08-02 14:27:27 +00:00
andrew
ff6b78ab3b Fix the return type for get_cntxc(). The register is 64-bit on both arm
and arm64 so move any truncation to the caller.

Submitted by:	Mihai Carabas <mihai.carabas@gmail.com>
X-Differential Revision:	https://reviews.freebsd.org/D10213
2017-08-02 14:12:47 +00:00
ed
28a0fcb4f0 Keep top page on CloudABI to work around AMD Ryzen stability issues.
Similar to r321899, reduce sv_maxuser by one page inside of CloudABI.
This ensures that the stack, the vDSO and any allocations cannot touch
the top page of user virtual memory.

Considering that CloudABI userspace is completely oblivious to virtual
memory layout, don't bother making this conditional based on the CPU of
the running system.

Reviewed by:	kib, truckman
Differential Revision:	https://reviews.freebsd.org/D11808
2017-08-02 13:08:10 +00:00
mjg
420c428175 amd64: annotate the syscall return address check with __predict_false
before:
   0xffffffff80b03ebb <+2059>:	mov    0x460(%r14),%rax
   0xffffffff80b03ec2 <+2066>:	mov    0x98(%rax),%rax
   0xffffffff80b03ec9 <+2073>:	shr    $0x2f,%rax
   0xffffffff80b03ecd <+2077>:	je     0xffffffff80b03edd <amd64_syscall+2093>
   0xffffffff80b03ecf <+2079>:	mov    0x3f8(%r14),%rax
   0xffffffff80b03ed6 <+2086>:	orl    $0x1,0xc8(%rax)
   0xffffffff80b03edd <+2093>:	add    $0xf8,%rsp

after:
   0xffffffff80b03ebb <+2059>:	mov    0x460(%r14),%rax
   0xffffffff80b03ec2 <+2066>:	mov    0x98(%rax),%rax
   0xffffffff80b03ec9 <+2073>:	shr    $0x2f,%rax
   0xffffffff80b03ecd <+2077>:	jne    0xffffffff80b03eef <amd64_syscall+2111>
   0xffffffff80b03ecf <+2079>:	add    $0xf8,%rsp

Reviewed by:	kib
MFC after:	1 week
2017-08-02 11:25:38 +00:00
kib
ea44a9dd55 Change major()/minor() to work with 64bit dev_t.
Since traditional types for the macros values are int, remove the
cookie trick and just split the dev_t at the word boundary.

Reported by:	Victor Stinner <victor.stinner@gmail.com>
PR:	221048
Sponsored by:	The FreeBSD Foundation
2017-08-02 10:14:17 +00:00
kib
38f0d940ba Do not call trapsignal() after handling usermode fault or interrupt,
when a signal is not intended to be sent.

The variable holding the signal number to send is left uninitialized,
which sometimes triggers invalid signal checks.

For NMI, a return to usermode without ast processing is done.  On the
other hand, for spurious dtrace probe interrupt it is usermode which
triggered the interrupt, so handle it through userret() as any other
fault.

Reported by:	Nils Beyer <nbe@renzel.net>
PR:	221151
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-08-02 10:12:10 +00:00
truckman
43c5cd502a Lower the amd64 shared page, which contains the signal trampoline,
from the top of user memory to one page lower on machines with the
Ryzen (AMD Family 17h) CPU.  This pushes ps_strings and the stack
down by one page as well.  On Ryzen there is some sort of interaction
between code running at the top of user memory address space and
interrupts that can cause FreeBSD to either hang or silently reset.
This sounds similar to the problem found with DragonFly BSD that
was fixed with this commit:
  https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20
but our signal trampoline location was already lower than the address
that DragonFly moved their signal trampoline to.  It also does not
appear to be related to SMT as described here:
  https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498

  "Hi, Matt Dillon here. Yes, I did find what I believe to be a
   hardware issue with Ryzen related to concurrent operations. In a
   nutshell, for any given hyperthread pair, if one hyperthread is
   in a cpu-bound loop of any kind (can be in user mode), and the
   other hyperthread is returning from an interrupt via IRETQ, the
   hyperthread issuing the IRETQ can stall indefinitely until the
   other hyperthread with the cpu-bound loop pauses (aka HLT until
   next interrupt). After this situation occurs, the system appears
   to destabilize. The situation does not occur if the cpu-bound
   loop is on a different core than the core doing the IRETQ. The
   %rip the IRETQ returns to (e.g. userland %rip address) matters a
   *LOT*. The problem occurs more often with high %rip addresses
   such as near the top of the user stack, which is where DragonFly's
   signal trampoline traditionally resides. So a user program taking
   a signal on one thread while another thread is cpu-bound can cause
   this behavior. Changing the location of the signal trampoline
   makes it more difficult to reproduce the problem. I have not
   been because the able to completely mitigate it. When a cpu-thread
   stalls in this manner it appears to stall INSIDE the microcode
   for IRETQ. It doesn't make it to the return pc, and the cpu thread
   cannot take any IPIs or other hardware interrupts while in this
   state."
since the system instability has been observed on FreeBSD with SMT
disabled.  Interrupts to appear to play a factor since running a
signal-intensive process on the first CPU core, which handles most
of the interrupts on my machine, is far more likely to trigger the
problem than running such a process on any other core.

Also lower sv_maxuser to prevent a malicious user from using mmap()
to load and execute code in the top page of user memory that was made
available when the shared page was moved down.

Make the same changes to the 64-bit Linux emulator.

PR:		219399
Reported by:	nbe@renzel.net
Reviewed by:	kib
Reviewed by:	dchagin (previous version)
Tested by:	nbe@renzel.net (earlier version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11780
2017-08-02 01:43:35 +00:00
markj
2350ab0562 Amend r321884 to check the refcount and update the class with w_mtx held.
Reviewed by:	jhb
X-MFC with:	r321884
2017-08-01 23:14:38 +00:00
manu
656170de8b Allwinner dtb: Add NanoPi M1 to the build
It was tested on NanoPi M1 Plus.
2017-08-01 20:28:11 +00:00
manu
93d65d812f Alwinner: nanopi-neo: Remove r_i2c node from DTS as it isn't used on the board 2017-08-01 19:22:00 +00:00
manu
29b626a5b5 Allwinner dtb: add link for NanoPi Neo
Reported by:	Richard Puga <richard@puga.net>
Tested by:	Richard Puga <richard@puga.net>, myself
2017-08-01 18:33:27 +00:00
markj
c11beada10 Fix a witness assertion that fires when a lock type's class changes.
When all instances of a lock type are destroyed (for example, after a
module unload), the corresponding witness entry remains associated with
that lock type. In this case, we shouldn't panic if a new instance of the
lock type is created and its lock class does not match that recorded in the
witness entry.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11788
2017-08-01 17:50:28 +00:00
mw
4a43503667 Merge ena-com 1.1.4.2
Update ENA HAL after fixing gcc build in r321861.

Submitted by: rlibby
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Differential Revision: https://reviews.freebsd.org/D11480
2017-08-01 11:00:04 +00:00
royger
61a9e101c9 pci: fix write order when sizing BARs
According to the PCI Local Specification rev. 3.0 in case of a 64-bit
BAR both the low and the high parts of the register should be set to
~0 before attempting to read back the size.

So far I have found no single device that has problems with the
previous approach, but I think it's better to stay on the safe size.

This commit should not introduce any functional change.

MFC after:		3 weeks
Sponsored by:		Citrix Systems R&D
Reviewed by:		jhb
Differential revision:	https://reviews.freebsd.org/D11750
2017-08-01 10:47:44 +00:00
mav
a9f334c2d4 Add explicit check for PCI bus to r321720.
Reported by:	jhb
MFC after:	6 days
2017-08-01 09:22:10 +00:00
ngie
b1b35d300c Standardize paths on SRCTOP instead of .CURDIR-relative paths
MFC after:	1 week
2017-08-01 05:39:40 +00:00
markj
b23f6b9b03 Batch updates to v_wire_count when freeing page table pages on x86.
The removed release stores are not needed since stores are totally
ordered on i386 and amd64.

Reviewed by:	alc, kib (previous revision)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11790
2017-08-01 05:26:30 +00:00
ngie
f4a5c35feb Clean up style in print_state(..) and pager_printf(..)
No functional change intended.

MFC after:	3 days
2017-08-01 05:16:14 +00:00
ian
fac28683ec Add a driver for the Intersil ISL12xx family of i2c RTC chips.
Supports ISL1209, ISL1218, ISL1219, ISL1220, ISL1221 (just basic RTC
functionality, not all the other fancy stuff the chips can do).
2017-08-01 04:16:52 +00:00
alc
48fd0ecbb7 The blist_meta_* routines that process a subtree take arguments 'radix' and
'skip', which denote, respectively, the largest number of blocks that can be
managed by a subtree of that height, and one less than the number of nodes
in a subtree of that height.  This change removes the 'skip' argument from
those functions because 'skip' can be trivially computed from 'radius'.
This change also redefines 'skip' so that it denotes the number of nodes in
the subtree, and so changes loop upper bound tests from '<= skip' to '<
skip' to account for the change.

The 'skip' field is also removed from the blist struct.

The self-test program is changed so that the print command includes the
cursor value in the output.

Submitted by:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
2017-08-01 03:51:26 +00:00
dchagin
b084ee9dfc Implement proper Linux /dev/fd and /proc/self/fd behavior by adding
Linux specific things to the native fdescfs file system.

Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.

Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.

For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.

For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:

    mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd

Reviewed by:	kib@
MFC after:	1 week
Relnotes:	yes
2017-08-01 03:40:19 +00:00
pfg
541463d871 sys/net8021: Add missing braces in setcurchan().
Obtained from:	DragonFlyBSD (git c69e37d6)
MFC after:	3 days
2017-08-01 03:13:43 +00:00
sephe
571da5685c hyperv/hn: Add comment about ether_ifattach event subscription.
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11710
2017-08-01 02:55:43 +00:00
sephe
f1555b6b44 hyperv/hn: Renaming and minor cleanup
This prepares for the upcoming transparent VF support.

MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11708
2017-08-01 02:45:54 +00:00
ian
c6f61639be Build iicbus/{ds1307,ds3231,nxprtc} as modules. 2017-07-31 22:32:11 +00:00
ian
406fb7d3e7 Restructure the SUBDIR list as 1-per-line and alphabetize, so it will be
easier to add new things (and see what changed) in the future.
2017-07-31 22:26:30 +00:00
ian
700ccdc825 Bugfixes and enhancements...
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it.  But still report a dead
battery if it's stopped at init time.

Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.

Schedule the clock_settime() callbacks to align the RTC clock to top of
second when setting it.
2017-07-31 22:00:00 +00:00
ian
81eea56b41 No need to call getnanotime() now that the waiting is done by the central
subr_rtc code, switch from CLOCKF_SETTIME_NO_TS to CLOCKF_SETTIME_NO_ADJ
so that we get fed a timestamp, but it's not adjusted to compensate for
inaccuracy in setting time.
2017-07-31 21:53:00 +00:00
mckusick
940856bc57 Avoid reading a snapshot block when it is already in the cache.
Update the use of the B_CACHE flag (since the May 1999 commit
that made it the correct test here).

Reported by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: kib
Tested by: Peter Holm
MFC after: 1 week
2017-07-31 20:41:45 +00:00
markj
2c9a28c567 Batch v_wire_count decrements in vm_hold_free_pages().
Atomic updates to v_wire_count are a significant source of contention, so
combine multiple updates into one in this easy case. Also remove an old
printf that gets executed if the page is shared-busied, which is a case
that will lead to a panic anyway.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11791
2017-07-31 18:48:58 +00:00
markj
da9101cade Don't trace running threads that have interrupts disabled.
In this case we shouldn't assume that the thread has a valid frame pointer.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11787
2017-07-31 17:57:54 +00:00
scottl
58fdb26bf8 Fix a logic bug in the split PCI interrupt code that slipped through
Reported by:	Harry Schmalzbauer
2017-07-31 16:55:56 +00:00
ian
76925ae8c9 Restore a few rather important lines of code that got fumbled in r321746. 2017-07-31 16:46:16 +00:00
ian
7a634438e9 Check the clock-halted flag every time the clock is read, not just once
at startup.  The flag stays set until the clock is loaded with good time,
so we need to keep saying the time is invalid until that happens.
2017-07-31 15:24:40 +00:00
mav
c2a28dc66d Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately.  It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.

The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time.  Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes.  There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2017-07-31 15:23:19 +00:00
ian
81b1ab7989 Add a detach() method. 2017-07-31 14:58:01 +00:00
ian
ee4384851e Switch from using iic_transfer() to iicdev_readfrom/writeto(), mostly so
that transfers will be done with proper ownership of the bus. No
behavioral changes.  Also add a detach() method.
2017-07-31 14:57:02 +00:00
hselasky
99987f4811 Remove some dead statistics related code and a structure field from the
mlx4en driver which is used by its Linux counterpart, but not under
FreeBSD.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-07-31 12:09:24 +00:00
hselasky
fb832dada9 Make sure on-stack buffer is properly aligned.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-07-31 12:03:45 +00:00
ae
4c7d86cbec Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.

Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.

PR:		220217
MFC after:	3 weeks
Sponsored by:	Yandex LLC
2017-07-31 11:04:35 +00:00
hselasky
e032b1d69d Remove cycle_t type from the LinuxKPI similar to Linux upstream.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-07-31 09:17:54 +00:00
hselasky
78a8955834 Fix broken usage of the mlx4_read_clock() function:
- return value has too small width
 - cycle_t is unsigned and cannot be less than zero

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2017-07-31 09:15:15 +00:00
ian
f9021c020b Remove now-unused variable. 2017-07-31 03:19:16 +00:00
ian
a0fd5a136c Use the new clock_schedule() to arrange for clock_settime() to be called
at the right time to keep the RTC hardware time in sync, instead of using
pause_sbt() to sleep until the right time.
2017-07-31 01:36:51 +00:00
ian
4b1f61b073 Add clock_schedule(), a feature that allows realtime clock drivers to
request that their clock_settime() methods be called at a given offset
from top-of-second.  This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread.  If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.

The motivation behind this is that I was on the path of adding identical
code to a third RTC driver to figure out a delta to top-of-second and
sleep for that amount of time because writing the the RTC registers resets
the hardware's concept of top-of-second.  (Sometimes it's not top-of-second,
some RTC clocks tick over a half second after you set their time registers.)
Worst-case would be to sleep for almost a full second, which is a rude thing
to do on a shared task queue thread.
2017-07-31 01:18:21 +00:00
markj
e0ef97d135 Correct the predicates on which lockstat:::{thread,spin}-spin fire.
In particular, they should fire only if the lock was owned by another
thread when we first attempted to acquire that lock.

MFC after:	1 week
2017-07-31 00:59:28 +00:00
ian
91a53253d4 Add taskqueue_enqueue_timeout_sbt(), because sometimes you want more control
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
2017-07-31 00:54:50 +00:00
scottl
20aeddd599 Don't re-parse PCI IDs in order to set card-specific flags, use
the flags field in the PCIID table.
2017-07-31 00:05:49 +00:00
avos
48003447fd rtwn_usb: add support for fragmented Rx.
Since device can pass multiple frames in a single payload temporary
Rx buffer was big enough to hold all of them; now the driver can
concatenate a single frame from multiple payloads.

The Rx buffer size may be configured via tunable (dev.rtwn.%d.rx_buf_size).

Tested with:
 - rtl8188cus, rtl8188eu and rtl8821au (STA mode).
 - (by kevlo) rtl8192cu and rtl8188eu.

PR:		218527
Reviewed by:	kevlo
Differential Revision:	https://reviews.freebsd.org/D11705
2017-07-30 23:35:21 +00:00
scottl
7c22607544 Change from using underbar function names to normal function names for
the informational print functions.  Collapse the debug API a bit to be
more generic and not require as much code duplication.  While here, fix
a bug in MPS that was already fixed in MPR.
2017-07-30 22:34:24 +00:00
avos
9954785944 zyd: code cleanup + drop unneeded cast.
No functional change intended.
2017-07-30 22:17:08 +00:00
kib
030abb1986 Remove unused symbols.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-30 21:52:22 +00:00
avos
43d8fccf7c rtwn: drop unnecessary / wrong conversion.
The 'chan' field occupies only one byte.
2017-07-30 21:50:45 +00:00
dchagin
93ea6317f4 Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code.
This is needed for https://reviews.freebsd.org/D11780.

Reported by:	kib@
2017-07-30 21:24:20 +00:00
ian
0b056c8f8d Fix AM/PM mode handling. The bits to mask off in the hours register changes
between 12/24 hour mode.  Also fix conversion between 12 and 24 hour mode.
It's not as easy as adding/subtracting 12, because the clock doesn't roll
over 11->0, it rolls over 12->1; 0 isn't a valid hour in AM/PM mode.
2017-07-30 19:58:31 +00:00
ian
6f87a631b7 Bugfixes and enhancements...
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it.  But still report a dead
battery if it's stopped at init time.

Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.

Align the RTC clock to top of second when setting it.
2017-07-30 18:46:38 +00:00
hselasky
51c7a87f69 Properly range check length of parsed information elements in RSU driver.
Found by:		Ilja van Sprundel <ivansprundel@ioactive.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2017-07-30 16:45:28 +00:00
ian
b7121ba36f Switch from using iic_transfer() to iicdev_readfrom/writeto(), mostly so
that transfers will be done with proper ownership of the bus. No
behavioral changes.
2017-07-30 16:17:06 +00:00
mav
b63a1a3473 Attach ichwd(4) only to ISA bus of the LPC bridge.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.

MFC after:	1 week
2017-07-30 15:19:07 +00:00
scottl
c6ab5f700c Split the interrupt setup code into two parts: allocation and configuration.
Do the allocation before requesting the IOCFacts message.  This triggers
    the LSI firmware to recognize the multiqueue should be enabled if available.
    Multiqueue isn't used by the driver yet, but this also fixes a problem with
    the cached IOCFacts not matching latter checks, leading to potential problems
    with error recovery.

    As a side-effect, fetch the driver tunables as early as possible.

Reviewed by:	slm
Obtained from:	Netflix
Differential Revision:	D9243
2017-07-30 06:53:58 +00:00
delphij
8bf5aa5f5f Bump copyright year.
MFC after:	3 days
2017-07-30 06:27:32 +00:00
ian
54aa7612c7 Add the i2c RTC drivers found on various arm systems. 2017-07-30 00:25:29 +00:00
ian
130f5b5d26 Move the device descriptions onto the device lines, so they cut and paste
nicely into other config files.
2017-07-30 00:24:15 +00:00
ian
06ae04e093 Add a few missing i2c devices that build fine on all arches. 2017-07-30 00:01:31 +00:00
ian
f6f6b37562 Fix building this driver on non-FDT platforms. 2017-07-30 00:00:30 +00:00
ian
50c47dc328 Replace the pcf8563 i2c RTC driver with a new nxprtc driver which handles
all the chips in the NXP PCA212x and PCA/PCF85xx series.  In addition to
supporting more chips, this driver uses the countdown timer on the chips as
a fractional seconds counter, giving it a resolution of about 15 milliseconds.
2017-07-29 23:45:57 +00:00
cem
526221c070 kldstat: Use sizeof in place of named constants for sizing
No functional change.

This is handy for FreeBSD derivatives that want to modify the value of
MAXPATHLEN, but not the kld_file_stat ABI.

Submitted by:	Siddhant Agarwal <sagarwal AT isilon.com>
Sponsored by:	Dell EMC Isilon
2017-07-29 23:31:21 +00:00
rmacklem
396df4f3c4 Add kernel support for the NFS client forced dismount "umount -N" option.
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.

Tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11735
2017-07-29 19:52:47 +00:00
ian
ea2fa60ba9 Add inline functions to convert between sbintime_t and decimal time units.
Use them in some existing code that is vulnerable to roundoff errors.

The existing constant SBT_1NS is a honeypot, luring unsuspecting folks into
writing code such as long_timeout_ns*SBT_1NS to generate the argument for a
sleep call.  The actual value of 1ns in sbt units is ~4.3, leading to a
large roundoff error giving a shorter sleep than expected when multiplying
by the trucated value of 4 in SBT_1NS.  (The evil honeypot aspect becomes
clear after you waste a whole day figuring out why your sleeps return early.)
2017-07-29 17:00:23 +00:00
mav
952f923749 Fix IORDY bits definition.
According to the ATA specs, IORDYDIS should be bit 10, IORDY -- bit 11.

PR:		221049
Submitted by:	aaron.styx@baesystems.com
MFC after:	1 week
2017-07-29 13:54:28 +00:00
kp
93f7d03418 vtnet: Support jumbo frames without TSO/GSO
Currently in Virtio driver without TSO/GSO features enabled, the max scatter
gather segments for the TX path can be 4, which limits the support for 9K JUMBO
frames. 9K JUMBO frames results in more than 4 scatter gather segments and
virtio driver fails to send the frame down to host OS. With TSO/GSO feature
enabled max scatter gather segments can be 64, then 9K JUMBO frames are fine,
this is making virtio driver to support JUMBO frames only with TSO/GSO.

Increasing the VTNET_MIN_TX_SEGS which is the case for non TSO/GSO to 32 to
support upto 64K JUMBO frames to Host.

Submitted by:	Lohith Bellad <lohithbsd@gmail.com>
Reviewed by:	adrian
Differential Revision:	https://reviews.freebsd.org/D8803
2017-07-29 09:22:48 +00:00
rmacklem
0d5371c8ce Fix possible crash for the NFSv4.1 pNFS client.
If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.

MFC after:	2 weeks
2017-07-29 02:25:49 +00:00
np
2b7dbb8004 cxgbe/iw_cxgbe: Log the end point's history and flags to the trace
buffer just before it's freed.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-07-28 22:28:45 +00:00
jkim
f0742b64fe Merge ACPICA 20170728. 2017-07-28 22:23:29 +00:00
mw
bb4af4519f Fix remapping VM attributes on Armada 38x
pmap_remap_vm_attr() function requires indexes to
pte2_attr_tab as the arguments (VM_MEMATTR_).
Mistakenly, instead of them, actual values from the
table were used (PTE2_ATTR_), when applying
work-around for Marvell Armada 38x SoCs.

Submitted by: Marcin Wojtas (mw@semihalf.com)
Reported by: Rafal Kozik (rk@semihalf.com)
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11704
2017-07-28 11:51:55 +00:00
loos
9d6de9853e Remove the unused mutex since r273220.
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-07-28 04:41:57 +00:00
markj
9e763423e4 Fix style bugs in ksyms.c.
No functional change intended.

MFC after:	3 days
2017-07-28 03:18:18 +00:00
markj
d74f88f14e Restrict permissions on /dev/ksyms to 0400.
The ksyms(4) device was added specifically for use by lockstat(1), which
as a DTrace consumer must run as root.

Discussed with:	emaste
MFC after:	3 days
2017-07-28 03:14:31 +00:00
adrian
61fb46d666 [ar71xx] get rid of ath_pci - it's built as a module now. 2017-07-28 01:17:38 +00:00
zbb
90fde46fef Fix TEX index acquisition using L2 attributes
The TEX index is selected using (TEX0 C B) bits
from the L2 descriptor. Use correct index by masking
and shifting those bits accordingly.

Differential Revision:	https://reviews.freebsd.org/D11703
2017-07-27 23:14:17 +00:00
sbruno
5728c412f0 Drop IXL RX lock during TCP_LRO, fixes LOR mahem while holding the RX
queue lock when the uppoer stack is called inside TCP_LRO

Submitted by:	Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by:	erj
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11724
2017-07-27 23:01:07 +00:00
sbruno
f2cee7313a Slight restructure of iflib_busdma_load_mbuf_sg() to fix accounting
when m_collapse() fails.

Submitted by:	krzystof.galazka@intel.com
Reviewed by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D11476
2017-07-27 22:53:47 +00:00
sbruno
4100e547e2 Deprecate unused int isc_max_txqsets and int isc_max_rxqsets as they
were redundant and not being used to set anything up.

Submitted by:	Matt Macy <mmacy@mattmacy.io>
Reported by:	Jeb Cramer <cramerj@intel.com>
Sponsored by:	Limelight Networks
2017-07-27 21:21:43 +00:00
rmacklem
5bbbf9ea35 Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by:	kib
MFC after:	2 weeks
2017-07-27 20:55:31 +00:00
kib
380834e269 Make it possible to request nosys logging to console.
New kern.lognosys values are
1 - log to ctty
2 - log to console
3 - log to both.

Inspired by:	eugen
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-07-27 20:45:41 +00:00
manu
0a91cea96f Allwinner A64: fix typo
'pll_ddr0' is the dram parent, not 'pll_ddr'
2017-07-27 17:51:51 +00:00
kib
d55176ddf0 Make the number of children for pctrie node available outside subr_pctrie.c.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11435
2017-07-27 16:40:14 +00:00
ken
3a62e7df63 Remove duplicate assignments from r321622.
Submitted by:	mav
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-07-27 15:51:56 +00:00
ken
8cdbb83221 Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.

The intent of that change was to keep loop IDs persistent across
chip reinits.

The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address.  It also turned off soft addressing on 2400
(4Gb) and newer controllers.

The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on.  So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.

The solution is to largely revert that change.  I left some elements
in place that are related to virtual ports, since they were new.

This does work with IBM tape drives with hard and soft addressing
turned on.  I have tested it with 4Gb, 8Gb, and 16Gb controllers.

sys/dev/isp.c:
	Largely revert FreeBSD SVN change 289937.  I left the
	ispmbox.h changes in place.

	Don't use the PREV_ADDRESS bit on initialization.  It tells
	the chip to not participate if it can't get the requested
	loop ID.

	Do use soft addressing on 2400 and newer chips.

	Use hard addressing when the user has requested a specific
	initiator ID.  (hint.isp.X.iid=N in /boot/loader.conf)

	Leave some of the virtual port options from that change in
	place, but don't turn on the PREV_ADDRESS bit.

Reviewed by:	mav
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-07-27 15:33:57 +00:00
andrew
610c4279af Always set the receive mask in loader.efi. Some UEFI implementations set
this to be too restrictive. We need to have both broadcast and unicast
enabled for loader to work. Set them in all cases to ensure this is true.

This allows the Cavium ThunderX 2s in the netperf cluster to netboot using
a USB NIC.

PR:		221001
Reviewed by:	emaste, tsoome
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11732
2017-07-27 15:06:34 +00:00
bz
a0dcb7af20 After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
mav
d03d7f8708 adaasync(): Set ADA_STATE_WCACHE based on ADA_FLAG_CAN_WCACHE
The attached patch lets adaasync() set ADA_STATE_WCACHE based on
ADA_FLAG_CAN_WCACHE instead of ADA_FLAG_CAN_RAHEAD.

This fixes a regression introduced in r300207 which changed
the flag names.

PR:		220948
Submitted by:	Fabian Keil <fk@fabiankeil.de>
Obtained from:	ElectroBSD
MFC after:	1 week
2017-07-27 07:28:29 +00:00
emaste
37e93046cb uart: add AX99100 chipset support
PR:		215837
Submitted by:	joe@thrallingpenguin.com
MFC after:	2 weeks
2017-07-27 02:53:18 +00:00
loos
555ebaa3dd Fix the port vlan support in e6000 based switches.
Reduce the use of local copies of switch register data.

The switch now works with the upstream dsa node (i.e. the upstream DTS).

Tested on:	ClearFog Pro (88E6176), SG-3100 (88E6141)
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-07-27 02:38:53 +00:00
marius
ed7adc5aa8 - Check the slot type capability, set SDHCI_SLOT_{EMBEDDED,NON_REMOVABLE}
for embedded slots. Fail in the sdhci(4) initialization for slot type
  shared, which is completely unsupported by this driver at the moment. [1]
  For Intel eMMC controllers, taking the embedded slot type into account
  obsoltes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
  entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
  as the SDHCI specification explicitly states that 1.8 V VDD is applicable
  to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().

Obtained from:	DragonFlyBSD 65704a46 [1], 7ba10b88 [2], 0df14648 [3]
2017-07-26 22:04:23 +00:00
marius
9353e11f71 Correctly use the size of a pointer rather than that of a pointer to a
pointer.

Reported by:	Coverity
CID:		1378432
2017-07-26 21:59:37 +00:00
emaste
859cc79a19 cc_cubic: restore braces around if-condition block
r307901 was reverted in r321480, restoring an incorrect block
delimitation bug present in the original cc_cubic commit. Restore
only the bugfix (brace addition) from r307901.

CID:		1090182
Approved by:	sbruno
2017-07-26 21:23:09 +00:00
ian
884a91be7d Add a debug sysctl that lets you see i2c bus traffic through this device. 2017-07-26 21:20:57 +00:00
ian
1eb0f3d2f0 Add support for tracking nested calls to iicbus_request/release_bus().
Usually it is sufficient to use iicbus_transfer_excl(), or one of the
higher-level convenience functions that use it, to reserve the bus for the
duration of each register access.  Occasionally it is important that a
series of accesses or read-modify-write operations must be done without any
other intervening access to the device, to prevent corrupting state.

Without support for nested request/release, slave device drivers would have
to stop using high-level convenience functions and resort to working with
arrays of iic_msg structs just for a few operations (often involving
one-time device setup or infrequent configuration changes).

The changes here appear large from a glance at the diff, but in fact they're
nearly trivial, and the large diff is because of changes in indentation and
the re-wrapping of comments caused by that.  One notable change is that
iicbus_release_bus() now ignores the IICBUS_CALLBACK(IIC_RELEASE_BUS) return
value.  The old error handling left the bus in a kind of limbo state where
it was still owned at the iicbus layer, but drivers rarely check the return
of the release call, and it's unclear what they would do to recover from an
error return anyway.  No existing low-level drivers return any kind of error
from IIC_RELEASE_BUS except one EINVAL for "you don't own the bus", to which
the right response is probably to carry on with the process of releasing the
reference to the bus anyway.
2017-07-26 21:06:26 +00:00
ian
7b74ced64a Add a pair of convenience routines for doing simple "register" read/writes
on i2c devices, where the "register" can be any length.

Many (perhaps most) common i2c devices are organized as a collection of
(usually 1-byte-wide) registers, and are accessed by first writing a 1-byte
register index/offset number, then by reading or writing the data.
Generally there is an auto-increment feature so the when multiple bytes
are read or written, multiple contiguous registers are accessed.

Most existing slave device drivers allocate an array of iic_msg structures,
fill in all the transfer info, and invoke iicbus_transfer().  These new
functions commonize all that and reduce register access to a simple call
with a few arguments.
2017-07-26 20:40:24 +00:00
np
b027bca0d6 cxgbe(4): Some updates to the common code.
- Updated register ranges.
- Helper routines for access to TP registers.
- Updated routine to read flash parameters.

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2017-07-26 20:20:58 +00:00
kib
8fd2052d87 Mark pages after EOF as clean after pageout.
Suppose that a file on NFS has partially filled last page, and this
page is dirty.  NFS VOP_PAGEOUT() method only marks the the page clean
up to the block of the last written byte, leaving other blocks dirty.
Also any page which erronously exists in the vnode vm_object past EOF
is also left marked as dirty.

With the introduction of the buf-cache coherent pager, each pass of
syncer over the object with such page results in creation of B_DELWRI
buffer due to VOP_WRITE() call.  This buffer is noted on next syncer
pass, which results e.g. a visible manifestation of shutdown never
finishing vnode sync.  Note that before buf-cache coherency commit, a
dirty page might left never synced to server if a partial writes
occur.

Fix this by clearing dirty bits after EOF.  Only blocks of the partial
page which are completely after EOF are marked clean, to avoid
possible user data loss.

Reported by:	mav
Reviewed by:	alc, markj
Tested by:	mav, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11697
2017-07-26 20:07:05 +00:00
kib
a748399a51 Move rtvals initialization out of the region protected by NFS node
lock.

Noted by:	alc
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11697
2017-07-26 20:01:31 +00:00
andrew
2b18a5ba48 Pass the last exception trap frame to kdb_trap. This allows show registers
in ddb to show the traps registers, and not the registers from within the
panic call.

Sponsored by:	DARPA, AFRL
2017-07-26 17:39:10 +00:00
ed
feffd68602 Upgrade to the latest sources generated from the CloudABI specification.
The CloudABI specification has had some minor changes over the last half
year. No substantial features have been added, but some features that
are deemed unnecessary in retrospect have been removed:

- mlock()/munlock():

  These calls tend to be used for two different purposes: real-time
  support and handling of sensitive (cryptographic) material that
  shouldn't end up in swap. The former use case is out of scope for
  CloudABI. The latter may also be handled by encrypting swap.

  Removing this has the advantage that we no longer need to worry about
  having resource limits put in place.

- SOCK_SEQPACKET:

  Support for SOCK_SEQPACKET is rather inconsistent across various
  operating systems. Some operating systems supported by CloudABI (e.g.,
  macOS) don't support it at all. Considering that they are rarely used,
  remove support for the time being.

- getsockname(), getpeername(), etc.:

  A shortcoming of the sockets API is that it doesn't allow you to
  create socket(pair)s, having fake socket addresses associated with
  them. This makes it harder to test applications or transparently
  forward (proxy) connections to them.

  With CloudABI, we're slowly moving networking connectivity into a
  separate daemon called Flower. In addition to passing around socket
  file descriptors, this daemon provides address information in the form
  of arbitrary string labels. There is thus no longer any need for
  requesting socket address information from the kernel itself.

This change also updates consumers of the generated code accordingly.
Even though system calls end up getting renumbered, this won't cause any
problems in practice. CloudABI programs always call into the kernel
through a kernel-supplied vDSO that has the numbers updated as well.

Obtained from:	https://github.com/NuxiNL/cloudabi
2017-07-26 06:57:15 +00:00
kib
819a09c0c9 Mark name_PCTRIE_LOOKUP_LE() generated function unused.
The PCTRIE macro will be shortly applied in a situation where
LOOKUP_LE is not needed.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11435
2017-07-26 06:42:01 +00:00
adrian
4cc04baa58 [iwm] Sync rs (rate-selection) API definitions from Linux iwlwifi.
* While there clean up alignments and line wrapping in existing
  definitions for rs API in if_iwmreg.h

Obtained from:	dragonflybsd.git 085e37a042bdb17081e495e46919359ce43aa118
2017-07-26 05:52:37 +00:00
adrian
eeb25960ec [iwm] Add iwm_mvm_send_lq_cmd() from Linux iwlwifi to if_iwm_util.c.
Obtained from:	dragonflybsd.git 8a5dd7783e407856754093f5b1c9c757c64534b7
2017-07-26 05:51:31 +00:00
adrian
77854ca634 [iwm] Sync statistics API definitions with Linux iwlwifi.
Obtained from:	dragonflybsd.git 75895a53a9c1ba60d75be9b4bf6e49a37f91a7cf
2017-07-26 05:40:52 +00:00