Commit Graph

126937 Commits

Author SHA1 Message Date
Hans Petter Selasky
c2a1e80706 Ticks are integer type in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:36:32 +00:00
Hans Petter Selasky
a005c157e4 Configure firmware to use RX hash format in mini CQE in mlx5en(4).
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.

While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:55 +00:00
Hans Petter Selasky
d52ffcb71c Disable CQE zipping by default in mlx5en(4).
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:35 +00:00
Hans Petter Selasky
f0dcb8dff5 Split mlx5e_update_stats_work() in mlx5en(4).
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:14 +00:00
Hans Petter Selasky
91f13f8368 Implement fast close of RX channel in mlx5en(4).
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:42 +00:00
Hans Petter Selasky
243853215d Correct number of elements for priority to traffic class mappings in mlx5en(4).
The number of priorities is always 8, while the number of traffic classes
supported can vary. While at it convert the sysctl node into an array.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:14 +00:00
Hans Petter Selasky
ffadb62f20 Remove unused module parameter in mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:29 +00:00
Hans Petter Selasky
6428c27faf Make sure to error out when arming the CQ fails in mlx4ib and mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:09 +00:00
Hans Petter Selasky
38f38e9fda Make sure to error out when arming the CQ fails in ibcore.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:45 +00:00
Hans Petter Selasky
069963d772 Destroy port stats debug context in correct order in mlx5en(4).
Destroy children nodes before parent nodes.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:22 +00:00
Hans Petter Selasky
c66537d7b2 Fix tx_jumbo_packets counter in mlx5en(4).
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.

TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:03 +00:00
Hans Petter Selasky
bcfad02593 Update Ethernet extended counters in mlx5en(4).
Expose all Ethernet extended counters those counters via debug_stats
sysctl:
dev.mce.X.debug_stats

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:31:32 +00:00
Hans Petter Selasky
5169fb81ca Protect from infinite sw-reset loop in mlx5core.
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:47 +00:00
Hans Petter Selasky
192fc18d49 Disable all MSIX interrupts before shutdown in mlx5.
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:18 +00:00
Hans Petter Selasky
a3a31fde6d Import Linux code to implement mlx5_ib_disassociate_ucontext() in mlx5ib.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:29:45 +00:00
Hans Petter Selasky
983026ea83 Add temperature warning event to log in mlx5core.
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.

Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:28:18 +00:00
Hans Petter Selasky
7646dc2347 Correctly define the interface state bits in mlx5en(4).
While at it remove unused interface state bits. This also fixes and issue
during shutdown:

There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.

The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.

Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:27:29 +00:00
Hans Petter Selasky
e5eae1dc7d Enable FPGA and FPGA QP errors for EQ and call the handler in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:26:33 +00:00
Hans Petter Selasky
c322dbafd5 Add MLX5_FPGA_RELOAD IOCTL(2) to mlx5fpga.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:25:14 +00:00
Hans Petter Selasky
423530be04 Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:23:33 +00:00
Marius Strobl
007b804fc7 Allow to build without INET and INET6 again after r347221.
Submitted by:	cam
2019-05-08 09:03:43 +00:00
Xin LI
c9083b850a Move contrib/zlib to sys/contrib/zlib so that we can use it in kernel.
This is a prerequisite of unifying kernel zlib instances.

Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20191
2019-05-08 08:43:15 +00:00
Justin Hibbits
7d91f528a6 powerpc: hide innocuous printf behind bootverbose
NUMA associativity, and OFW node existence, is completely optional, and
shouldn't warn always.
2019-05-08 03:15:22 +00:00
Kyle Evans
251a32b5b2 tun/tap: merge and rename to tuntap
tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).

This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp

[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).

ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.

(MFC commentary)

This may get MFC'd if many bugs in tun(4)/tap(4) are discovered after this,
and how critical they are. Changes after this are likely easily MFC'd
without taking this merge, but the merge will be easier.

I have no plans to do this MFC as of now.

Reviewed by:	bcr (manpages), tuexen (testing, syzkaller/packetdrill)
Input also from:	melifaro
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D20044
2019-05-08 02:32:11 +00:00
Conrad Meyer
2cb54a800c random: x86 driver: Prefer RDSEED over RDRAND when available
Per
https://software.intel.com/en-us/blogs/2012/11/17/the-difference-between-rdrand-and-rdseed
, RDRAND is a PRNG seeded from the same source as RDSEED.  The source is
more suitable as PRNG seed material, so prefer it when the RDSEED intrinsic
is available (indicated in CPU feature bits).

Reviewed by:	delphij, jhb, imp (earlier version)
Approved by:	secteam(delphij)
Security:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20192
2019-05-08 00:45:16 +00:00
Conrad Meyer
fce2d624ea vmm(4): Pass through RDSEED feature bit to guests
Reviewed by:	jhb
Approved by:	#bhyve (jhb)
MFC after:	2 leapseconds
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20194
2019-05-08 00:40:08 +00:00
Warner Losh
e7ef108cf4 Add missing newline to debug printf. 2019-05-08 00:09:10 +00:00
Michael Tuexen
132ea9f2ad Remove non-functional SCTP checksum offload support for virtio.
Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.

This patch removes SCTP checksum offloading support from the virtio
network interface.

Thanks to Felix Weinrank for making me aware of the issue.

Reviewed by:		bryanv@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20147
2019-05-07 20:28:12 +00:00
Edward Tomasz Napierala
faf2fa21d7 Support PTRACE_GETREGSET w/ NT_PRSTATUS in Linux ptrace(2).
While Linux strace(1) doesn't strictly require it - it has a fallback
to PTRACE_GETREGS - it's a newer interface, so we better support it
before the old one is deprecated.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20152
2019-05-07 19:06:41 +00:00
Ed Maste
0e26cd440f make sysent after r347228
Regenerate to add @generated tag in generated files.
2019-05-07 18:10:21 +00:00
Conrad Meyer
7d7db5298d device_printf: Use sbuf for more coherent prints on SMP
device_printf does multiple calls to printf allowing other console messages to
be inserted between the device name, and the rest of the message.  This change
uses sbuf to compose to two into a single buffer, and prints it all at once.

It exposes an sbuf drain function (drain-to-printf) for common use.

Update documentation to match; some unit tests included.

Submitted by:	jmg
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16690
2019-05-07 17:47:20 +00:00
Ed Maste
5350e15d0d makesyscalls: use @generated tag in generated files
Multiple tools use @generated to identify generated files (for example,
in a review Phabricator will by default hide diffs in generated files).
Use the @generated tag in makesyscalls.sh as we've done for other
generated files.

Reviewed by:	cem
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20183
2019-05-07 16:17:33 +00:00
Mark Johnston
7d43b5c98e Simplify the test against maxproc in fork1().
Previously nprocs_new would be tested against maxprocs twice when
nprocs_new < maxprocs - 10.  Eliminate the unnecessary comparison.

Submitted by:	Wuyang Chung <wuyang.chung1@gmail.com>
GitHub PR:	https://github.com/freebsd/freebsd/pull/397
MFC after:	1 week
2019-05-07 15:03:26 +00:00
Ruslan Bukin
bf03b1f1f9 Disable interrupts first and then set spinlock_count to 1.
Otherwise interrupt can be generated just after setting spinlock_count
and before disabling interrupts.

Sponsored by:	DARPA, AFRL
2019-05-07 14:32:17 +00:00
Ruslan Bukin
75cf8837a9 Provide a template for busdma code for RISC-V.
RISC-V ISA specifies no cache management instructions so leave cache
operations in cpufunc.h as no-op for now.

Note some new hardware comes with their own memory-mapped cache
management controller.

Tested on HiFive Unleashed board with cgem(4).

Reviewed by:	markj
Obtained from:	arm64
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D20126
2019-05-07 13:41:43 +00:00
Ed Maste
6e0e532a90 Use @generated tag in generated files
Multiple tools use @generated to identify generated files (for example,
in a review Phabricator will by default hide diffs in generated files).
Use the @generated tag in makeobjops.awk and vnode_if.awk as we've done
for other generated files.

Sponsored by:	The FreeBSD Foundation
2019-05-07 13:04:26 +00:00
Marius Strobl
ca2ebb27ed o Avoid determining the MAC class (LEM/EM or IGB) - possibly even multiple
times - on every interrupt by using an own set of device methods for the
  IGB class. This translates to introducing igb_if_intr_{disable,enable}()
  and igb_if_{rx,tx}_queue_intr_enable() with that IGB-specific code moved
  out of their EM counterparts and otherwise continuing to use the EM IFDI
  methods also for IGB.
  Note that igb_if_intr_{disable,enable}() also issue E1000_WRITE_FLUSH as
  lost with the conversion of igb(4) to iflib(4).
  Also note, that the em_if_{disable,enable}_intr() methods are renamed to
  em_if_intr_{disable,enable}() for consistency with the names used in the
  interface declaration.
o In em_intr():
  - Don't bother to bail out if the interrupt type is "legacy", i. e. INTx
    or MSI, as iflib(4) doesn't use ift_legacy_intr methods for MSI-X. All
    other iflib(4)-based drivers avoid this check, too.
  - Given that only the MSI-X interrupts have one-shot behavior (by taking
    advantage of the EIAC register), explicitly disable interrupts. Hence,
    em_intr() now matches what {em,igb}_irq_fast() previously did (in case
    of igb(4) supposedly also to work around MSI message reordering errata
    on certain systems).
o In em_if_intr_disable():
  - Clear the EIAC register unconditionally for 82574 and not just in case
    of MSI-X, matching em_if_intr_enable() and bringing back the last hunk
    of r206437 lost with the iflib(4) conversion.
  - Write to EM_EIAC for clearing said register instead of to the IGB-only
    E1000_EIAC used ever since the iflib(4) conversion.

Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D20176
2019-05-07 08:31:54 +00:00
Marius Strobl
3d10e9ed62 o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and
MSI. Unlike as with iflib_fast_intr_ctx(), the former will also enqueue
  _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP
  TX throughput of EM-class devices on par with the MSI-X case and, thus,
  close to wirespeed/pre-iflib(4) times again. [1]
  Note that independently of the interrupt type, the UDP performance with
  these MACs still is abysmal and nowhere near to where it was before the
  conversion of em(4) to iflib(4).
o In iflib_init_locked(), announce which free list failed to set up.
o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead
  of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt
  as the latter is valid with MSI-X only.
o Instead of adding the missing - and apparently convoluted enough that a
  DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for
  ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to
  iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and
  make the checks fail gracefully rather than panic. This avoids invoking
  the checks at runtime over and over again in iflib_fast_intr_rxtx() and
  _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes
  these functions more readable.
o In iflib_rx_structures_setup(), only initialize LRO resources if device
  and driver have LRO capability in order to not waste memory. Also, free
  the LRO resources again if setting them up fails for one of the queues.
  However, don't bother invoking iflib_rx_sds_free() in that case because
  iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and
  iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case
  of failure via iflib_rx_structures_free(), but there definitely is some
  asymmetry left to be fixed, though).
o Similarly, free LRO resources again in iflib_rx_structures_free().
o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully
  instead of panicing (but only in case of INVARIANTS). This is a follow-
  up to r344132, as such driver bugs shouldn't be fatal.
o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic()
  gracefully, too.
o Bring yet more sanity to iflib_msix_init():
  - If the device doesn't provide enough MSI-X vectors or not all vectors
    can be allocate so the expected number of queues in addition to admin
    interrupts can't be supported, try MSI next (and then INTx) as proper
    MSI-X vector distribution can't be assured in such cases. In essence,
    this change brings r254008 forward to iflib(4). Also, this is the fix
    alluded to in the commit message of r343934.
  - If the MSI-X allocation has failed, don't prematurely announce MSI is
    going to be used as the latter in fact may not be available either.
  - When falling back to MSI, only release the MSI-X table resource again
    if it was allocated in iflib_msix_init(), i. e. isn't supplied by the
    driver, in the first place.
o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.

PR:		235031 (likely) [1]
Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D20175
2019-05-07 08:28:35 +00:00
Dmitry Chagin
52a9e429c8 Remove wrong copyright line. Discussed with Carlos Neira.
Reported by:	Rodney W. Grimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-07 05:08:13 +00:00
Konstantin Belousov
078116a662 amd64: fix BUS_SPACE_MAXSIZE to 64bit max value.
Reviewed by:	jhb, tychon (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20154
2019-05-07 01:18:57 +00:00
Doug Moore
27d172bb12 The intention of the blist cursor is for the search for free blocks to
resume where the last search left off. Suppose that there are no free
blocks of size 32, but plenty of size 16. If we repeatedly request
size 32 blocks, fail, and retry with size 16 blocks, then the failures
all reset the cursor to the beginning of memory, making the 16 block
allocation use a first fit, rather than next fit, strategy.

This change has blist_alloc make a copy of the cursor for its own
decision making, and only updates the real blist cursor after a
successful allocation, making those 16 block searches behave like
next-fit searches.

Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20177
2019-05-06 22:12:15 +00:00
Marius Strobl
1722eeac95 - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx.
- Remove the only ever written to ift_db_mtx_name member of struct iflib_txq.
- Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen
  and ifr_lro_enabled members of struct iflib_rxq.
- Consistently spell DMA, RX and TX uppercase in comments, messages etc.
  instead of mixing with some lowercase variants.
- Consistently use if_t instead of a mix of if_t and struct ifnet pointers.
- Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and
  iflib_fl_setup() in line with reality.
- Judging problem reports, people are wondering what on earth messages like:
  "TX(0) desc avail = 1024, pidx = 0"
  are trying to indicate. Thus, extend this string to be more like that of
  non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due
  to which the interface will be reset.
- Take advantage of the M_HAS_VLANTAG macro.
- Use false/true rather than FALSE/TRUE for variables of type bool.
- Use FALLTHROUGH as advocated by style(9).
2019-05-06 20:56:41 +00:00
Dmitry Chagin
4e2f69f1cf Adds sys/class/net devices to linsysfs.
Only two interfaces are created eth0 and lo and they expose
the following properties:
address, addr_len, flags, ifindex, mty, tx_queue_len and type.

Initial patch developed by Carlos Neira in 2017 and finished by me.

PR:		223722
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D13656
2019-05-06 20:01:13 +00:00
Dmitry Chagin
bbac65c772 Rewrite linux_ifflags() in more readable Linuxulator style.
Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20146
2019-05-06 19:57:51 +00:00
Dmitry Chagin
9c1437ae57 Complete r347052 (https://reviews.freebsd.org/D20137) as it it was not
a final revision.

Fix style issues and change bool-like variables from int to bool.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20141
2019-05-06 19:56:13 +00:00
Alexander Motin
8cb46437a7 Drop periph lock around cam_periph_unmapmem().
Since r345656 it may call copyout(), that may sleep.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2019-05-06 19:08:03 +00:00
Dmitry Chagin
7c28c7e84f The build process generates assym.inc from genassym.o, so don't forget
to clean genassym.o

MFC after:	2 weeks
2019-05-06 18:46:42 +00:00
Conrad Meyer
6b6e2954dd List-ify kernel dump device configuration
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails.  E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.  It also does not implement
IPv6 netdump.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device.  Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.

Reviewed by:	markj, scottl (earlier version)
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19996
2019-05-06 18:24:07 +00:00
Hans Petter Selasky
46068e86c3 Use PCIV_INVALID in pci_channel_offline() in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:22:45 +00:00
Hans Petter Selasky
fa23397925 Disabling a PCI device should only disable busmaster in the LinuxKPI.
As Linux comment for this function point:
Signal to the system that the PCI device is not in use by the system
anymore. This only involves disabling PCI bus-mastering, if active.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:17:38 +00:00
Hans Petter Selasky
34cb771e01 Implement print_hex_dump_debug() function macro in the LinuxKPI.
Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		slavash@
Sponsored by:		Mellanox Technologies
2019-05-06 16:10:26 +00:00
Ed Maste
e0bfdf599d Reformat arm64 linux syscalls.master per current style
Equivalent to r339958 for sys/kern/syscalls.master.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14858
2019-05-06 16:07:14 +00:00
Hans Petter Selasky
4580f5eadd Allow controlling pr_debug at runtime in the LinuxKPI.
Turning on pr_debug at compile time make it non-optional at runtime.
This often means that the amount of the debugging is unbearable.
Allow developer to turn on pr_debug output only when needed.

Build tested drm-current-kmod prior to commit.

MFC after:		1 week
Submitted by:		kib@
Sponsored by:		Mellanox Technologies
2019-05-06 16:00:20 +00:00
Roger Pau Monné
b951b8f721 geom: fix initialization order
There's a race between the initialization of devsoftc.mtx (by devinit)
and the creation of the geom worker thread g_run_events, which calls
devctl_queue_data_f. Both of those are initialized at SI_SUB_DRIVERS
and SI_ORDER_FIRST, which means the geom worked thread can be created
before the mutex has been initialized, leading to the panic below:

 wpanic: mtx_lock() of spin mutex (null) @ /usr/home/osstest/build.135317.build-amd64-freebsd/freebsd/sys/kern/subr_bus.c:620
 cpuid = 3
 time = 1
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003b968710
 vpanic() at vpanic+0x19d/frame 0xfffffe003b968760
 panic() at panic+0x43/frame 0xfffffe003b9687c0
 __mtx_lock_flags() at __mtx_lock_flags+0x145/frame 0xfffffe003b968810
 devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfffffe003b968840
 g_dev_taste() at g_dev_taste+0x463/frame 0xfffffe003b968a00
 g_load_class() at g_load_class+0x1bc/frame 0xfffffe003b968a30
 g_run_events() at g_run_events+0x197/frame 0xfffffe003b968a70
 fork_exit() at fork_exit+0x84/frame 0xfffffe003b968ab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003b968ab0
 --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
 KDB: enter: panic
 [ thread pid 13 tid 100029 ]
 Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Fix this by initializing geom at SI_ORDER_SECOND instead of
SI_ORDER_FIRST.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kevans, markj
Differential revision:	https://reviews.freebsd.org/D20148
2019-05-06 09:48:34 +00:00
Konstantin Belousov
391918a3c1 Do not flush NFS node from NFS VOP_SET_TEXT().
The more appropriate place to do the flushing is VOP_OPEN().  This was
uncovered because VOP_SET_TEXT() is now called with the vnode'
vm_object rlocked, which is incompatible with the flush operations.

After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.

Sponsored by:	The FreeBSD Foundation
MFC after:	30 days
2019-05-06 08:49:43 +00:00
Konstantin Belousov
12487941f4 Noted by: alc
Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2019-05-06 08:46:11 +00:00
Tycho Nightingale
8d2a55ca67 zero inputs to vm_page_initfake() for predictable results
Reviewed by:	kib
Submitted by:	Anton Rang <rang at acm.org>
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20162
2019-05-06 00:57:05 +00:00
Justin Hibbits
2154a866b6 powerpc/booke: Use #ifdef __powerpc64__ instead of hw_direct_map in places
Since the DMAP is only available on powerpc64, and is *always* available on
Book-E powerpc64, don't penalize either side (32-bit or 64-bit) by always
checking hw_direct_map to perform operations.  This saves 5-10% time on
various ports builds, and on buildworld+buildkernel on Book-E hardware.

MFC after:	3 weeks
2019-05-05 20:23:43 +00:00
Justin Hibbits
bfd0787769 powerpc/booke: Fix size check for phys_avail in pmap bootstrap
Use the nitems() macro instead of the expansion, a'la r298352.  Also, fix the
location of this check to after initializing availmem_regions_sz, so that the
check isn't always against 0, thus always failing (nitems(phys_avail) is always
more than 0).
2019-05-05 20:05:50 +00:00
Alexander Motin
0404d5981d Decode some more ATA commands found in ACS-4.
MFC after:	1 week
2019-05-05 17:10:12 +00:00
Mark Johnston
9e56947ffc Ensure that error is initialized in ufs_bmap_seekdata().
Reported and tested by:	jhibbits
MFC with:	r346932
Sponsored by:	The FreeBSD Foundation
2019-05-05 16:57:03 +00:00
Alexander Motin
1aed499575 Decode Deallocate Logical Block Features.
MFC after:	1 week
2019-05-05 15:47:21 +00:00
Konstantin Belousov
78022527bb Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount.  To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change.  vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP.  vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:20:43 +00:00
Konstantin Belousov
7f1446052f Do not collapse objects with OBJ_NOSPLIT backing swap object.
NOSPLIT swap objects are not anonymous, they are used by tmpfs regular
files and POSIX shared memory.  For such objects, collapse is not
permitted.

Reported by:	mjg
Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:06:19 +00:00
Konstantin Belousov
2d6b8546b7 imgact_elf: do not relock the text vnode if possible.
We unlock the vnode around malloc(M_WAITOK), to make it possible for
pagedaemon to flush vnode pages for us.  Instead of doing it
unconditionally, first try M_NOWAIT allocation, which typically
succeed.  Only on failure, unlock the vnode and retry with M_WAITOK.

Reviewed by:	markj, trasz
Tested by:	mjg, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19923
2019-05-05 11:04:01 +00:00
Adrian Chadd
b4967c9b6f [ath_rate_sample] Have the final attempted rate in 11n modes to be the lowest one.
Right now ath_rate_sample has a fixed rate schedule, rather than the minstrel_ht
style "best, good, most reliable" triplet.  So, if higher rates are tried then
it'll not fail back to a lower MCS rate in that transmission schedule.

This means that in low SNR situations it'll not easily drop to MCS0 unless enough
transmissions occur to allow rate control to eventually decide to drop; and if
it's TCP traffic it'll get slowed down because of packet loss.

It's worse for 2-stream and 3-stream rates; it doesn't ever fall back to lower
stream rates, and these higher stream rates required higher SNR to work.

So instead let's (for now?) have each of the 11n transmit rates use MCS0 as
the last attempt. ath_rate_sample will quickly see that rate succeeds more
and will move to it much quicker.

Testing:

* AR9344 (Wasp) - 2G STA mode
2019-05-05 06:32:40 +00:00
Adrian Chadd
7d450faa6f [ath] [ath_rate] Fix ANI calibration during non-ACTIVE states; start poking at rate control
These are some fun issues I've found with my upstairs wifi link at such a ridiculous
low signal level (like, < 5dB.)

* Add per-station tx/rx rssi statistics, in potential preparation to use that
  in the RX rate control.

* Call the rate control on each received frame to let it potentially use
  it as a hint for what rates to potentially use.  It's a no-op right now.

* Do ANI calibration during scan as well. The ath_newstate() call was disabling the
  ANI timer and only re-enabling it during transitions to _RUN.  This has the
  unfortunate side-effect that if ANI deafened the NIC because of interference
  and it disassociated, it wouldn't be reset and the scan would never hear beacons.

The ANI configuration is stored at least globally on some HALs and per-channel
on others.  Because of this a NIC reset wouldn't help; the ANI parameters would
simply be programmed back in.

Now, I have a feeling I also need to do this during AUTH/ASSOC too and maybe,
if I'm feeling clever, I need to reset the ANI parameters on a given channel
during a transition through INIT or if the VAP is destroyed/re-created.
However for now this gets me out of the immediate weeds with connectivity
upstairs (and thus I /can/ commit); I'll keep chipping away at tidying this
stuff up in subsequent commits.

Tested:

* AR9344 (Wasp), 2G STA mode
2019-05-05 04:56:37 +00:00
Conrad Meyer
665919aaaf x86: Implement MWAIT support for stopping a CPU
IPI_STOP is used after panic or when ddb is entered manually.  MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning.  Something similar is already used at idle.

It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP.  (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)

It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off).  This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.

Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.

Submitted by:   Anton Rang <rang AT acm.org> (earlier version)
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 20:34:26 +00:00
Konstantin Belousov
ecaed009a9 arm64: Properly restore PAN when done with userspace access in casueword.
Approved by:	andrew
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-04 19:40:30 +00:00
Mateusz Guzik
5408c6db4e sysv: get rid of fork/exit hooks if the code is compiled in
Sponsored by:	The FreeBSD Foundation
2019-05-04 19:05:30 +00:00
Mateusz Guzik
37d2b1f3e5 Annotate nprocs with __exclusive_cache_line
Sponsored by:	The FreeBSD Foundation
2019-05-04 19:04:17 +00:00
Kirk McKusick
44b193b09e Zero out the file directory entry metadata to reduce disk
scavenging disclosure.

Submitted by: David G. Lawrence <dg@dglawrence.com>
MFC after:    1 week
2019-05-04 18:00:57 +00:00
Conrad Meyer
83dc49beaf x86: Define pc_monitorbuf as a logical structure
Rather than just accessing it via pointer cast.

No functional change intended.

Discussed with:	kib (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 17:35:13 +00:00
Justin Hibbits
73a30b035e powerpc/mpc85xx: Attach MPC85xx PCI bus and root complex at the right pass
No signifcant change, just matches other PCI attachments, attaching at
BUS_PASS_BUS.

MFC after:	2 weeks
2019-05-04 16:24:43 +00:00
Ganbold Tsagaankhuu
07be3e3dc9 Add emmc clock definitions for Rockchip RK3399 SoC. 2019-05-04 10:48:44 +00:00
Hans Petter Selasky
442d12d89c Fix regression issue after r346645 in the LinuxKPI.
The S/G list must be mapped AS-IS without any optimisations.
This also implies that sg_dma_len() must be equal to sg->length.
Many Linux drivers assume this and this fixes some DRM issues.

Put the BUS DMA map pointer into the scatter-gather list to
allow multiple mappings on the same physical memory address.

The FreeBSD version has been bumped to force recompilation of
external kernel modules.

Sponsored by:		Mellanox Technologies
2019-05-04 09:47:01 +00:00
Hans Petter Selasky
8ec9f0282a Fix regression issue after r346645 in the LinuxKPI.
Properly handle error case when mapping DMA address fails.

Sponsored by:		Mellanox Technologies
2019-05-04 09:30:03 +00:00
Justin Hibbits
e280e2ea3d powerpc: Optimize padding in bus_dma_tag
Avoid 8 bytes of padding (2 noncontiguous ints).

Submitted by:	Brandon Bergren <git_bdragon.rtk0.net>
Differential Revision: https://reviews.freebsd.org/D20121
2019-05-04 02:45:24 +00:00
Justin Hibbits
5d67b612d0 powerpc: Merge all pmap struct definitions
Summary:
A few ports fail to build due to missing pmap-related definitions, which are
specific per-pmap type.  This tries to appease those ports, by merging all
pmaps together.

A future change will move the inline page directory out of the Book-E pmap,
to eliminate the last #ifdefs in pmap.h and complete the merge.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D20119
2019-05-04 02:34:28 +00:00
Kirk McKusick
0061238fb0 This update eliminates a kernel stack disclosure bug in UFS/FFS
directory entries that is caused by uninitialized directory entry
padding written to the disk. It can be viewed by any user with read
access to that directory. Up to 3 bytes of kernel stack are disclosed
per file entry, depending on the the amount of padding the kernel
needs to pad out the entry to a 32 bit boundry. The offset in the
kernel stack that is disclosed is a function of the filename size.
Furthermore, if the user can create files in a directory, this 3
byte window can be expanded 3 bytes at a time to a 254 byte window
with 75% of the data in that window exposed. The additional exposure
is done by removing the entry, creating a new entry with a 4-byte
longer name, extracting 3 more bytes by reading the directory, and
repeating until a 252 byte name is created.

This exploit works in part because the area of the kernel stack
that is being disclosed is in an area that typically doesn't change
that often (perhaps a few times a second on a lightly loaded system),
and these file creates and unlinks themselves don't overwrite the
area of kernel stack being disclosed.

It appears that this bug originated with the creation of the Fast
File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and
is likely present in every Unix or Unix-like system that uses
UFS/FFS. Amazingly, nobody noticed until now.

This update also adds the -z flag to fsck_ffs to have it scrub
the leaked information in the name padding of existing directories.
It only needs to be run once on each UFS/FFS filesystem after a
patched kernel is installed and running.

Submitted by: David G. Lawrence <dg@dglawrence.com>
Reviewed by:  kib
MFC after:    1 week
2019-05-03 21:54:14 +00:00
John Baldwin
c2b4cedd78 Emulate the "ADD reg, r/m" instruction (opcode 03H).
OVMF's flash variable storage is using add instructions when indexing
the variable store bootrom location.

Submitted by:	D Scott Phillips <d.scott.phillips@intel.com>
Reviewed by:	rgrimes
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19975
2019-05-03 21:48:42 +00:00
Kirk McKusick
ab2214d400 Simplify calculation of DIRECTSIZ. No functional change intended.
Suggested by: kib
MFC after:    1 week
2019-05-03 21:46:25 +00:00
Mark Johnston
bc79b41c40 Disallow excessively small times of day in clock_settime(2).
Reported by:	syzkaller
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20151
2019-05-03 21:26:44 +00:00
Robert Watson
5c95417dad When MAC is enabled and a policy module is loaded, don't unconditionally
lock mac_ifnet_mtx, which protects labels on struct ifnet, unless at least
one policy is actively using labels on ifnets.  This avoids a global mutex
acquire in certain fast paths -- most noticeably ifnet transmit.  This was
previously invisible by default, as no MAC policies were loaded by default,
but recently became visible due to mac_ntpd being enabled by default.

gallatin@ reports a reduction in PPS overhead from 300% to 2.2% with this
change.  We will want to explore further MAC Framework optimisation to
reduce overhead further, but this brings things more back into the world
of the sane.

MFC after:	3 days
2019-05-03 20:38:43 +00:00
Matt Macy
e2621d9657 Allow iflib drivers to pass a pointer to their own ifmedia structure.
Tested by: emaste@

Differential Revision:	https://reviews.freebsd.org/D19946
2019-05-03 20:05:31 +00:00
Andrew Gallatin
35961dce98 Select lacp egress ports based on NUMA domain
This change creates an array of port maps indexed by numa domain
for lacp port selection. If we have lacp interfaces in more than
one domain, then we select the egress port by indexing into the
numa port maps and picking a port on the appropriate numa domain.

This is behavior is controlled by the new ifconfig use_numa flag
and net.link.lagg.use_numa sysctl/tunable (both modeled after the
existing use_flowid), which default to enabled.

Reviewed by:	bz, hselasky, markj (and scottl, earlier version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20060
2019-05-03 14:43:21 +00:00
Dmitry Chagin
d151344dbf In order to reduce duplication between MD parts of the Linuxulator
move bits that are MI out into the headers in compat/linux.
For that remove bogus _packed attribute from struct l_sockaddr
and use MI types for struct members.

And continue to move into the linux_common module a code that is
intended for both Linuxulator modules (both instruction set - 32 & 64 bit)
or for external modules like linsysfs or linprocfs.

To avoid header pollution introduce new sys/compat/linux_common.h header.

Reviewed by:	emaste
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20137
2019-05-03 08:42:49 +00:00
Edward Tomasz Napierala
967cbe64b1 Decode more CPU flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20145
2019-05-03 08:27:03 +00:00
Doug Moore
64f8d2575a fls() should find the most significant bit of an int faster than a
linear search can, so use it to avoid a linear search in isqrt.

Approved by: kib (mentor), markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20102
2019-05-03 02:55:54 +00:00
Ed Maste
ce3da455e9 iflib: remove assertion that isc_capabilities is nonzero
It's atypical, but not invalid, for a driver to pass no capabilities.

Submitted by:	Gerald Aryeetey <aryeeteygerald_rogers.com>
Reviewed by:	shurd
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20142
2019-05-02 19:13:31 +00:00
Edward Tomasz Napierala
6c8cb13dd8 Fix flags in cpuinfo.
Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20139
2019-05-02 19:02:16 +00:00
Conrad Meyer
d6745408c7 Add a COMPAT_FREEBSD12 kernel option.
Use it wherever COMPAT_FREEBSD11 is currently specified, like r309749.

Reviewed by:	imp, jhb, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20120
2019-05-02 18:10:23 +00:00
Emmanuel Vadot
9acc2a6af6 dtb: Include RK3399 RockPro64 DTS in kernel build
The DTS for this board is already present in sys/gnu/dts/arm64/rockchip/
and just needs to be enabled.

Submitted by:	alex@wied.io
Differential Revision:	https://reviews.freebsd.org/D19823
2019-05-02 17:04:01 +00:00
Kyle Evans
2de4a7aa21 fdt: Fix installation of aarch64 dtb
r345519 rewrote parts of how we build .dtb, but mistakenly dropped the
vendor dir for aarch64.  Simply drop the :T for building ${DTB} in the
aarch64 case- it'll get applied at install-time as-needed, with :H:T for
determining the vendor dir.

Reported by:	manu
Tested by:	manu
Reviewed by:	manu
MFC after:	3 days
2019-05-02 16:56:03 +00:00
Emmanuel Vadot
5b1309542e arm64: Add support for NanoPI NEO2
Add overlay files and activate devicetree file for NanoPi NEO2 featuring
Allwinner H5 ARM64 core.
To enable sound, dma and codec drivers are enabled for build.

Submitted by:	Manuel Stühn (freebsdnewbie@freenet.de)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D20129
2019-05-02 12:56:13 +00:00
Dmitry Chagin
03ddf624e6 Remove unneeded includes.
MFC after:	2 week
2019-05-02 09:00:36 +00:00
Edward Tomasz Napierala
12f3888a98 Add sys/devices/system/cpu/{possible,present} to linsysfs(5).
That makes Linux lscpu(1) work.

Reviewed by:	dchagin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20131
2019-05-02 08:17:29 +00:00
Justin Hibbits
b4698b7a6c powerpc: Drop OPAL_HANDLE_HMI2 for now, to avoid panicking
It's possible for a Hypervisor Maintenance Interrupt (HMI) to occur while in
the pmap code, holding locks.  This can cause WITNESS to panic due to lock
errors in calling pmap_kextract().  Since we don't yet handle the flags
returned by OPAL_HANDLE_HMI2, just stop using it, so that we don't call into
pmap_kextract().

Reported by:	pkubaj
2019-05-02 03:39:03 +00:00
Andrew Turner
fa19730c61 Restore x18 in efi_arch_leave.
Some UEFI implementations trash this register and, as we use it as a
platform register, the kernel doesn't save it before calling into the UEFI
runtime services. As we have a copy in tpidr_el1 restore from there when
exiting the EFI environment.

PR:		237234, 237055
Reviewed by:	manu
Tested On:	Ampere eMAG
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Sponsored by:	Ampere Computing (hardware)
Differential Revision:	https://reviews.freebsd.org/D20127
2019-05-01 17:12:49 +00:00
Ruslan Bukin
adf208e786 Deactivate IRQ resource by calling to intr_deactivate_irq().
This is the part of INTRNG support that was missed.

Sponsored by:	DARPA, AFRL
2019-05-01 15:03:12 +00:00
Ganbold Tsagaankhuu
65f1fc3f3f Add a hw.model sysctl oid for arm64 which reports the CPU model similar to armv6/7.
Reviewed by:	andrew, manu
Differential Revision:	https://reviews.freebsd.org/D20123
2019-05-01 14:20:31 +00:00
Konstantin Belousov
19f5d9f27f Fix another race between vm_map_protect() and vm_map_wire().
vm_map_wire() increments entry->wire_count, after that it drops the
map lock both for faulting in the entry' pages, and for marking next
entry in the requested region as IN_TRANSITION. Only after all entries
are faulted in, MAP_ENTRY_USER_WIRE flag is set.

This makes it possible for vm_map_protect() to run while other entry'
MAP_ENTRY_IN_TRANSITION flag is handled, and vm_map_busy() lock does
not prevent it. In particular, if the call to vm_map_protect() adds
VM_PROT_WRITE to CoW entry, it would fail to call
vm_fault_copy_entry(). There are at least two consequences of the
race: the top object in the shadow chain is not populated with
writeable pages, and second, the entry eventually get contradictory
flags MAP_ENTRY_NEEDS_COPY | MAP_ENTRY_USER_WIRED with VM_PROT_WRITE
set.

Handle it by waiting for all MAP_ENTRY_IN_TRANSITION flags to go away
in vm_map_protect(), which does not drop map lock afterwards. Note
that vm_map_busy_wait() is left as is.

Reported and tested by:	pho (previous version)
Reviewed by:	Doug Moore <dougm@rice.edu>, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20091
2019-05-01 13:15:06 +00:00
Dmitry Chagin
5d520d7fab Follow the FreeBSD and implement PDEATH_SIG prctl ops in the Linuxulator.
It was first introduced in r163734 and missied by me in r283383.

MFC after:	1 week
2019-04-30 17:18:05 +00:00
Hans Petter Selasky
a6619e8d9c Reduce the number of mutexes after r346645 in the LinuxKPI.
Make function macro wrappers for locking and unlocking to ease readability.

No functional change.

Discussed with:		kib@, tychon@ and zeising@
Sponsored by:		Mellanox Technologies
2019-04-30 10:41:20 +00:00
Hans Petter Selasky
93a203ea65 Make the dma_pool structure private to the LinuxKPI similar to Linux.
No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:38:22 +00:00
Hans Petter Selasky
5a637529ed Store a pointer to the device instead of the PCI device in the DMA pool
implementation in the LinuxKPI. This avoids use of container_of().

No functional change.

Discussed with:		kib @
Sponsored by:		Mellanox Technologies
2019-04-30 09:26:11 +00:00
Justin Hibbits
0af5d6f7d9 powerpc: Stop pretending we run on e500v1 cores
Unconditional writing to MAS7, which doesn't exist on the e500v1 core, in a
TLB miss handler has been in the code for several years now.  Since this has
gone unnoticed for so long, it's easily concluded that e500v1 is not in use
with FreeBSD.  Simplify the code path a bit, by unconditionally zeroing MAS7
instead of calling a subroutine to do it.
2019-04-30 03:45:46 +00:00
Justin Hibbits
7122ab6ed3 powerpc64: Fix switch panic from cpu_throw()
r18 is used to hold the old PCB flags, but cpu_throw doesn't populate r18
with PCB flags, since the old thread is gone.  This can lead to panics on
cores that don't have the registers guarded by these flags.
2019-04-29 22:37:35 +00:00
Mark Johnston
cc2c33dfb1 Optimize lseek(SEEK_DATA) on UFS.
This version fixes the problems identified in r345244.

Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19598
2019-04-29 22:05:26 +00:00
Alexander Motin
fb6a844704 ip multicast debug: fix strings vs defines
Turning on multicast debug made multicast failure worse
because the strings and #define values no longer matched
up.  Fix them, and make sure they stay matched-up.

Submitted by:	torek
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-29 18:09:55 +00:00
Leandro Lupori
508864649b [PPC64] Turn opal_flash.c into a device
This change makes it easier to enable/disable the inclusion of
OPAL flash in the kernel.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20098
2019-04-29 16:50:33 +00:00
Ruslan Bukin
5a51e5e49d o Rewrite softdma_process_tx() of Altera SoftDMA engine driver
so it does not require a bounce buffer. The only need for this was
  to align the buffer address. Implement unaligned access and we don't
  need to copy data twice.
o Remove contigmalloc-based bounce buffer from xDMA code since it is
  not suitable for arbitrary memory provided by platform, which is
  sometimes a dedicated piece of memory that is not managed by OS at all.

Sponsored by:	DARPA, AFRL
2019-04-29 16:27:15 +00:00
Mark Johnston
8e7130a8a7 Stop checking TD_IDLETHREAD() in buffer cache routines.
These predicates are vestigal and cannot be true today.  For example,
idle threads are not allowed to acquire locks.

Also cache curthread in breada().

No functional change intended.

Reviewed by:	kib, mckusick
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20066
2019-04-29 13:23:32 +00:00
Andrey V. Elsukov
90ecb41fba Add IPv6 support for O_IPLEN opcode.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-04-29 09:33:16 +00:00
Justin Hibbits
e2e3e7d28e powerpc: Make OPAL root node probe at bus pass
This way its children can attach earlier if needed, and some subsystems are
attached earlier, like the asynchronous token management.

MFC after:	2 weeks
2019-04-29 01:10:57 +00:00
Konstantin Belousov
9891fa5592 Remove witness warning, same as r346351 for busdma_dmar.
bounce_bus_dmamap_create() does not sleep either.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2019-04-28 18:45:44 +00:00
Edward Tomasz Napierala
e52fba212d Make isp(4) suggest loading ispfw(4) when it fails to attach.
It cannot load it automatically at boot, because the root filesystem
is not there yet. An alternative would be adding ispfw(4) to GENERIC,
but it's an additional 1MB.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D19369
2019-04-28 15:08:57 +00:00
Cy Schubert
17e17a17cf Left justify a function header brace as it should be.
No functional change.

MFC after:	3 days
2019-04-28 04:05:43 +00:00
Justin Hibbits
d1d73b0e27 powerpc: Add support for additional FSCR-managed facilities
Add support to enable, save, and restore the following facilities:
* Target Address Register (bctar) -- seemingly just another register to
  branch to.
* Event-based branching -- an interrupt-like userspace event handler
  subsystem.
* Load-monitored facility -- A facility that allows monitoring a range of
  physical memory, and triggering an event on access.  Targeted to garbage
  collection software features.
2019-04-27 22:30:22 +00:00
Justin Hibbits
3eb5d5dd25 powerpc: Add SPR definitions for additional POWER8/POWER9 facilities
This only adds the new SPR definitions and the associated FSCR bits.  The
facilities themselves will be added in separate commits.
2019-04-27 19:32:33 +00:00
Justin Hibbits
8b7f0d83e6 powerpc64: Add the DSCR facility on POWER8 and later
The Data Stream Control Register (DSCR) is privileged on POWER7, but
unprivileged (different register) on POWER8 and later.  However, it's now
guarded by a new register, the Facility Status and Control Register, instead of
the MSR like other pre-existing facilities (FPU, Altivec).  The FSCR must be
managed explicitly, since it's effectively an extension of the MSR.

Tested by:	Brandon Bergren
2019-04-27 16:28:34 +00:00
Emmanuel Vadot
c711d88236 arm: allwinner: a10: Correct pin functions
PB20 and PB21 alternate function 1 is i2c2 not i2c1

Reported by:	Horiki Mori (yamori813@yahoo.co.jp)
PR:	 237401
MFC after:	1 week
2019-04-27 14:59:08 +00:00
Emmanuel Vadot
80e8fa0810 arm64: allwinner: ccu_de2: Remove H5 compatible
We don't have the display engine driver commited in FreeBSD yet so it is
useless to expose the clocks yet (and also it have not been tested on H5).

Reported by:	Manuel Stühn (freebsdnewbie@freenet.de)
PR:	 237571
MFC after:	1 week
2019-04-27 14:56:24 +00:00
Emmanuel Vadot
bfb92761dc arm64: allwinner: Add compatible strings for clock devices used on both Allwinner H3 and H5
Allwinner H3 and H5 share many internal components, that's why they can
use the same drivers.
This patch adds the compatible strings to enable clock drivers
probing on Allwinner NanoPI NEO2 device.

Tested on: NanoPi NEO2 (by submitter), OrangePi PC2 (by manu)
Submitted by:	Manuel Stühn (freebsdnewbie@freenet.de)
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D20069
2019-04-27 14:48:27 +00:00
Justin Hibbits
f074eff155 powerpc: Add POWER8NVL definition
The POWER8NVL (POWER8 NVLink) architecturally behaves identically to the
POWER8, with a different PVR identifier.  Mark it as such, so it shows up
appropriately to the user.

Reported by:	Alexey Kardashevskiy
MFC after:	2 weeks
2019-04-27 02:33:49 +00:00
Justin Hibbits
19cfd8759e powerpc: micro-optimize cpu_switch()
Since the non-volatile registers are restored at the end of cpu_switchin (of
the new thread) they're free for us to use for our own purposes.  Load the
PCB_FLAGS into a non-volatile register so it's preserved across the C
function calls that manage FPU and altivec state.  This removes 4 loads from
each file.  Might be a trivial performance improvement (~12 clock cycles per
context switch).

MFC after:	3 weeks
2019-04-27 00:53:41 +00:00
Alan Somers
e430d1ed78 Don't symlink fusefs.ko to fuse.ko on PPC
Some PPC systems (PowerNV) use msdosfs for /boot, which can't handle either
symlinks or hardlinks. So on PPC, copy the module instead. This change fixes
installkernel on such systems after r345350.

Reported by:	Brandon Bergren <git_bdragon.rtk0.net>
Reviewed by:	jhibbits, rgrimes
MFC after:	2 weeks
MFC-With:	345350, 346441
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19993
2019-04-26 20:15:47 +00:00
Alexander Motin
7763842174 Add mutex_destroy() missed in r334844.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-26 19:02:21 +00:00
Alexander Motin
32d8034f77 Fix minor mismerges.
No functional change.

MFC after:	1 week
2019-04-26 18:25:59 +00:00
Alan Somers
f841e638fb [skip ci] fix typo in comment from r59840
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-04-26 15:00:59 +00:00
Ed Maste
5803d72f7e make sysent after r346273 (readlinkat arg correction)
PR:		197915
Reminded by:	dchagin
2019-04-26 12:55:52 +00:00
Justin Hibbits
17b72853f4 powerpc64: Clear FSCR SPR, so that it's in a known state
This now turns any access to the DSCR SPR into a SIGILL.  Later commits will
make DCSR work correctly on POWER8 and POWER9.

PR:		237208
2019-04-26 03:18:49 +00:00
Justin Hibbits
38a6d5495b powerpc: Fix whitespace in SPR header. 2019-04-26 03:13:44 +00:00
Justin Hibbits
da54cd8721 powerpc: Add another feature2 flag, and update power9 definition
Also fix the definition of PPC_FEATURE2_HTM_NOSUSPEND, a bad line copy.

This now closer matches Linux's definition.
2019-04-26 02:30:03 +00:00
Rodney W. Grimes
a488c9c99a Add accessor function for vm->maxcpus
Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.

This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298

Submitted by:		Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
Reviewed by:		Patrick Mooney <patrick.mooney@joyent.com>
Approved by:		bde (mentor), jhb (maintainer)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D18755
2019-04-25 22:51:36 +00:00
Ian Lepore
20105d31ee Fix typo: the 4th argument to GPIO_PIN_ACCESS_32 is the set of pins to
change, not the variable used to return the original pin state.

PR:		237378
Reported by:	Mori Hiroki <yamori813@yahoo.co.jp>
2019-04-25 22:27:56 +00:00
Johannes Lundberg
af248a7cee Don't call cdev_init where cdev_alloc is called. cdev_alloc already
handles initialization.

Reported by:	johalun
Reviewed by:	hps
Approved by:	imp (mentor), hps
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19565
2019-04-25 21:54:32 +00:00
Stephen Hurd
f154ece02e iflib: Better control over queue core assignment
By default, cores are now assigned to queues in a sequential
manner rather than all NICs starting at the first core. On a four-core
system with two NICs each using two queue pairs, the nic:queue -> core
mapping has changed from this:

0:0 -> 0, 0:1 -> 1
1:0 -> 0, 1:1 -> 1

To this:

0:0 -> 0, 0:1 -> 1
1:0 -> 2, 1:1 -> 3

Additionally, a device can now be configured to use separate cores for TX
and RX queues.

Two new tunables have been added, dev.X.Y.iflib.separate_txrx and
dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part
of the auto-assigned sequence.

Reviewed by:	marius
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D20029
2019-04-25 21:24:56 +00:00
Emmanuel Vadot
16de4430fe arm: allwinner: aw_pwm: compile it as module too
MFC after:	1 month
2019-04-25 18:44:03 +00:00
Emmanuel Vadot
3de3007594 arm: allwinner: Add pnp info to aw_rsb and compile it as module too
MFC after:	1 month
2019-04-25 18:43:01 +00:00
Emmanuel Vadot
56c37d89b8 arm: allwinner: Add pnp info to if_awg and compile it as module too
While here make it depend on aw_sid as it's needed for mac generation.

MFC after:	1 month
2019-04-25 18:42:27 +00:00
John Baldwin
83bf5ec367 Remove p_code from struct proc.
Contrary to the comments, it was never used by core dumps or
debuggers.  Instead, it used to hold the signal code of a pending
signal, but that was replaced by the 'ksi_code' member of ksiginfo_t
when signal information was reworked in 7.0.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D20047
2019-04-25 18:42:07 +00:00
Emmanuel Vadot
abc15d70b0 arm: allwinner: Add pnp info to aw_rtc and compile it as module too
MFC after:	1 month
2019-04-25 18:41:05 +00:00
Emmanuel Vadot
dbc8d8a261 arm: allwinner: Add pnp info to axp81x and compile it as module too
MFC after:	1 month
2019-04-25 18:40:23 +00:00
Emmanuel Vadot
f9b1c6a029 arm: allwinner: Add pnp info to aw_thermal and compile it as module too
MFC after:	1 month
2019-04-25 18:39:41 +00:00
Emmanuel Vadot
db0e5bf390 arm: allwinner: Add pnpinfo for aw_sid and add module Makefile
MFC after:	1 month
2019-04-25 18:38:38 +00:00
Kyle Evans
e3a883c386 tap(4): Correct driver name...
Reported by:	rgrimes
Pointy hat to:	kevans
MFC after:	3 days
X-MFC-With:	r346688
2019-04-25 18:26:34 +00:00
Kyle Evans
9ea63b2caa tap(4): Add a MODULE_VERSION
Otherwise tap(4) can be loaded by loader despite being compiled into the
kernel, causing a panic as things try to double-initialize.

PR:		220867
MFC after:	3 days
2019-04-25 18:22:22 +00:00
Tycho Nightingale
b09626b330 LinuxKPI buildfix for ppc64 after r346645.
Proposed by:	hselasky
Sponsored by:	Dell EMC Isilon
2019-04-25 18:13:55 +00:00
Andrew Gallatin
50575ce11c Track TCP connection's NUMA domain in the inpcb
Drivers can now pass up numa domain information via the
mbuf numa domain field.  This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by:	markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20028
2019-04-25 15:37:28 +00:00
Kyle Evans
c83651445b tun(4): Don't allow open of open or dying devices
Previously, a pid check was used to prevent open of the tun(4); this works,
but may not make the most sense as we don't prevent the owner process from
opening the tun device multiple times.

The potential race described near tun_pid should not be an issue: if a
tun(4) is to be handed off, its fd has to have been sent via control message
or some other mechanism that duplicates the fd to the receiving process so
that it may set the pid. Otherwise, the pid gets cleared when the original
process closes it and you have no effective handoff mechanism.

Close up another potential issue with handing a tun(4) off by not clobbering
state if the closer isn't the controller anymore. If we want some state to
be cleared, we should do that a little more surgically.

Additionally, nothing prevents a dying tun(4) from being "reopened" in the
middle of tun_destroy as soon as the mutex is unlocked, quickly leading to a
bad time. Return EBUSY if we're marked for destruction, as well, and the
consumer will need to deal with it. The associated character device will be
destroyed in short order.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20033
2019-04-25 13:46:12 +00:00
Kyle Evans
d91262603b tun/tap: close race between destroy/ioctl handler
It seems that there should be a better way to handle this, but this seems to
be the more common approach and it should likely get replaced in all of the
places it happens... Basically, thread 1 is in the process of destroying the
tun/tap while thread 2 is executing one of the ioctls that requires the
tun/tap mutex and the mutex is destroyed before the ioctl handler can
acquire it.

This is only one of the races described/found in PR 233955.

PR:		233955
Reviewed by:	ae
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20027
2019-04-25 12:44:08 +00:00
Hans Petter Selasky
d4fedb75ec LinuxKPI buildfix for 32-bit DMA architectures after r346645.
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.

Sponsored by:		Mellanox Technologies
2019-04-25 09:13:15 +00:00
Rebecca Cran
56a70105df ACPI SPCR: handle BaudRate=0
From 7d8dc6544c

"The mcbin (and likely others) have a nonstandard uart clock. This means
that the earlycon programming will incorrectly set the baud rate if it is
specified. The way around this is to tell the kernel to continue using the
preprogrammed baud rate. This is done by setting the baud to 0."

Our drivers (uart_dev_ns8250) do respect zero, but SPCR would error. Let's
not error.

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	mw, imp, bcran
Differential Revision:	https://reviews.freebsd.org/D19914
2019-04-25 02:16:48 +00:00
John Baldwin
6b0451d603 Add support for AES-CCM to ccr(4).
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).

Tested by:	cryptocheck, cryptotest.py
MFC after:	1 month
Sponsored by:	Chelsio Communications
2019-04-24 23:31:46 +00:00
John Baldwin
8ccf3d974f Don't panic for empty CCM requests.
A request to encrypt an empty payload without any AAD is unusual, but
it is defined behavior.  Removing this assertion removes a panic and
instead returns the correct tag for an empty buffer.

Reviewed by:	cem, sef
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20043
2019-04-24 23:27:39 +00:00
John Baldwin
a2ad169e61 Fix requests for "plain" SHA digests of an empty buffer.
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.

Reported by:	NIST SHA1 test failure
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2019-04-24 23:18:10 +00:00
Oleksandr Tymoshenko
cc1ac7fcda [acpi_ibm] Add support for newer Thinkpad models
Add support for newer Thinkpad models with id LEN0268. Was tested on
Thinkpad T480 and ThinkPad X1 Yoga 2nd gen.

PR:		229120
Submitted by:	Ali Abdallah <aliovx@gmail.com>
MFC after:	1 week
2019-04-24 23:10:19 +00:00
Tycho Nightingale
f211d536b6 LinuxKPI should use bus_dma(9) to be compatible with an IOMMU
Reviewed by:	hselasky, kib
Tested by:	greg@unrelenting.technology
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19845
2019-04-24 20:30:45 +00:00
Alexander Motin
9c498bd5c3 Call delist_dev() before destroy_dev_sched_cb().
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-24 19:56:02 +00:00
Conrad Meyer
f1498d7aa3 x86: Halt non-BSP CPUs on panic IPI_STOP
We may need the BSP to reboot, but we don't need any AP CPU that isn't the
panic thread.  Any CPU landing in this routine during panic isn't the panic
thread, so we can just detect !BSP && panic and shut down the logical core.

The savings can be demonstrated in a bhyve guest with multiple cores; before
this change, N guest threads would spin at 100% CPU.  After this change,
only one or two threads spin (depending on if the panicing CPU was the BSP
or not).

Konstantin points out that this may break any future patches which allow
switching ddb(4) CPUs after panic and examining CPU-local state that cannot
be inspected remotely.  In the event that such a mechanism is incorporated,
this behavior could be made configurable by tunable/sysctl.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20019
2019-04-24 18:24:22 +00:00
Ruslan Bukin
ef5a75b193 Add support for Cadence network controller found in HiFive Unleashed board.
Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19798
2019-04-24 13:44:30 +00:00
Ruslan Bukin
7bad03a8b5 Implement pic_pre_ithread(), pic_post_ithread().
Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19819
2019-04-24 13:41:46 +00:00
Andrew Gallatin
6d49b41ee8 iflib: Add pfil hooks
As with mlx5en, the idea is to drop unwanted traffic as early
in receive as possible, before mbufs are allocated and anything
is passed up the stack.  This can save considerable CPU time
when a machine is under a flooding style DOS attack.

The major change here is to remove the unneeded abstraction where
callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
are responsible for NULL'ing that mbuf themselves. Now this happens
directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us
to use the decision (and potentially mbuf) returned by the pfil
hooks. The driver can now recycle mbufs to avoid re-allocation when
packets are dropped.

Reviewed by:	marius  (shurd and erj also provided feedback)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19645
2019-04-24 13:32:04 +00:00
Andrey V. Elsukov
aee793eec9 Add GRE-in-UDP encapsulation support as defined in RFC8086.
This GRE-in-UDP encapsulation allows the UDP source port field to be
used as an entropy field for load-balancing of GRE traffic in transit
networks. Also most of multiqueue network cards are able distribute
incoming UDP datagrams to different NIC queues, while very little are
able do this for GRE packets.

When an administrator enables UDP encapsulation with command
`ifconfig gre0 udpencap`, the driver creates kernel socket, that binds
to tunnel source address and after udp_set_kernel_tunneling() starts
receiving of all UDP packets destined to 4754 port. Each kernel socket
maintains list of tunnels with different destination addresses. Thus
when several tunnels use the same source address, they all handled by
single socket.  The IP[V6]_BINDANY socket option is used to be able bind
socket to source address even if it is not yet available in the system.
This may happen on system boot, when gre(4) interface is created before
source address become available. The encapsulation and sending of packets
is done directly from gre(4) into ip[6]_output() without using sockets.

Reviewed by:	eugen
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D19921
2019-04-24 09:05:45 +00:00
Justin Hibbits
19b86243f4 powerpc: Add a couple missing isyncs
mtmsr and mtsr require context synchronizing instructions to follow.  Without
a CSI, there's a chance for a machine check exception.  This reportedly does
occur on a MPC750 (PowerMac G3).

Reported by:	Mark Millard
2019-04-24 02:51:58 +00:00
Kyle Evans
af44a26351 fdt: stop installing FDT_DTS_FILE
r346307 inadvertently started installing FDT_DTS_FILE along with the kernel.
While this isn't necessarily bad, it was not intended or discussed and it
actively breaks some current setups that don't anticipate any .dtb being
installed when it's using static fdt. This change could be reconsidered down
the line, but it needs to be done with prior discussion.

Fix it by pushing FDT_DTS_FILE build down into the raw dtb.build.mk bits.
This technically allows modules building DTS to accidentally specify an
FDT_DTS_FILE that gets built but isn't otherwise useful (since it's not
installed), but I suspect this isn't a big deal and would get caught with
any kind of testing -- and perhaps this might end up useful in some other
way, for example by some module wanting to embed fdt in some other way than
our current/normal mechanism.

Reported by:	Mori Hiroki <yamori813@yahoo.co.jp>
MFC after:	3 days
X-MFC-With:	r346307
2019-04-24 01:11:50 +00:00
Dmitry Chagin
c034ecf316 Since r339624 HEAD does not need for backslashes in syscalls.master,
however to make a merge r345471 to the stable add backslashes
to the syscalls.master.

MFC after:	3 days
2019-04-23 18:10:46 +00:00
Kyle Evans
e8de0c3bda tun(4): Defer clearing TUN_OPEN until much later
tun destruction will not continue until TUN_OPEN is cleared. There are brief
moments in tunclose where the mutex is dropped and we've already cleared
TUN_OPEN, so tun_destroy would be able to proceed while we're in the middle
of cleaning up the tun still. tun_destroy should be blocked until these
parts (address/route purges, mostly) are complete.

PR:		233955
MFC after:	2 weeks
2019-04-23 17:28:28 +00:00
Conrad Meyer
5947c05768 ip6_randomflowlabel: Avoid blocking if random(4) is not available
If kern.random.initial_seeding.bypass_before_seeding is disabled, random(4)
and arc4random(9) will block indefinitely until enough entropy is available
to initially seed Fortuna.

It seems that zero flowids are perfectly valid, so avoid blocking on random
until initial seeding takes place.

Discussed with:	bz (earlier revision)
Reviewed by:	thj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20011
2019-04-23 17:18:20 +00:00
Leandro Lupori
8920043674 [PPC64] Fix wrong KASSERT in mphyp_pte_insert()
As mphyp_pte_unset() can also remove PTE entries, and as this can
happen in parallel with PTEs evicted by mphyp_pte_insert(), there
is a (rare) chance the PTE being evicted gets removed before
mphyp_pte_insert() is able to do so. Thus, the KASSERT should
check wether the result is H_SUCCESS or H_NOT_FOUND, to avoid
panics if the situation described above occurs.

More details about this issue can be found in PR 237470.

PR:		237470
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D20012
2019-04-23 17:11:45 +00:00
Conrad Meyer
a9f7f19242 netdump: Fix !COMPAT_FREEBSD11 unused variable warning
Reported by:	Ralf Wenk <iz-rpi03_hs-karlsruhe.de>
Sponsored by:	Dell EMC Isilon
2019-04-23 17:05:57 +00:00
Ed Maste
e53f03384e Enable Mellanox drivers (modules) on AArch64
Tested by Greg V with mlx5en on an Ampere eMAG instance at Packet.com on
c2.large.arm (with some additional uncommitted PCIe WIP).

PR:		237055
Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	hselasky
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19983
2019-04-23 15:11:01 +00:00
Konstantin Belousov
c4cc609796 poib: assign link-local address according to RFC
RFC 4391 specifies that the IB interface GID should be re-used as IPv6
link-local address.  Since the code in in6_get_hw_ifid() ignored
IFT_INFINIBAND case, ibX interfaces ended up with the local address
borrowed from some other interface, which is non-compliant.

Use lowest eight bytes from GID for filling the link-local address,
same as Linux.

Reviewed by:	bz (previous version), ae, hselasky, slavash,
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20006
2019-04-23 12:23:44 +00:00
Bjoern A. Zeeb
d86ecbe993 iFix udp_output() lock inconsistency.
In r297225 the initial INP_RLOCK() was replaced by an early
acquisition of an r- or w-lock depending on input variables
possibly extending the write locked area for reasons not entirely
clear but possibly to avoid a later case of unlock and relock
leading to a possible race condition and possibly in order to
allow the route cache to work for connected sockets.

Unfortunately the conditions were not 1:1 replicated (probably
because of the route cache needs). While this would not be a
problem the legacy IP code compared to IPv6 has an extra case
when dealing with IP_SENDSRCADDR. In a particular case we were
holding an exclusive inp lock and acquired the shared udbinfo
lock (now epoch).
When then running into an error case, the locking assertions
on release fired as the udpinfo and inp lock levels did not match.

Break up the special case and in that particular case acquire
and udpinfo lock depending on the exclusitivity of the inp lock.

MFC After:	9 days
Reported-by:	syzbot+1f5c6800e4f99bdb1a48@syzkaller.appspotmail.com
Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D19594
2019-04-23 10:12:33 +00:00
Wojciech Macek
c591d46ee9 This patch offers a workaround to buf_ring reordering
visible on armv7 and armv8. Similar issue to rS302292.

Obtained from:         Semihalf
Authored by:           Michal Krawczyk <mk@semihalf.com>
Approved by:           wma
Differential Revision: https://reviews.freebsd.org/D19932
2019-04-23 06:36:32 +00:00
Justin Hibbits
f4c5f64d30 [PowerPC64] pseries-llan: increment packet output counters on error and success
Summary: when using pseries-llan driver, Opkts and Oerrs counters (netstat
-i) are always zero. This patch adds an small error handling to increment
these counters.

Submitted by:	alfredo.junior_eldorado.org.br
Differential Revision: https://reviews.freebsd.org/D20009
2019-04-23 03:19:03 +00:00
Justin Hibbits
ba5189f7be powerpc64/pseries: Fix hypervisor call with extra arguments
Some hypervisor calls, such as H_SEND_LOGICAL_LAN, take more arguments than
are traditionally passed in registers.  The HCALL ABI will accept these
arguments in r11 and r12.  With ELFv2 ABI, these arguments are 2
double-words lower than ELFv1 ABI, as two double-words in the stack frame
are no longer used, and therefore removed from the frame.  Fix the offsets
for loading the registers for the HCALL.  This fixes the phyp_llan driver
with ELFv2 kernel.

Submitted by:	alfredo.junior_eldorado.org.br
Differential Revision:	https://reviews.freebsd.org/D20008
2019-04-23 03:05:26 +00:00
Hans Petter Selasky
6bbdbbb830 Revert r346530 until further.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-22 19:36:19 +00:00
Andrew Gallatin
7687707dd4 Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets.  When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node.  Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by:	glebius, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19930
2019-04-22 19:24:21 +00:00
Navdeep Parhar
61e02298ce cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state
for a tid and maintains a running history of some interesting events.

Service TCP_INFO queries from the history when the tid is being tracked
there.
2019-04-22 17:48:10 +00:00
Navdeep Parhar
be7eaf979e cxgbe(4): Make sure bundled_fw is always initialized before use.
This fixes a bug that prevented the driver from auto-flashing the
firmware when it didn't see one on the card.  This feature was
introduced in r321390 and this bug was introduced in r343269.

Reported by:	gallatin@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-04-22 17:00:30 +00:00
Bjoern A. Zeeb
ade1258dc1 r297225 move the assignment of sin from add to the top of the function.
sin is not changed after the initial assignment, so no need to set it again.

MFC after:	10 days
2019-04-22 14:53:53 +00:00
Bjoern A. Zeeb
e932299837 Remove some excessive brackets.
No functional change.

MFC after:	10 days
2019-04-22 14:20:49 +00:00
Mark Johnston
94851f3788 Clarify the relationship between INVARIANTS and DIAGNOSTIC a bit.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-04-22 11:31:13 +00:00
Mark Johnston
c4e5de7e75 Disable vm map consistency checking by default on INVARIANTS kernels.
The checks are too expensive for a general-purpose kernel.  Enable the
checks when DIAGNOSTIC is defined and provide a sysctl to enable the
checks in a non-DIAGNOSTIC INVARIANTS kernel.

Reviewed by:	kib
Discussed with:	Doug Moore <dougm@rice.edu>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19999
2019-04-22 11:23:35 +00:00
Hans Petter Selasky
04f44499ca Fix build for mips and powerpc after r346530.
Need to include sys/kernel.h to define SYSINIT() which is used
by sys/eventhandler.h .

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-22 08:32:00 +00:00
Hans Petter Selasky
40eb389666 Fix panic in network stack due to memory use after free in relation to
fragmented packets.

When sending IPv4 and IPv6 fragmented packets and a fragment is lost,
the mbuf making up the fragment will remain in the temporary hashed
fragment list for a while. If the network interface departs before the
so-called slow timeout clears the packet, the fragment causes a panic
when the timeout kicks in due to accessing a freed network interface
structure.

Make sure that when a network device is departing, all hashed IPv4 and
IPv6 fragments belonging to it, get freed.

Backtrace:
panic()
icmp6_reflect()

hlim = ND_IFINFO(m->m_pkthdr.rcvif)->chlim;
^^^^ rcvif->if_afdata[AF_INET6] is NULL.

icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()

Differential Revision:	https://reviews.freebsd.org/D19622
Reviewed by:		bz (network), adrian
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-22 07:27:24 +00:00
Conrad Meyer
83efd2885e gnop(8): Nopify configuration as a kernel dump device
As a dummy / no-op dump device, to facilitate dumpon(8) testing.

Reviewed by:	markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19991
2019-04-22 03:25:49 +00:00
Alexander Motin
5a9170aa4c Report DIF protection type the disk is formatted with.
Some disks formatted with protection report errors if written without
protection used.  This should help to diagnose the problem.

MFC after:	2 weeks
2019-04-22 01:08:14 +00:00
Rick Macklem
a6f77c9a6e Add #ifdef INET as requested by bz@. 2019-04-21 22:53:51 +00:00
Alexander Motin
ed569aadca Polish SCSI sense data validity checks.
According to specs and common sense, all sense data reported in descriptor
format should be valid.  But practice shows different, some devices return
descriptors with invalid data, resulting in error messages looking worse.

Decouple block/stream commands sense data and information field printing.
Looking on present specs, there are much more cases when those fields are
not related, and incomplete old code was not printing valid sense data and
leaving empty lines for invalid.

MFC after:	2 weeks
2019-04-21 19:07:03 +00:00
Ian Lepore
9e655cd522 Move the reporting of spurious interrupts under bootverbose control, because
occasional spurious interrupts are a normal thing on this hardware.  Also,
change the name of the cpu-local interrupt controller driver from local_intc
to lintc, because the name gets built into interrupt names, which have to
fit into a 19-byte field for stats reporting (so this allows 5 more bytes
of the actual interrupt name to be displayed).
2019-04-21 17:39:01 +00:00
Adrian Chadd
a8083b9c0b [ath] [ath_hal] [ath_hal_9300] Extend the start PCU receive to handle resetting ANI.
One of the fun issues with scanning has been how the existing
ANI values were programmed into the hardware when channels were
changed.  If you're on a really crappy channel and ANI has made
you deaf then when you scan you continue to be deaf on all channels.

This code passes in a flag to startpcureceive which in AR5416 and later
is also used to enable ANI.  This allows it to know if it's a normal
operation or a scan operation.

This fixes my situation at home where a temporary spot of a device
going deaf due to interference starts scanning and .. can't hear
anything until I restart.

Now, this isn't the full fix - ideally:

(a) all the ANI config and per-channel information would be migrated
     to the shared HAL stuff and enabled for all of the NICs;
(b) when a station reassociates and some other error conditions
    (like missed beacons, NF calibration failures, etc) a knob
    to reset ANI parameters would likely help recovery.

But hey, I'm committing bits of code again! woo!

Tested:

* AR9344 (2G), STA operation
2019-04-21 02:36:01 +00:00
Vladimir Kondratyev
bf33f20d96 psm(4): give names to synaptics commands
Submitted by:	Ben LeMasurier <ben@crypt.ly>
MFC after:	2 weeks
2019-04-20 21:06:12 +00:00
Vladimir Kondratyev
0c8a908463 psm(4): respect tap_disabled configuration with enabled Extended support
This fixes a bug where, even when hw.psm.tap_enabled=0, touchpad taps
were processed.
tap_enabled has three states: unconfigured, disabled, and enabled (-1, 0, 1).
To respect PR kern/139272, taps are ignored only when explicity disabled.

Submitted by:	Ben LeMasurier <ben@crypt.ly> (initial version)
MFC after:	2 weeks
2019-04-20 21:04:56 +00:00
Vladimir Kondratyev
51319286ed psm(4): do not process gestures when palm is present
Ignoring of gesture processing when the palm is detected helps to reduce
some of the erratic pointer behavior.

This fixes regression introduced in r317814

Reported by:	Ben LeMasurier <ben@crypt.ly>
MFC after:	2 weeks
2019-04-20 21:02:41 +00:00
Vladimir Kondratyev
232e4318b0 psm(4): Add support for 4 and 5 finger touches in synaptics driver
While 4-th and 5-th finger positions are not exported through PS/2
interface, total number of touches is reported by MT trackpads.

MFC after:	2 weeks
2019-04-20 21:00:44 +00:00
Conrad Meyer
60ade167fd netdump: Fix 11 compatibility DIOCSKERNELDUMP ioctl
The logic was present for the 11 version of the DIOCSKERNELDUMP ioctl, but
had not been updated for the 12 ABI.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D19980
2019-04-20 16:07:29 +00:00
Ed Maste
ff9be73ee3 Enable ioremap for aarch64 in the LinuxKPI
Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

PR:		237055
Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D19987
2019-04-20 15:57:05 +00:00