125262 Commits

Author SHA1 Message Date
Brooks Davis
7a5db3a770 Remove ifdef BOOTCDROM option to start init.
When BOOTCDROM is defined (via CFLAGS as there is no config option)
it causes -C to be passed to init, but our init and the version of
sysinstall I glanced at in 6.x don't support -C. The last plausibly
related support was removed from the tree in 1995.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18431
2018-12-05 17:29:14 +00:00
Mark Johnston
9d2877fc3d Clamp the INPCB port hash tables to IPPORT_MAX + 1 chains.
Memory beyond that limit was previously unused, wasting roughly 1MB per
8GB of RAM.  Also retire INP_PCBLBGROUP_PORTHASH, which was identical to
INP_PCBPORTHASH.
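
The clamp can be sketched as follows. This is only an illustration: the helper name clamp_porthash_size() and the SKETCH_ prefix are hypothetical, and only the IPPORT_MAX + 1 limit comes from the message above.

```c
#include <stddef.h>

/* Highest valid TCP/UDP port number (the value of IPPORT_MAX). */
#define SKETCH_IPPORT_MAX	65535

/*
 * Hypothetical helper: cap a requested number of hash chains at one
 * chain per possible port number. Smaller requests pass through
 * unchanged.
 */
static size_t
clamp_porthash_size(size_t requested)
{
	const size_t limit = (size_t)SKETCH_IPPORT_MAX + 1;

	return (requested > limit ? limit : requested);
}
```

With this sketch, a request for 1048576 chains comes back as 65536, while a request for 512 chains is left alone.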

Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17803
2018-12-05 17:06:00 +00:00
Mateusz Guzik
f26db6948d sx: retire SX_NOADAPTIVE
The flag has not been used by anything for years and supporting it requires an
explicit read from the lock when entering slow path.

Flag value is left unused on purpose.

Sponsored by:	The FreeBSD Foundation
2018-12-05 16:43:03 +00:00
Hans Petter Selasky
52da588961 Remove redundant declaration after r341517.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:56:44 +00:00
Hans Petter Selasky
6da0d28e6a Fix build of LinuxKPI on some platforms after r341518.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:53:34 +00:00
Hans Petter Selasky
8a886978d4 Fix LINT build after r341572.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-12-05 15:42:31 +00:00
Vincenzo Maffione
1d9ec3ee50 netmap.h: include stdatomic.h
The stdatomic.h header exports atomic_thread_fence(), which
can be used to implement the nm_stst_barrier() macro needed
by netmap.
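
A minimal sketch of a store-store barrier built on the C11 fence follows. The names sketch_stst_barrier(), struct sketch_ring and sketch_roundtrip() are hypothetical, and the choice of memory_order_release is an assumption; the message above only says that atomic_thread_fence() can back nm_stst_barrier().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical barrier macro; the release ordering is an assumption. */
#define sketch_stst_barrier()	atomic_thread_fence(memory_order_release)

struct sketch_ring {
	uint32_t slot;		/* payload written by the producer */
	_Atomic uint32_t head;	/* index the consumer polls */
};

/* Write the payload first, then make the new head visible. */
static void
sketch_publish(struct sketch_ring *r, uint32_t value)
{
	r->slot = value;
	sketch_stst_barrier();	/* keep the slot store ahead of the head store */
	atomic_store_explicit(&r->head, 1, memory_order_relaxed);
}

/* Single-threaded demonstration: publish a value and read it back. */
static uint32_t
sketch_roundtrip(uint32_t value)
{
	struct sketch_ring r = { 0, 0 };

	sketch_publish(&r, value);
	if (atomic_load_explicit(&r.head, memory_order_acquire) != 1)
		return (0);
	return (r.slot);
}
```

The fence matters only when a second thread polls head; the single-threaded round trip here just shows the call sequence.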

MFC after:	3 days
2018-12-05 15:38:52 +00:00
Slava Shwartsman
0f3b263d83 mlx4/mlx5: Updated driver version to 3.5.0
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:34 +00:00
Slava Shwartsman
cc971b2261 mlx5en: Implement backpressure indication.
The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. When the upper layers, typically the socket layer, have obtained such
a tag, it can then query the destination driver queue for the current
amount of space available in the send queue.

A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx4en(4) driver is detached. The channels structure is persistent
during the lifetime of the mlx5en(4) driver it belongs to and can therefore be accessed
without any need for synchronization.

The mlx5e_snd_tag structure was extended to contain a type field, because there are now
two different tag types which end up in the driver which need to be distinguished.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:03 +00:00
Slava Shwartsman
71defeda26 mlx5en: Improve configuration of HW LRO.
In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space must be set, and the ifconfig(8) LRO capability must be set. Any other
settings will disable HW LRO.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:33 +00:00
Slava Shwartsman
01f02abfec mlx5en: Count all transmitted and received bytes.
Add a counter for all transmitted and received bytes. Previously only
transmitted and received packets were counted. Fix description of RX LRO
counters while at it.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:02 +00:00
Slava Shwartsman
3230c29d72 mlx5en: Statically allocate and free the channel structure(s).
By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path and also reduce the chance of use-after-free
issues in the transmit fast path.

This change is also a requirement for implementing
backpressure support.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:31 +00:00
Slava Shwartsman
b3cf149325 mlx5en: Fix race in mlx5e_ethtool_debug_stats().
Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.

Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:01 +00:00
Slava Shwartsman
42390bb884 mlx5en: Add support for IFM_10G_LR and IFM_40G_ER4 media types.
Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:22:30 +00:00
Slava Shwartsman
f292a94d6b mlx5en: Don't set rate on SQs when the SQ is already stopped.
This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:59 +00:00
Slava Shwartsman
3e581cabf0 mlx5en: Fix for inlining issues in transmit path
1) Don't exceed the driver's own hardcoded TX inline limit.

The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the driver's own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.

2) Make sure the mlx5_query_min_inline() function returns an error code.

3) Header inlining is required when using TSO.

4) Catch failure to compute inline header size for TSO.

5) Add support for UDP when computing inline header size.

6) Fix for inlining issues with regards to DSCP.

Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to work around a hardware bug extracting
the DSCP field from the IPv4/v6 header.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:28 +00:00
Slava Shwartsman
d51ced5fae mlx5en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. There is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:57 +00:00
Slava Shwartsman
e870c0ab61 mlx5en: Implement support for bandwidth limiting by ratio, ETS.
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer between 1 and 100 inclusive.

Implement the needed firmware commands and SYSCTLs through mlx5en(4).

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:26 +00:00
Slava Shwartsman
d82f1c13ad mlx5fpga: Add set and query connect/disconnect FPGA
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:55 +00:00
Slava Shwartsman
085b35bb69 mlx5fpga: IOCTL for FPGA temperature measurement
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:23 +00:00
Slava Shwartsman
b5f5751275 mlx5fpga: Support MorseQ board
Added and supported new enum "morseQ = 4" for fpga_id field

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:18:52 +00:00
Slava Shwartsman
d50c55f17d mlx5fpga_tools initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:17:22 +00:00
Slava Shwartsman
e9dcd83155 mlx5fpga: Initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:11:20 +00:00
Slava Shwartsman
67db687393 mlx5ib: Set default active width and speed when querying port.
Make sure the active width and speed are set in case the
translate_eth_proto_oper() function doesn't recognize the
current port operation mask.

Linux commit:
7672ed33c4c15dbe9d56880683baaba4227cf940

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:49:11 +00:00
Slava Shwartsman
d3300d4aed mlx5ib: Make sure the congestion work timer does not escape the drain procedure.
If the mlx5_ib_read_cong_stats() function was running when mlx5ib was unloaded,
because this function unconditionally restarts the timer, the timer can still
be pending after the delayed work has been cancelled. To fix this simply loop
on the delayed work cancel procedure as long as it returns non-zero.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:39 +00:00
Slava Shwartsman
ac2fdeb4e7 mlx5ib: Fix null pointer dereference in mlx5_ib_create_srq
Although "create_srq_user" does overwrite "in.pas" on some paths, it
also contains at least one feasible path which does not overwrite it.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:10 +00:00
Slava Shwartsman
c9e9b5c104 mlx5ib: Fix sign extension in mlx5_ib_query_device
"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.
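
The effect described above can be reproduced in a few lines. This is a sketch under stated assumptions: both function names are hypothetical, the first function models the problem with explicit casts on a two's complement target, and the second shows one possible way to avoid it rather than the change actually committed.

```c
#include <stdint.h>

/*
 * Models the problematic expression: the 16-bit value is shifted as a
 * 32-bit quantity, interpreted as a signed int, and then sign-extended
 * to 64 bits. Explicit casts are used so the sketch itself never shifts
 * into the sign bit of an int.
 */
static uint64_t
shift_sign_extended(uint16_t minor)
{
	int32_t narrow = (int32_t)((uint32_t)minor << 16);

	return ((uint64_t)(int64_t)narrow);
}

/* Widen to 64 bits before shifting so the upper 32 bits stay zero. */
static uint64_t
shift_widened(uint16_t minor)
{
	return ((uint64_t)minor << 16);
}
```

For an input of 0x8000 the first function yields 0xFFFFFFFF80000000 and the second yields 0x0000000080000000; for inputs up to 0x7FFF the two agree.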

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:41 +00:00
Slava Shwartsman
31c3f64819 mlx5: Fix driver version location
Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:10 +00:00
Slava Shwartsman
721a1a6a69 mlx5: Fixes to allow command polling mode to exist alongside event mode.
A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handles
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:39 +00:00
Slava Shwartsman
b084af6cdc mlx5: Fix wrong size allocation for QoS ETC TC register
The driver allocates wrong size (due to wrong struct name) when issuing
a query/set request to NIC's register.

Linux commit:
d14fcb8d877caf1b8d6bd65d444bf62b21f2070c

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:09 +00:00
Slava Shwartsman
8718eb63f8 mlx5: Add software tx_jumbo_packets counter
This counter will represent transmitted packets which have more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:37 +00:00
Slava Shwartsman
feb5f357ea mlx5: Implement support for configuring PCIe packet write ordering via a sysctl.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:08 +00:00
Slava Shwartsman
70b417cf90 mlx5: Extend vector argument to u64.
Else the MLX5_TRIGGERED_CMD_COMP flag will be masked away.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:38 +00:00
Slava Shwartsman
29e544513e mlx5: Add global control to disable firmware reset, for all mlx5 devices.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:08 +00:00
Slava Shwartsman
2119f825d1 mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:37 +00:00
Slava Shwartsman
8f7f07368d mlx5: Move hw.mlx5 node definition to mlx5_core.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:07 +00:00
Slava Shwartsman
63cc6d1bc2 mlx5: Convert some spaces into tabs and use device_printf() instead of printf().
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:36 +00:00
Slava Shwartsman
abb28d287b mlx5: Add SRQ fixes from Linux
Combine multiple fixes from Linux to SRQ.
Linux commits:
c73b791 IB/mlx5: Assign SRQ type earlier
0fd27a8 IB/mlx5: Fix out-of-bound access
c2b37f7 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
d63c467 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:06 +00:00
Slava Shwartsman
3b21d18587 mlx5: Fix for potential memory leaks.
Make sure allocated data gets freed in error cases.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:37 +00:00
Slava Shwartsman
07b624ed71 mlx5: Discard unused return values.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:06 +00:00
Slava Shwartsman
843a89d37e mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Linux commit:
aba462134634b502d720e15b23154f21cfa277e5

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:36 +00:00
Slava Shwartsman
2bf40c3608 mlx5: Fix integer overflow while resizing CQ
The user can provide a very large cqe_size, which will cause an integer
overflow.
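
A guard against this kind of overflow might look like the sketch below. The helper name cq_buf_size() and its convention of returning 0 on overflow are hypothetical; the referenced Linux commit may perform the check differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return entries * cqe_size, or 0 if the product
 * would not fit in size_t. Dividing SIZE_MAX by one factor detects the
 * overflow without performing the multiplication first.
 */
static size_t
cq_buf_size(size_t entries, size_t cqe_size)
{
	if (cqe_size != 0 && entries > SIZE_MAX / cqe_size)
		return (0);
	return (entries * cqe_size);
}
```

A caller would treat a zero result as an invalid request and reject it before allocating anything.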

Linux commit:
28e9091e3119933c38933cb8fc48d5618eb784c8

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:05 +00:00
Slava Shwartsman
8567696305 mlx4en: Optimise reception of small packets.
Copy small packets like TCP ACKs into a new mbuf
reusing the existing mbuf to receive a new ethernet
frame. This avoids wasting buffer space for
small sized packets.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:39:35 +00:00
Slava Shwartsman
6217a33f85 mlx4: Make sure default VNET is set when adding a new interface.
Adding an interface might be done outside the device_attach() routine
and will then cause a panic, due to the VNET not being defined.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:39:05 +00:00
Slava Shwartsman
ec673cf60b mlx4en: Remove duplicate statistics variable assignment.
The "priv->pkstats.rx_dropped" is written twice in a row.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:38:35 +00:00
Slava Shwartsman
93bf821652 mlx4en: Add support for receiving all data using one or more MCLBYTES sized mbufs.
Also when the MTU is greater than MCLBYTES.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:32:46 +00:00
Slava Shwartsman
63d7a8d9a8 mlx4en: Add support for netdump.
Implement the needed callback functions and support for polling the driver.

Differential Revision: https://reviews.freebsd.org/D15259
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:32:15 +00:00
Slava Shwartsman
b7d573c5a4 mlx4en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. There is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:31:45 +00:00
Slava Shwartsman
5dc2eaac65 mlx4en: Add driver version to sysctl desc
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:31:14 +00:00
Slava Shwartsman
c8aa689960 mlx4: Add board identifier and firmware version to sysctl
In the last mlx4 update (r325841) we lost the sysctl to show the
firmware version for mlx4 devices.
Add both board identifier and firmware version under:
sys.device.mlx4_core0.hw sysctl node.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:30:48 +00:00
Slava Shwartsman
9024b80885 mlx4core: Add checks for invalid port numbers.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:30:16 +00:00
Slava Shwartsman
65ad766f36 mlx4: Zero initialize device capabilities to avoid use of uninitialized fields.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:29:46 +00:00
Slava Shwartsman
601e19f00a mlx4core: Avoid multiplication overflow by casting multiplication.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:29:16 +00:00
Slava Shwartsman
3006180863 krping: Fix for memory leak in error case.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:27:48 +00:00
Slava Shwartsman
186058782a ipoib: Notify on modify QP failure only when relevant
Modify QP can fail, and in some cases that is acceptable, such as when moving
from RST to ERR state. All other failures are not acceptable, and a message
should be printed to the log.

The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.

Linux commit:
5dc78ad1904db597bdb4427f3ead437aae86f54c

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:27:17 +00:00
Slava Shwartsman
a061f0eb65 ipoib: increase the non-cm queue length
When a packet needs fragmentation, it might generate more than 3 fragments.
With the queue length 3, all fragments are generated faster than the
queue is drained, which effectively drops fourth and later fragments on
the floor.
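
The arithmetic behind the drop can be sketched as follows, assuming IPv4 fragmentation with a 20-byte header and a burst that arrives at an empty queue which is not drained in the meantime. Both function names are hypothetical, and the MTU is assumed to be larger than 27 bytes so that each fragment carries at least 8 payload bytes.

```c
#include <stddef.h>

#define SKETCH_IPV4_HDR_LEN	20	/* IPv4 header without options */

/*
 * Number of IPv4 fragments needed for payload_len bytes over a link
 * with the given MTU. Every fragment except the last carries a
 * multiple of 8 payload bytes.
 */
static size_t
ipv4_fragment_count(size_t payload_len, size_t mtu)
{
	size_t per_frag = (mtu - SKETCH_IPV4_HDR_LEN) & ~(size_t)7;

	return ((payload_len + per_frag - 1) / per_frag);
}

/* Fragments lost when nfrags arrive at an empty, undrained queue. */
static size_t
fragments_dropped(size_t nfrags, size_t queue_len)
{
	return (nfrags > queue_len ? nfrags - queue_len : 0);
}
```

For example, 8000 payload bytes over a 1500-byte MTU need 6 fragments, so a queue of length 3 would lose 3 of them under these assumptions.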

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:26:47 +00:00
Slava Shwartsman
d705eff259 ipoib: Don't do a light flush when MTU is unchanged.
When changing the MTU of ibX network interfaces, check that the MTU was really
changed before requesting an update of the multicast rules. Else we might go
into an infinite loop joining and leaving ibX multicast groups towards the
opensm master interface.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:26:17 +00:00
Slava Shwartsman
099ad46e81 ipoib: correct setting MTU from inside ipoib(4).
It is not enough to set ifnet->if_mtu to change the interface MTU.
The system saves the route MTU in the radix tree, and the route cache keeps
the interface MTU as well. Since adding a multicast group causes
recalculation of the MTU, even bringing the interface up changes the MTU from
4042 to 1500, which makes the system configuration inconsistent. Worse,
ip_output() prefers the route MTU over the interface MTU, so large packets are
not fragmented and are dropped on the floor.

Fix it for ipoib(4) using the same approach (or hack) as was applied
for if_tun/if_tap in r339012.  Thanks to bz@ for giving the hint.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:25:47 +00:00
Slava Shwartsman
e13619b68b ibcore: Fix clearing of bound device interface.
Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally, else link local addresses won't work.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:25:13 +00:00
Slava Shwartsman
a9c20af23d ibcore: ip6_dev_find() needs to know the scope ID.
Else the wrong network device can be returned for link-local addresses.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:43 +00:00
Slava Shwartsman
5ace00dffe ibcore: Fix sleeping in atomic when RoCE is used
A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Linux commit:
c76161181193985087cd716fdf69b5cb6cf9ee85

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:24:12 +00:00
Slava Shwartsman
0231669450 ibcore: Add missing unref of netdevice.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:23:44 +00:00
Slava Shwartsman
ae8534ca28 ibcore: Fix loopback with rdma-cm.
Trying to validate loopback fails because rtalloc1() resolves system
local addresses to the loopback network interface, lo0. Fix this by
explicitly checking for loopback during validation of the source
and destination network address. If the source address belongs to
a local network interface and is equal to the destination address,
there is no need to run the destination address through rtalloc1().

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:23:14 +00:00
Slava Shwartsman
f3cf3b7e84 ibcore: Make sure all VNETs are scanned for VLAN interfaces.
The master network interface and the VLANs may reside in different VNETs.
Make sure that all VNETs are searched when scanning for GID entries.

Submitted by:   netapp
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:22:43 +00:00
Slava Shwartsman
af60974508 ibcore: Always check return value from ib_init_ah_from_wc().
This prevents code from accepting RoCEv1 connections when
only ROCEv2 is enabled and vice versa.

Linux commit:
0c4386ec77cfcd0ccbdbe8c2e67dd3a49b2a4c7f

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:22:07 +00:00
Slava Shwartsman
9fc9810098 ibcore: Add missing check for failure.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:21:20 +00:00
Slava Shwartsman
33d7f9b8fb ibcore: Fix an array index check
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127
(byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class)
(which evaluates to 127).
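
The corrected comparison amounts to checking the index against the length of the array it indexes. A minimal sketch, with hypothetical SKETCH_/sketch_ names standing in for the real definitions and only the constant 80 taken from the message above:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MGMT_CLASS	80	/* table length, per the message */

/* Hypothetical stand-in for the per-class method table. */
struct sketch_class_table {
	void *method_table[SKETCH_MAX_MGMT_CLASS];
};

/* An index is usable only if it is below the array's own length. */
static bool
class_index_valid(size_t idx)
{
	return (idx < SKETCH_MAX_MGMT_CLASS);
}
```

Index 127, the value from the Coverity report, is rejected by this check, whereas a comparison against 128 would have let it through.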

Linux commit:
2fe2f378dd45847d2643638c07a7658822087836

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:20:51 +00:00
Slava Shwartsman
fe521b4443 ibcore: Check ib_find_pkey() return value.
Linux commit:
d3a2418ee36a59bc02e9d454723f3175dcf4bfd9

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:20:22 +00:00
Slava Shwartsman
4aa5230dc5 ibcore: Add support for IB_SPEED_HDR in sysfs rate printout.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:19:52 +00:00
Slava Shwartsman
475c8de7bf ibcore: Don't access invalid port.
The port number in the listen_id_priv has been observed to be zero which
means no port has been selected. The current code lacks a check for invalid
port number.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:19:21 +00:00
Slava Shwartsman
4b9b52a1bd ibcore: Discard unused error codes.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:18:50 +00:00
Slava Shwartsman
f628150b6c ibcore: Make sure GID index variable gets initialized.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:18:20 +00:00
Slava Shwartsman
452d59e130 linuxkpi: Really check if PCI is offline
Currently we always return false for the PCI offline query.
Try to read the PCI config; if the return value is 0xffff, the
PCI device is probably offline.
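
The probe can be sketched as below. The accessor type, the function names and the stub readers are all hypothetical, and reading the vendor ID register at offset 0 is an assumption; the message above does not say which register is read. The sketch relies on 0xffff not being a valid PCI vendor ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor type: reads a 16-bit PCI config register. */
typedef uint16_t (*cfg_read16_fn)(void *dev, unsigned int reg);

#define SKETCH_PCI_VENDOR_REG	0x00	/* vendor ID register (assumed) */

/* Treat an all-ones vendor ID as a sign that the device is gone. */
static bool
pci_probably_offline(cfg_read16_fn read16, void *dev)
{
	return (read16(dev, SKETCH_PCI_VENDOR_REG) == 0xffff);
}

/* Stub readers for illustration only. */
static uint16_t
stub_read_present(void *dev, unsigned int reg)
{
	(void)dev;
	(void)reg;
	return (0x15b3);	/* some valid-looking vendor ID */
}

static uint16_t
stub_read_gone(void *dev, unsigned int reg)
{
	(void)dev;
	(void)reg;
	return (0xffff);	/* what a read from a missing device returns */
}
```

The result is only a heuristic, which matches the "probably" in the message.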

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:45 +00:00
Slava Shwartsman
92cbd83001 linuxkpi: properly implement netif_carrier_ok().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:17:15 +00:00
Slava Shwartsman
9c7b53cc65 linuxkpi: Fix for use-after-free when tearing down character devices.
Make sure we hold a reference on the character device for every opened file
to prevent the character device from being freed prematurely.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:16:39 +00:00
Slava Shwartsman
be34cfc587 linuxkpi: implement idr_is_empty() and ida_is_empty().
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:15:57 +00:00
Vincenzo Maffione
b6e66be22b netmap: align codebase to the current upstream (760279cfb2730a585)
Changelist:
  - Replace netmap passthrough host support with a more general
    mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop.
    No kernel threads are needed to use this feature: the application
    is required to spawn a thread (or a process) and issue a
    SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The
    kernel loop is executed by the ioctl implementation, which returns
    to userspace only when a different thread calls SYNC_KLOOP_STOP
    or the netmap file descriptor is closed.
  - Update the if_ptnet driver to cope with the new data structures,
    and prune all the obsolete ptnetmap code.
  - Add support for "null" netmap ports, useful to allocate netmap_if,
    netmap_ring and netmap buffers to be used by specialized applications
    (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
  - Various fixes and code refactoring.

Sponsored by:	Sunny Valley Networks
Differential Revision:	https://reviews.freebsd.org/D18015
2018-12-05 11:57:16 +00:00
Alex Richardson
c10d927c40 Fix newvers.sh with BUILD_WITH_STRICT_TMPPATH=1
newvers.sh runs mkfifo which did not exist before this change.
However, I didn't notice before because it is run from a function
where a missing command does not cause a noticeable failure.

Reviewed By:	emaste, markj
Differential Revision: https://reviews.freebsd.org/D18377
2018-12-05 10:57:57 +00:00
Eric van Gyzen
9dae3b521e altq: manual cleanup after r341507
Remove a file that became practically empty.
Fix indentation.

Like r341507, I do not plan to MFC, but anyone else can.
2018-12-04 23:53:42 +00:00
Eric van Gyzen
325fab802e altq: remove ALTQ3_COMPAT code
This code has apparently never compiled on FreeBSD since its
introduction in 2004 (r130365).  It has certainly not compiled
since 2006, when r164033 added #elsif [sic] preprocessor directives.
The code was left in the tree to reduce the diff from upstream (KAME).
Since that upstream is no longer relevant, remove the long-dead code.

This commit is the direct result of:

    unifdef -m -UALTQ3_COMPAT sys/net/altq/*

A later commit will do some manual cleanup.

I do not plan to MFC this.  If that would help you, go for it.
2018-12-04 23:46:43 +00:00
Brooks Davis
948574123d Regen after r341495: Remove NOARGS from oaccept.
2018-12-04 21:57:26 +00:00
Brooks Davis
41f7b25317 Remove NOARGS from oaccept.
This was in the original patch, but lost in a rebase.

Reported by:	andrew
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 21:56:45 +00:00
Maxim Sobolev
9dcafe16d4 Another attempt to fix issue with the DIOCGDELETE ioctl(2) not
handling slightly out-of-bound requests properly (r340187).
Perform the range check here rather than rely on g_delete_data() to DTRT.

The g_delete_data() would always return success for requests
starting just the next byte after the provider's media boundary.
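
One way to write such a range check is sketched below. The function name and parameters are hypothetical and this is not the committed code; it only illustrates rejecting a request that starts at or beyond the media size, without letting offset + length wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical range check: accept [offset, offset + length) only if
 * it lies entirely inside a provider of mediasize bytes. The sum
 * offset + length is never computed, so it cannot overflow.
 */
static bool
delete_range_ok(uint64_t offset, uint64_t length, uint64_t mediasize)
{
	if (offset >= mediasize)
		return (false);
	return (length <= mediasize - offset);
}
```

A request starting exactly at mediasize, the case described above, fails the first test regardless of its length.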

MFC after:	4 weeks
2018-12-04 21:48:56 +00:00
Brooks Davis
63de13cfee Regen after r341474: Normalize COMPAT_43 syscall declarations.
2018-12-04 16:49:14 +00:00
Brooks Davis
d48719bd96 Normalize COMPAT_43 syscall declarations.
Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15816
2018-12-04 16:48:47 +00:00
Andrey V. Elsukov
d66f9c86fa Add ability to request listing and deleting only for dynamic states.
This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but
after rules reloading some state must be deleted. Added new flag '-D'
for such purpose.

Retire '-e' flag, since there can not be expired states in the meaning
that this flag historically had.

Also add "verbose" mode for listing of dynamic states, it can be enabled
with '-v' flag and adds additional information to states list. This can
be useful for debugging.

Obtained from:	Yandex LLC
MFC after:	2 months
Sponsored by:	Yandex LLC
2018-12-04 16:12:43 +00:00
Andrey V. Elsukov
cefe3d67e2 Reimplement how net.inet.ip.fw.dyn_keep_states works.
Turning on of this feature allows to keep dynamic states when parent
rule is deleted. But it works only when the default rule is
"allow from any to any".

Now when rule with dynamic opcode is going to be deleted, and
net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference
named objects corresponding to this rule, and also reference the rule.
And when ipfw_dyn_lookup_state() will find state for deleted parent rule,
it will return the pointer to the deleted rule, that is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.

The refcnt field was added to struct ip_fw to keep reference, also
next pointer added to be able iterate rules and not damage the content
when deleted rules are chained.

Named objects are referenced only when states are going to be deleted to
be able reuse kidx of named objects when new parent rules will be
installed.

ipfw_dyn_get_count() function was modified and now it also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.

ipfw_free_rule() was changed to be global, since now dynamic state can
free rule, when it is expired and references counters becomes 1.

External actions subsystem also modified, since external actions can be
deregistered and instances can be destroyed. In these cases deleted rules,
that are referenced by orphaned states, must be modified to prevent access
to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance()
functions added for these purposes.

Obtained from:	Yandex LLC
MFC after:	2 months
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D17532
2018-12-04 16:01:25 +00:00
Andrey V. Elsukov
0df76496a6 Add assertion to check that named object has correct type.
Obtained from:	Yandex LLC
MFC after:	1 week
2018-12-04 15:12:28 +00:00
Justin Hibbits
bfed756af6 Sprinkle EARLY_DRIVER_MODULE around the tree
Mark some buses as BUS_PASS_BUS, and some resources as BUS_PASS_RESOURCE.
This also decouples some resource attachment orderings from being races by
device tree ordering, instead relying on the bus pass to provide the
ordering.

This was originally intended to support multipass suspend/resume, but it's
also needed on PowerMacs when using fdt, as the device tree seems to get
created in reverse of the OFW tree.
Reviewed by:	nwhitehorn (long ago)
Differential Revision:	https://reviews.freebsd.org/D918
2018-12-04 04:55:49 +00:00
Justin Hibbits
f1e0cb5ef1 powerpc: preload_addr_relocate is no longer necessary for booke
The same behavior was moved to machdep.c, paired with AIM's relocation,
making this redundant.  With this, it's now possible to boot FreeBSD with
ubldr on a uboot Book-E platform, even with a
KERNBASE != VM_MIN_KERNEL_ADDRESS.
2018-12-04 03:51:10 +00:00
Brooks Davis
3a325dec32 Remove a needlessly clever hack to start init with sys_exec().
Construct a struct image_args with the help of new exec_args_*() helper
functions and call kern_execve().

The previous code mapped a page in userspace, copied arguments out
to it one at a time, and then constructed a struct execve_args all so
that sys_execve() can call exec_copyin_args() to copy the data back in
to a struct image_args.

Opencode the part of pre_execve()/post_execve() that releases a
reference to the initial vmspace. We don't need to stop threads like
they do.

Reviewed by:	kib, jhb (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15469
2018-12-04 00:15:47 +00:00
Konstantin Belousov
f186340011 Improve procstat reporting for the linux cdev file descriptors.
If there is a vnode attached to the linux file, use it to fill
kinfo_file.  Otherwise, report a new KF_TYPE_DEV file type, without
supplying any type-specific information.

KF_TYPE_DEV is supposed to be used by most devfs-specific file types.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-03 23:39:45 +00:00
Mark Johnston
02164d3603 Add a missing definition for the !COMPAT_FREEBSD32 case.
Reported by:	jenkins
MFC with:	r341442
Sponsored by:	The FreeBSD Foundation
2018-12-03 21:07:10 +00:00
Mark Johnston
352aaa5122 Plug memory disclosures via ptrace(2).
On some architectures, the structures returned by PT_GET*REGS were not
fully populated and could contain uninitialized stack memory.  The same
issue existed with the register files in procfs.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18421
2018-12-03 20:54:17 +00:00
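The usual remedy for the class of disclosure described in 352aaa5122 is to zero a structure before filling it in. The sketch below illustrates that pattern only; the structure and function names are hypothetical and do not correspond to any real PT_GET*REGS layout.

```c
#include <string.h>

/* Hypothetical register structure; `spare` stands in for fields a port never sets. */
struct demo_regs {
	unsigned long pc;
	unsigned long sp;
	unsigned long spare[4];
};

void
demo_fill_regs(struct demo_regs *regs, unsigned long pc, unsigned long sp)
{
	/* Zero the whole structure so unset fields carry no stale stack data. */
	memset(regs, 0, sizeof(*regs));
	regs->pc = pc;
	regs->sp = sp;
}
```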
Toomas Soome
7aaf685ba7 zfs: we can boot from dataset with large_dnode enabled
loader has been supporting large_dnode for some time, no need to block the
feature for boot dataset.

Reviewed by:	avg
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D18391
2018-12-03 19:35:21 +00:00
Justin Hibbits
f095905ca4 powerpc: Check for a fdt in the metadata if it doesn't already exist
It's possible the fdt pointer was passed in via the metadata, as is done in
ubldr.  Check for the fdt here, instead of working with a NULL fdt, and
panicking.
2018-12-03 04:56:06 +00:00
Justin Hibbits
3c0b081966 powerpc/booke: Check for the metadata address by physical address
The metadata pointer will almost never be at or above 'btext', as btext is a
relocated symbol, so will be based at VM_MIN_KERNEL_ADDRESS, not at
KERNBASE.  Check the address against kernload, where the kernel is
physically loaded.
2018-12-03 04:47:28 +00:00
Oleksandr Tymoshenko
440e13cf1e Fix PCI driver unload for Marvell PCI controller
Add generic implementation for bus_deactivate_resource method. Without
it bus_release_resource fails with "Failed to release active resource"
message

MFC after:	1 week
2018-12-02 21:58:36 +00:00
Andreas Tobler
905bbe376c Build the dtb for the rock64 board.
Reviewed by:	manu@
2018-12-02 19:36:20 +00:00
Andreas Tobler
4df1cc71fd Add rule to build the dtb for the rock64 board.
Reviewed by:	manu@
2018-12-02 19:35:22 +00:00
Konstantin Belousov
b8c20c02cc Fix off-by-one (page) errors in checks in d_mmap methods of several drivers.
Reported by:	C Turt <ecturt@gmail.com>
Reviewed by:	alc, markj
admbug:		781
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 18:30:58 +00:00
Konstantin Belousov
d77e8982ab Add a comment noting that the additional range checks are not needed.
The object size is set in the dsp_mmap_single() which provides the
range limit by vm_fault().

Reported by:	C Turt <ecturt@gmail.com>
Reviewed by:	alc, markj
admbug:		781
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 13:29:13 +00:00
Konstantin Belousov
83fb1d62ca Fix off by one in hpet_mmap() csw method.
Reported by:	C Turt <ecturt@gmail.com>
Reviewed by:	alc, markj
Tested by:	pho
admbug:		781
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 13:27:36 +00:00
Konstantin Belousov
10d9120c44 Change the vm_ooffset_t type to unsigned.
The type represents byte offset in the vm_object_t data space, which
does not span negative offsets in FreeBSD VM.  The change matches byte
offset signedness with the unsignedness of the vm_pindex_t which
represents the type of the page indexes in the objects.

This allows to remove the UOFF_TO_IDX() macro which was used when we
have to forcibly interpret the type as unsigned anyway.  Also it fixes
a lot of implicit bugs in the device drivers d_mmap methods.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2018-12-02 13:16:46 +00:00
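To illustrate why 10d9120c44 says an unsigned offset type fixes implicit bugs in d_mmap methods, the sketch below compares an upper-bound-only check on a signed and on an unsigned offset. The function names and the 4096-byte window are assumptions made for the example, not code from any driver.

```c
#include <stdint.h>

#define DEMO_WINDOW_SIZE 4096

/* Signed offset: an upper-bound-only check accepts negative values. */
int
demo_check_signed(int64_t offset)
{
	return (offset < DEMO_WINDOW_SIZE);
}

/* Unsigned offset: the same comparison rejects what used to be negative. */
int
demo_check_unsigned(uint64_t offset)
{
	return (offset < DEMO_WINDOW_SIZE);
}
```

With the signed type, a driver would need an explicit `offset < 0` test to be safe; with the unsigned type, the single comparison is sufficient.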
Konstantin Belousov
200bf72793 Correct accuracy of the barrier writes accounting.
Discussed with:	mckusick
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-12-02 12:53:39 +00:00
Michal Meloun
f788dba5bb Return computed real memory size, not a value from similarly named
global variable.

MFC after:	1 week
2018-12-02 07:39:16 +00:00
Conrad Meyer
d86e0b338f pmcr: Fix pstate setting on Power8
Fix p-state setting on Power8 by removing the accidental double-indirection of
the pstate_ids table.

The pstate_ids table comes from the OF property "ibm,pstate-ids."  On Power9,
the values happen to be identical to the indices, so the extra indirection was
harmless.  On Power8, the values were out of the range [0, npstates], so
pmcr_set() would fail the spec[0] range check with EINVAL.

While here, include both the value and index in the driver-specific register
array as spec[0] and spec[1] respectively.  They're redundant, but relatively
harmless, and it may aid debugging.

While here, fix the range check to exclude the index npstates, which is one
past the last valid index.

PR:		233693
Reported and tested by:	sbruno
Reviewed by:	jhibbits
2018-12-01 21:37:47 +00:00
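A sketch of the two fixes in d86e0b338f, assuming a table of p-state IDs indexed from 0 to npstates - 1: the index is range-checked so that npstates itself is rejected, and the table is consulted exactly once. The function name and the sample table values in the test are hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical lookup: translate a p-state index into the ID to program.
 * Returns 0 and stores the ID in *value, or EINVAL for a bad index.
 */
int
demo_pstate_value(const int *pstate_ids, int npstates, int idx, int *value)
{
	/* Valid indices are 0 .. npstates - 1; npstates is one past the end. */
	if (idx < 0 || idx >= npstates)
		return (EINVAL);
	/* Single lookup; indexing the table again with the result is the bug. */
	*value = pstate_ids[idx];
	return (0);
}
```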
Conrad Meyer
23459e135d Restore build of the kernel, removed through r341377
Turns out we already had a path_mtu_discovery variable.

X-MFC-with:	r341377
2018-12-01 21:28:05 +00:00
Emmanuel Vadot
04f9b8a116 Add Silergy SYR827 PMIC driver
SYR827 is a PMIC that can output a voltage from 0.7125V to 1.5V in 12.5mV steps.
It's controlled via I2C.

MFC after:	1 month
2018-12-01 20:31:49 +00:00
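The voltage range quoted in 04f9b8a116 implies 64 settings: (1.5 V - 0.7125 V) / 12.5 mV = 63 steps above the minimum. The sketch below works in microvolts and assumes a plain linear selector starting at 0; the actual register encoding is not given in the commit message, and all names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MIN_UV	712500u		/* 0.7125 V */
#define DEMO_MAX_UV	1500000u	/* 1.5 V */
#define DEMO_STEP_UV	12500u		/* 12.5 mV */
#define DEMO_MAX_SEL	((DEMO_MAX_UV - DEMO_MIN_UV) / DEMO_STEP_UV)

/* Convert microvolts to a selector, rounding down; returns -1 if out of range. */
int
demo_uv_to_sel(uint32_t uv, uint8_t *sel)
{
	if (uv < DEMO_MIN_UV || uv > DEMO_MAX_UV)
		return (-1);
	*sel = (uint8_t)((uv - DEMO_MIN_UV) / DEMO_STEP_UV);
	return (0);
}

/* Convert a selector back to microvolts; returns 0 for an invalid selector. */
uint32_t
demo_sel_to_uv(uint8_t sel)
{
	if (sel > DEMO_MAX_SEL)
		return (0);
	return (DEMO_MIN_UV + (uint32_t)sel * DEMO_STEP_UV);
}
```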
Emmanuel Vadot
affb46a826 arm64: rockchip: rk805: Add basic support for RK808 PMIC
RK808 PMIC is the companion chip for RK3399 SoC.
Add basic regulator support in RK805 since they are similar.

MFC after:	1 month
2018-12-01 20:31:05 +00:00
Cy Schubert
a23b6c018b Remove IFF_DRVRLOCK as it is used in IRIX only (and we all know IRIX
is dead). This includes collaterally removing code shared by HP/UX,
SGI, and Linux, where IP Filter will in all likelihood for various
reasons never run again.

MFC after:	1 week
2018-12-01 20:30:18 +00:00
Emmanuel Vadot
1d47648a76 arm64: rockchip: rk_i2c: Use correct clock
While here add RK3399 support and call clk_set_assigned to set the correct
clock set in the DTS.

MFC after:	1 month
2018-12-01 20:29:42 +00:00
Emmanuel Vadot
36ae7efe61 arm64/rockchip: add RK3399 support
Add CRU (Clock and Reset Unit) driver for RK3399.
Add support in rk_pinctrl driver.

Submitted by:  Greg V <greg@unrelenting.technology> (Original version)
Differential Revision: https://reviews.freebsd.org/D16732

MFC after:	1 month
2018-12-01 20:28:16 +00:00
Emmanuel Vadot
c3aec23870 arm64: rockchip: Add RK3399_CLK_PLL
PLLs on the RK3399 are different than the ones on the RK3328.
Add a new type and some dedicated recalc and set_freq functions.
Rename the RK3328 dedicated rk_clk_pll function with rk3328_ prefix.

MFC after:	1 month
2018-12-01 20:26:59 +00:00
Cy Schubert
8a50297550 Restore handling of PMTU discovery, removed through an unifdef(1)
following the MFV of r254219 into r255332. In addition the 'FreeBSD'
macro was never defined in ipfilter 5.1.2 thus it never would have
been enabled in the first place.

This work is prompted by a general cleanup of the IP Filter code
prompted by working to resolve a PR. More to follow.

MFC after:	1 week
2018-12-01 17:59:41 +00:00
Konstantin Belousov
a823302783 Allow to create swap zone larger than v_page_count / 2.
If user configured the maxswapzone tunable, just take the literal
value for the initial zone sizing attempt.  Before, it was only
possible to reduce the zone by the tunable.

While there, correct the message which was not correct when zone
creation rounded the size up.

Reported by:	jmg
Reviewed by:	markj
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18381
2018-12-01 16:50:12 +00:00
Konstantin Belousov
36e1b9702e Correct the tunable name in the message.
Submitted by:	 Andre Albsmeier <mail@fbsd.e4m.org>
PR:	231577
MFC after:	1 week
2018-12-01 16:43:18 +00:00
Mateusz Guzik
ddf6571230 amd64: align target memmove buffer to 16 bytes before using rep movs
See the review for sample test results.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18401
2018-12-01 14:20:32 +00:00
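Aligning the target as in ddf6571230 requires knowing how many leading bytes to copy before the destination reaches a 16-byte boundary. The committed change is amd64 assembly; the C below is only a sketch of that arithmetic, with a hypothetical function name.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes to copy before `dst` reaches the next 16-byte boundary. */
size_t
demo_bytes_to_align16(const void *dst)
{
	/* Negating the address modulo 16 gives the distance to the boundary. */
	return ((size_t)(-(uintptr_t)dst & 15));
}
```

An already aligned destination yields 0, so no prologue copy is needed in that case.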
Kristof Provost
b2e0b24f76 pf: Fix panic on overlapping interface names
In rare situations[*] it's possible for two different interfaces to have
the same name. This confuses pf, because kifs are indexed by name (which
is assumed to be unique). As a result we can end up trying to
if_rele(NULL), which panics.

Explicitly checking the ifp pointer before if_rele() prevents the panic.
Note pf will likely behave in unexpected ways on the overlapping
interfaces.

[*] Insert an interface in a vnet jail. Rename it to an interface which
exists on the host. Remove the jail. There are now two interfaces with
the same name in the host.
2018-12-01 09:58:21 +00:00
Eric van Gyzen
984969cd96 Fix reporting of SS_ONSTACK
Fix reporting of SS_ONSTACK in nested signal delivery when sigaltstack()
is used on some architectures.

Add a unit test for this.  I tested the test by introducing the bug
on amd64.  I did not test it on other architectures.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D18347
2018-11-30 22:44:33 +00:00
Mateusz Guzik
94243af2da amd64: handle small memmove buffers with overlapping stores
Handling sizes of > 32 backwards will be updated later.

Reviewed by:	kib (kernel part)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18387
2018-11-30 20:58:08 +00:00
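The "overlapping stores" technique named in 94243af2da can be sketched in C for one size class. This is an illustration under assumptions, not the committed assembly: it covers only 8 to 16 bytes, the function name is hypothetical, and memcpy() stands in for unaligned 8-byte loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Move n bytes, 8 <= n <= 16, where src and dst may overlap. */
void
demo_move_8_to_16(void *dst, const void *src, size_t n)
{
	uint64_t head, tail;

	/* Load both ends before storing anything, so overlap is harmless. */
	memcpy(&head, src, 8);
	memcpy(&tail, (const char *)src + n - 8, 8);
	/* The two 8-byte stores overlap in the middle whenever n < 16. */
	memcpy(dst, &head, 8);
	memcpy((char *)dst + n - 8, &tail, 8);
}
```

Two fixed-size stores replace a byte loop or a branch per size, at the cost of writing some middle bytes twice.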
Michael Tuexen
c8b53ced95 Limit option_len for the TCP_CCALGOOPT.
Limiting the length to 2048 bytes seems to be acceptable, since
the values used right now are using 8 bytes.

Reviewed by:		glebius, bz, rrs
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18366
2018-11-30 10:50:07 +00:00
Andrey V. Elsukov
851073551d Adapt the fix in r341008 to correctly work with EBR.
IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding
it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because
unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it
doesn't check that concurrent code is running in epoch section. But while
we are in epoch section, we should be able to do access to ifnet's
fields, even if it was unlinked. Thus do not change if_addr and if_hw_addr
fields in ifnet_detach_internal() to NULL, since rtsock code can do
access to these fields and this is allowed while it is running in epoch
section.

This should fix the race, when ifnet_detach_internal() unlinks ifnet
after we checked it for IFF_DYING in sysctl_dumpentry.

Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the
NULL check for ifp->if_description, since free(9) can correctly handle
NULL pointer.

MFC after:	1 week
2018-11-30 10:36:14 +00:00
Emmanuel Vadot
fc6ebb297b arm64: allwinner: Add 792MHz frequency to sun50i-a64-opp
This is the frequency of the cpu on the Pinebook so add it to make
cpufreq find the current setting.
Note that this dtbo on the Pinebook doesn't work right now as u-boot
dtb doesn't have symbols and so it fails to apply. Linux 4.20 has
the dts and will be imported once tagged.

MFC after:	1 month
X-MFC with:	r341268
2018-11-30 10:31:30 +00:00
Andrew Rybchenko
ad72d03040 sfxge(4): rollback last seen VLAN TCI if Tx packet is dropped
Early processing of a packet on transmit may change last seen
VLAN TCI in the queue context. If such a packet is eventually
dropped, last seen VLAN TCI must be set to its previous value.

Submitted by:   Ivan Malov <Ivan.Malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18288
2018-11-30 07:11:05 +00:00
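The rollback described in ad72d03040 follows a save-and-restore pattern. The sketch below shows that pattern only; the structure, field, and function names are hypothetical, and the `drop` parameter stands in for whatever later condition causes the packet to be discarded.

```c
#include <stdint.h>

/* Hypothetical queue context holding the last VLAN TCI seen on transmit. */
struct demo_txq {
	uint16_t last_vlan_tci;
};

/* Returns 0 if the packet is queued, -1 if it is dropped. */
int
demo_tx_packet(struct demo_txq *txq, uint16_t tci, int drop)
{
	uint16_t prev_tci = txq->last_vlan_tci;

	/* Early processing updates the queue context. */
	txq->last_vlan_tci = tci;
	if (drop) {
		/* The packet never reached the hardware: restore the old value. */
		txq->last_vlan_tci = prev_tci;
		return (-1);
	}
	return (0);
}
```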
Andrew Rybchenko
b162acfe52 sfxge(4): ensure EvQ poll stops when abort is requested
If an event handler requested an abort, only the inner loop was
guaranteed to be broken out of - the outer loop could continue
if total == batch.

Fix this by poisoning batch to ensure it is different to total.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18287
2018-11-30 07:10:54 +00:00
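The loop structure behind b162acfe52 can be sketched as a batch loop whose outer condition is `total == batch`. The code below is a simplified model under assumptions, not the driver's code: events are plain integers, the batch size is 4, and all names are hypothetical.

```c
#define DEMO_BATCH 4

/* Hypothetical handler type: returns nonzero to request an abort. */
typedef int (*demo_handler_t)(int event);

int demo_never_abort(int event) { (void)event; return (0); }
int demo_abort_at_3(int event) { return (event == 3); }

/* Process up to `nevents` pending events; returns how many were consumed. */
int
demo_poll(int nevents, demo_handler_t handler)
{
	int count = 0;
	int batch, total, index;

	do {
		batch = DEMO_BATCH;
		/* "Read" at most one batch of pending events. */
		total = nevents - count;
		if (total > batch)
			total = batch;

		for (index = 0; index < total; index++) {
			if (handler(count + index)) {
				/* Ignore the rest of this batch. */
				total = index + 1;
				/* Poison batch so that total != batch below. */
				batch += DEMO_BATCH * 2;
				break;
			}
		}
		count += total;
	} while (total == batch);

	return (count);
}
```

Without the poisoning line, an abort on the last event of a full batch would leave `total == batch` and the outer loop would read another batch.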
Andrew Rybchenko
c6831b0bcb sfxge(4): support Medford2
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18286
2018-11-30 07:10:43 +00:00
Andrew Rybchenko
f0a2945d38 sfxge(4): update external port number calculation
Revise the external port calculation to support all
X2 port modes. The previous algorithm could not
handle different port numbering schemes on each cage.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18285
2018-11-30 07:10:32 +00:00
Andrew Rybchenko
d707fb201e sfxge(4): correct annotations where NULL input is OK
Correct annotations where NULL input can be permitted

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18284
2018-11-30 07:10:20 +00:00
Andrew Rybchenko
405f7a36fe sfxge(4): support new link modes in the driver
Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18283
2018-11-30 07:10:09 +00:00
Andrew Rybchenko
f0095e1f86 sfxge(4): use transceiver ID when reading info
In efx_mcdi_phy_module_get_info() probe the
transceiver identification byte rather than assume
the module matches the fixed port type.  This
supports scenarios such as a SFP mounted in a QSFP
port via a QSA module.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18282
2018-11-30 07:09:58 +00:00
Andrew Rybchenko
cf94ca3704 sfxge(4): add accessor to whole link status
Add a function which makes an MCDI GET_LINK request and
packages up the results. Currently, the get-link function
is triggered from several entry points which then pass
on or store selected parts of the data. When the driver
needs to obtain the current link state, it is more
efficient to do this in a single call.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18281
2018-11-30 07:09:46 +00:00
Andrew Rybchenko
5a51b32e4c sfxge(4): guard Rx scale code with corresponding option
Previously only some of the code was guarded by this which caused
a build error when EFSYS_OPT_RX_SCALE is 0 (e.g. in manftest).

Submitted by:   Tom Millington <tmillington at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18280
2018-11-30 07:09:34 +00:00
Andrew Rybchenko
109f5727a0 sfxge(4): infer port mode bandwidth from max link speed
Limit the port mode bandwidth calculations by the maximum
reported link speed. This system detects 25G vs 10G cards,
and 100G port modes vs 40G.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18279
2018-11-30 07:09:23 +00:00
Andrew Rybchenko
c42b6a3560 sfxge(4): support improvements to bandwidth calculations
Change the interface to ef10_nic_get_port_mode_bandwidth()
so more NIC information can be used to infer bandwidth
requirements. Huntington calculations separated out
completely.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18278
2018-11-30 07:09:11 +00:00
Andrew Rybchenko
7e370b0ea7 sfxge(4): add X2 port modes to bandwidth calculator
Add cases for the new port modes supported by X2 NICs.
Lane bandwidth is calculated for pre-X2 cards so is an
underestimate for X2 in 25G/100G modes.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18277
2018-11-30 07:09:00 +00:00
Andrew Rybchenko
e12a751b0d sfxge(4): update to current port mode terminology
From Medford onwards, the newer constants enumerating
port modes should be used.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18276
2018-11-30 07:08:50 +00:00
Andrew Rybchenko
3c3b954225 sfxge(4): adjust PHY module info interface
Adjust data types in interface to permit the complete
module information buffer to be obtained in a single
call.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18275
2018-11-30 07:08:38 +00:00
Andrew Rybchenko
0bc522c2d6 sfxge(4): expose PHY module device address constants
Rearrange so the valid addresses are visible to the caller.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18274
2018-11-30 07:08:27 +00:00
Andrew Rybchenko
a10539aa18 sfxge(4): make last byte of module information available
Adjust bounds so the interface supports reading
the last available byte of data.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18273
2018-11-30 07:08:16 +00:00
Andrew Rybchenko
1ad53cbf7e sfxge(4): add helper API to make Geneve filter spec
Submitted by:   Vijay Srivastava <vijays at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18272
2018-11-30 07:08:05 +00:00
Andrew Rybchenko
fe094a7cbc sfxge(4): fix MAC Tx stats for less or equal to 64 bytes
This statistic should include 64byte and smaller frames.
Fix EF10 calculation to match Siena code.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18271
2018-11-30 07:07:54 +00:00
Andrew Rybchenko
3f3f5d85c5 sfxge(4): modify phy caps to indicate FEC request
The capability bits to request FEC modes are implicitly valid
when the corresponding FEC mode is a supported capability.
Drivers expect that it is only valid to advertise those
capabilities explicitly marked as supported. The capabilities
reported by firmware are modified with the implicit capabilities
to present the explicit model to drivers.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18270
2018-11-30 07:07:43 +00:00
Andrew Rybchenko
6ddb48de77 sfxge(4): improve handling of legacy RSS hash flags
Client drivers may use either legacy flags, for example,
EFX_RX_HASH_TCPIPV4, or generalised flags, for example,
EFX_RX_HASH(IPV4_TCP, 4TUPLE), to configure RSS hash.
The libefx is able to recognise what scheme is used.

Legacy flags may be consumed directly by a chip-specific handler to
configure the NIC, that is, on EF10, these flags can be used to fill
in legacy RSS mode field in MCDI request. Generalised flags can also
be directly used in EF10-specific handler as they are fully compatible
with additional fields of the same MCDI request.

Legacy flags undergo conversion to generalised flags before they
are consumed by a chip-specific handler. This conversion is used to
make sure that chip-specific handlers expect only generalised flags
in the input for the sake of clarity of the code.

Depending on firmware capabilities, a chip-specific handler either
supplies the input to the NIC directly, for example,
EFX_RX_HASH(IPV4_TCP, 4TUPLE) flag will enable 4 bits in
RSS_CONTEXT_SET_FLAGS_IN_TCP_IPV4_RSS_MODE field on EF10, or takes
the opportunity to translate the input to enable bits which don't map
to the generic flag, like setting
RSS_CONTEXT_SET_FLAGS_IN_TOEPLITZ_TCPV4_EN on EF10 when the firmware
claims no support for additional modes.

However, this approach has introduced a severe problem which can be
reproduced with ultra-low-latency firmware variant. In order to enable
IP hash, EF10-specific handler requires the user to request 2-tuple
hash for IP-other, TCP and UDP traffic classes, unconditionally.
For example, IPv4 hash can be enabled using the following input:
EFX_RX_HASH(IPV4_TCP, 2TUPLE) | EFX_RX_HASH(IPV4_UDP, 2TUPLE) |
EFX_RX_HASH(IPV4, 2TUPLE).
At the same time, on ultra-low-latency firmware, the common code will
never report support for any UDP tuple to the client driver. That is,
in the same example, the driver will use EFX_RX_HASH(IPV4_TCP, 2TUPLE) |
EFX_RX_HASH(IPV4, 2TUPLE). This input will not be recognised by
EF10-specific handler, and RSS_CONTEXT_SET_FLAGS_IN_TOEPLITZ_IPV4_EN
bit will not be set in the MCDI request.

In order to solve the problem, the patch removes conversion code
from chip-specific handlers and adds appropriate code to convert
EFX_RX_HASH() flags to their legacy counterparts to the common scale
mode set function. If the firmware does not support additional modes,
the function will convert generalised flags to legacy flags correctly
without any demand for UDP flags and pass the result to a chip-specific
handler.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18269
2018-11-30 07:07:31 +00:00
Andrew Rybchenko
d085cfff22 sfxge(4): simplify the code to parse RSS hash type
RSS mode bits can be accessed a lot easier in the hash
type value provided that the variable type is uint32_t.
The macro helper can be removed to enhance readability.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18268
2018-11-30 07:07:20 +00:00
Andrew Rybchenko
94a7dab5bf sfxge(4): check buffer size for hash flags
The efx_rx_scale_hash_flags_get interface is unsafe, as it does not
have an argument for the size of the output buffer used to return
the flags. While the only caller currently supplies a sufficiently
large buffer, this should be checked at runtime to avoid writing
past the end of the buffer.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18267
2018-11-30 07:07:09 +00:00
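A sketch of the runtime check that 94a7dab5bf asks for: the caller passes the capacity of its output array, and the function refuses to write past it. The signature, the flag values, and the choice of ENOSPC are assumptions made for illustration and are not the libefx interface.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical flag listing: copy the supported flags into `flags`, which
 * has room for `max_nflags` entries, and report the count in *nflagsp.
 */
int
demo_hash_flags_get(unsigned int *flags, size_t max_nflags, size_t *nflagsp)
{
	static const unsigned int supported[] = { 0x1, 0x2, 0x4 };
	const size_t nsupported = sizeof(supported) / sizeof(supported[0]);
	size_t i;

	if (flags == NULL || nflagsp == NULL)
		return (EINVAL);
	/* Refuse to write past the end of the caller's buffer. */
	if (max_nflags < nsupported)
		return (ENOSPC);

	for (i = 0; i < nsupported; i++)
		flags[i] = supported[i];
	*nflagsp = nsupported;
	return (0);
}
```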
Andrew Rybchenko
bd1be89ebe sfxge(4): use simpler code to check hash algorithm type
The API which is used to list supported hash flags verifies
hash algorithm choice before writing the output. This check
is based on a switch() statement which has only two options
and no distinctive actions to be conducted for each of them.
Use simpler code instead of switch() to improve readability.

Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18266
2018-11-30 07:06:58 +00:00
Andrew Rybchenko
a8c1489fbb sfxge(4): add support to get active FEC type
Submitted by:   Vijay Srivastava <vijays at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18265
2018-11-30 07:06:46 +00:00
Andrew Rybchenko
5c6609f6f4 sfxge(4): fix a typo in unicast filter insertion comment
Submitted by:   Ivan Malov <ivan.malov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18264
2018-11-30 07:06:35 +00:00
Andrew Rybchenko
aea82ebf8a sfxge(4): prevent access to the NIC config before probe
NIC config is initialized during NIC probe.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18263
2018-11-30 07:06:24 +00:00
Andrew Rybchenko
39e58a98ba sfxge(4): fix ID retrieval in v3 licensing
Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18262
2018-11-30 07:06:13 +00:00
Andrew Rybchenko
b2053d8025 sfxge(4): add API to inform libefx of hardware removal
The efx_nic_hw_unavailable() checks ensure that if the NIC hardware
has failed or has been physically removed then libefx will stop
further attempts to access the hardware.

Add an interface for libefx clients to force unavailability, so the
hardware is treated as dead or removed even if still physically present.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18261
2018-11-30 07:06:01 +00:00
Andrew Rybchenko
c6d5e85dbe sfxge(4): add routine to check for hardware presence
Add efx_nic_hw_unavailable() routine to check for hardware presence
before continuing with NIC operations.

Submitted by:   Andy Moreton <amoreton at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18260
2018-11-30 07:05:49 +00:00
Andrew Rybchenko
315bbbaa7c sfxge(4): fix out of bounds read when dereferencing sdup
Introduce and use macro to make sure that MCDI buffers allocated
on stack are rounded up properly.

Submitted by:   Gautam Dawar <gdawar at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18259
2018-11-30 07:05:36 +00:00
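315bbbaa7c mentions a macro that rounds stack-allocated MCDI buffers up properly. The sketch below shows one way such a macro could look, assuming the buffer is accessed in 32-bit words; the macro names and the word size are assumptions, not the names used in the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
#define DEMO_ROUNDUP(x, align) \
	(((x) + ((align) - 1)) & ~((size_t)(align) - 1))

/* Declare an array of 32-bit words large enough to hold `bytes` bytes. */
#define DEMO_DECLARE_BUF(name, bytes) \
	uint32_t name[DEMO_ROUNDUP((bytes), sizeof(uint32_t)) / sizeof(uint32_t)]

/* A 10-byte request is padded so that whole-word reads stay in bounds. */
size_t
demo_buf_bytes(void)
{
	DEMO_DECLARE_BUF(buf, 10);

	return (sizeof(buf));
}
```

Declaring the buffer in whole words means a word-sized read of the final bytes cannot run past the end of the array.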
Andrew Rybchenko
5037810f7e sfxge(4): add information if TSO workaround is required
In SF bug 61297 it's been confirmed that the hardware does not always
calculate the TCP checksum correctly with TSO sends.

The value of the Total Length field (IPv4) or Payload Length field
(IPv6) is the critical factor. We're sufficiently confident that if
these fields are zero then the checksum will be calculated correctly.

The information may be used by the drivers to check if the workaround is
required when FATSOv2 is implemented.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18258
2018-11-30 07:05:23 +00:00
Andrew Rybchenko
6b231fec92 sfxge(4): avoid usage of too big arrays on stack
Found by PreFAST static analysis.

Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18257
2018-11-30 07:05:12 +00:00
Andrew Rybchenko
e919b7ec20 sfxge(4): generalise EF10 NVRAM buffer interface
The SFN driver's PartitionControl WMI object requires an API to parse
and filter partition data in TLV format, particularly for the Dynamic
Config partition. The ef10_nvram_buffer functions provide this
functionality but are tied to use with license partition only.
Modify functions so they are applicable to all TLV partitions and add
functions to support in-place tag modification.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18256
2018-11-30 07:05:00 +00:00
Andrew Rybchenko
cd5e337110 sfxge(4): add accessor for default port mode
Extend efx_mcdi_get_port_modes() to optionally pass on the default
port mode field. This provides a more direct way of handling the case
where the dynamic config does not specify the port mode than the
alternative of a lookup table indexed by MCFW subtype.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18255
2018-11-30 07:04:48 +00:00
Andrew Rybchenko
d86bef48a5 sfxge(4): add buffer editing functions to boot config
Add functions to process the DHCP option list format used by the expansion
ROM config buffers, to support extracting and updating individual
options.
The initial use case is the driver presenting the global and per-PF
options as separate items, with the driver implementing the
synchronization of global options across the configuration buffers
for all PFs.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18254
2018-11-30 07:04:37 +00:00
Andrew Rybchenko
b4d3f02ea2 sfxge(4): add API to retrieve sensor limits
Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18253
2018-11-30 07:04:25 +00:00
Andrew Rybchenko
9d3487a623 sfxge(4): check size of memory to read sensors data to
Size of provided memory should be consistent with specified size.

Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18252
2018-11-30 07:04:13 +00:00
Andrew Rybchenko
d515a203d8 sfxge(4): add generated description of sensors
Description of sensors is generated from firmware sources.

Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18251
2018-11-30 05:54:30 +00:00
Andrew Rybchenko
4c7d5ddbe4 sfxge(4): remove probes when a Tx queue is too full
No need for probe messages when a TxQ is too full for a post to be done.

Existing drivers check if there is room in the queue before posting
descriptors, even though efx_tx_qdesc_post() does the check itself.

The new SFN Windows driver doesn't perform the check before calling
efx_tx_qdesc_post(), but that means these probes can get frequently
printed out. It's normal driver behaviour so there's no need to print
an error.

Submitted by:   Mark Spender <mspender at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18250
2018-11-30 05:54:19 +00:00
Andrew Rybchenko
21b72677d9 sfxge(4): refactor monitors support
Remove obsolete monitor types since Falcon SFN4000 series adapters
are no longer supported by libefx.
Rename MCDI monitors to be consistent with YML.
The code may be simplified and generalized since only MCDI monitors
remain.

Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18249
2018-11-30 05:54:07 +00:00
Andrew Rybchenko
383a1cce7a sfxge(4): move empty efsys definitions to EFX headers
Move empty definitions for platform-specific annotations from efsys.h
to EFX headers.

Submitted by:   Martin Harvey <mharvey at solarflare.com>
Submitted by:   Andrew Lee <alee at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18248
2018-11-30 05:50:01 +00:00
Eric van Gyzen
5e38e3f5eb Include path for tmpfs objects in vm.objects sysctl
This applies the fix in r283924 to the vm.objects sysctl
added by r283624 so the output will include the vnode
information (i.e. path) for tmpfs objects.

Reviewed by:	kib, dab
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:59:43 +00:00
Eric van Gyzen
0951bd362c Add assertions and comment to vm_object_vnode()
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D2724
2018-11-30 04:18:31 +00:00
Cy Schubert
a85c2efc11 Clean up a rather useless conditional structure member definition.
MFC after:	1 week
2018-11-30 04:15:56 +00:00
Cy Schubert
a9af9f073c Clean up a redundant non-redefinition of IFNAMSIZ. IFNAMSIZ
is defined in net/if.h, therefore the condition is never met and is
confusing to those who follow.

MFC after:	1 month
2018-11-30 04:15:42 +00:00
Mateusz Guzik
2847cfce54 amd64: remove stale attribution for memmove work
While the routine started as expanded bcopy, it is now entirely rewritten.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:47:36 +00:00
Mateusz Guzik
dd219e5ea5 amd64: tidy up copying backwards in memmove
For the non-ERMS case the code used to handle possible trailing bytes with
movsb first and then followed it up with movsq. This also happened
to alter how calculations were done for other cases.

Handle the tail with regular movs, just like when copying forward.
Use leaq to calculate the right offset from the get go, instead of
doing separate add and sub.

This adjusts the offset for non-rep cases so that they can be used
to handle the tail.

The routine is still a work in progress.

Sponsored by:	The FreeBSD Foundation
2018-11-30 00:45:10 +00:00
John Baldwin
31562c4440 Make most of the CLIP code conditional on #ifdef INET6.
This fixes builds of kernels without INET6 such as LINT-NOINET6.

Reported by:	arybchik
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D18384
2018-11-29 23:14:54 +00:00
Emmanuel Vadot
a69c1d42cc arm64: allwinner: Add a dtbo to have cpu operating points
This enables cpufreq on A64 boards.

MFC after:	1 month
2018-11-29 22:35:23 +00:00
Cy Schubert
f0fea829d0 Remove an old comment/code and replace with a comment that
directly references a NetBSD commit.

MFC after:	1 week
2018-11-29 21:20:53 +00:00
Brooks Davis
f373437a01 Add helper functions to copy strings into struct image_args.
Given a zeroed struct image_args with an allocated buf member,
exec_args_add_fname() must be called to install a file name (or NULL).
Then make zero or more calls to exec_args_add_arg() followed by zero or
more calls to exec_args_add_env(). exec_args_adjust_args() may be
called after args and/or env to allow an interpreter to be prepended to
the argument list.

To allow code reuse when adding arg and env variables, begin_envv
should be accessed with the accessor exec_args_get_begin_envv()
which handles the case when no environment entries have been added.

Use these functions to simplify exec_copyin_args() and
freebsd32_exec_copyin_args().

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15468
2018-11-29 21:00:56 +00:00
Konstantin Belousov
7d2b0bd7d7 If BENEATH is specified, always latch the topping directory vnode.
It is possible that we started with a relative path but during the
lookup, found an absolute symlink.  In this case, BENEATH handling
code needs the latch, but it is too late to calculate it.

While there, somewhat improve the assertions.  Clear the NI_LCF_LATCH
flag when the latch vnode is released, so that asserts know the state.
Assert that there is a latch if we entered beneath+abs path mode,
after the starting point is processed.

Reported by:	wulf
With more input from:	pho
Sponsored by:	The FreeBSD Foundation
2018-11-29 19:13:10 +00:00
Emmanuel Vadot
fc06a872ec arm64: rockchip: armclk: Do not change parent freq if CLK_SET_DRYRUN is set
MFC after:	3 days
2018-11-29 19:11:35 +00:00
Emmanuel Vadot
baec4d5985 extres: clk: Fix clk_set_assigned
ofw_bus_parse_xref_list_get_length doesn't return the number of elements, fix this.
While here, when setting the clock to the assigned frequency, allow the clock
driver to round the frequency down or up, as sometimes the exact frequency cannot
be obtained.
2018-11-29 19:06:05 +00:00
Mark Johnston
e31fc3ab13 Update the free page count when blacklisting pages.
Otherwise the free page count will not accurately reflect the physical
page allocator's state.  On 11 this can trigger panics in
vm_page_alloc() since the allocator state and free page count are
updated atomically and we expect them to stay in sync.  On 12 the
bug would manifest as threads looping in vm_page_alloc().

PR:		231296
Reported by:	mav, wollman, Rainer Duffner, Josh Gitlin
Reviewed by:	alc, kib, mav
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18374
2018-11-29 16:31:01 +00:00
Mateusz Guzik
1f6ad48c76 vfs: fix i386 build after r341220
2018-11-29 09:54:27 +00:00
Mateusz Guzik
22443809ff cache: retire cache_enter compat shim
It was added over 6 years ago for binary compat. cache_enter macro remains
as it expands to cache_enter_time.

Sponsored by:	The FreeBSD Foundation
2018-11-29 09:32:59 +00:00
Mateusz Guzik
c0282e1e07 audit: predict AUDITING_TD as false
By default it is compiled in and disabled.

Sponsored by:	The FreeBSD Foundation
2018-11-29 09:19:48 +00:00
Mateusz Guzik
712775843f vfs: drop spurious memcpy in stat
Sponsored by:	The FreeBSD Foundation
2018-11-29 09:04:10 +00:00
Mateusz Guzik
d47f3fdb0a fd: unify fd range check across the routines
While here annotate out of range as unlikely.

Sponsored by:	The FreeBSD Foundation
2018-11-29 08:53:39 +00:00
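
A unified, branch-hinted range check of the kind described above might look like the following. It is a sketch under assumptions, not the kernel code: struct fdtable, fd_in_range and fd_lookup are hypothetical names, and unlikely() is defined here on top of the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical descriptor table; only the size matters for this sketch. */
struct fdtable {
	int nfiles;	/* number of valid slots, never negative */
};

/* Single shared range check; the unsigned compare also rejects negative fds. */
static inline bool
fd_in_range(const struct fdtable *t, int fd)
{
	return ((unsigned int)fd < (unsigned int)t->nfiles);
}

/* Every lookup routine uses the same helper and treats failure as rare. */
static int
fd_lookup(const struct fdtable *t, int fd)
{
	assert(t != NULL);
	if (unlikely(!fd_in_range(t, fd)))
		return (EBADF);
	return (0);
}
```

Keeping the comparison in one helper means all callers agree on what "out of range" means, and the hint keeps the error path out of the hot path.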
Mateusz Guzik
159dcc30a5 audit: change audit_syscalls_enabled type to bool
So that it fits better in __read_frequently.

Sponsored by:	The FreeBSD Foundation
2018-11-29 08:37:33 +00:00
Andrew Rybchenko
e9bc5a34f4 sfxge(4): add more definitions of partitions
Add definitions of dynamic config and expansion ROM backup
partitions.

Submitted by:   Paul Fox <pfox at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18247
2018-11-29 06:47:41 +00:00
Andrew Rybchenko
b288270efc sfxge(4): fix build because of no declaration
Functions declared in mcdi_mon.h are implemented in mcdi_mon.c.
The build fails if compiler options require declaration before definition.

Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18246
2018-11-29 06:47:30 +00:00
Andrew Rybchenko
dbcc3c8f70 sfxge(4): fix SAL annotation for input buffers
Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18245
2018-11-29 06:47:19 +00:00
Andrew Rybchenko
e4ddd4ccb3 sfxge(4): fix PreFAST warnings because of unused return
Submitted by:   Martin Harvey <mharvey at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18244
2018-11-29 06:47:06 +00:00
Andrew Rybchenko
aea9d093f2 sfxge(4): add Medford2 head-of-line blocking stats
These stats are available on the Medford2 DPDK firmware variant
which supports equal stride super-buffer Rx mode. RXDP_HLB_IDLE
capability bit is set when the stats are available.

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18243
2018-11-29 06:46:55 +00:00
Andrew Rybchenko
ef8967c7d2 sfxge(4): support RxDP scatter disabled truncate counter
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18242
2018-11-29 06:46:44 +00:00
Andrew Rybchenko
c27e7228d5 sfxge(4): generate Medford2 RxDP stats
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18241
2018-11-29 06:46:33 +00:00
Andrew Rybchenko
5c2f9d6a49 sfxge(4): get max supported value for action MARK
The mark value for MATCH_ACTION_MARK has a maximum value.
Requesting a value larger than the maximum will cause the
filter insertion to fail with EINVAL. This patch allows the
driver to check the value at the filter validation.

Submitted by:   Roman Zhukov <roman.zhukov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18240
2018-11-29 06:46:21 +00:00
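
Checking the mark value at validation time, rather than waiting for filter insertion to fail, could be sketched as below. The names struct filter_caps, mark_max and validate_mark are hypothetical and stand in for whatever the driver actually queries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability record holding the largest supported mark value. */
struct filter_caps {
	uint32_t mark_max;
};

/* Reject a too-large mark up front, with the error insertion would return. */
static int
validate_mark(const struct filter_caps *caps, uint32_t mark)
{
	assert(caps != NULL);
	if (mark > caps->mark_max)
		return (EINVAL);
	return (0);
}
```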
Andrew Rybchenko
fc9798c79a sfxge(4): support MARK and FLAG actions in filters
This patch adds support for DPDK rte_flow "MARK" and "FLAG" filter
actions to filters on EF10 family NICs.

Submitted by:   Roman Zhukov <roman.zhukov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18239
2018-11-29 06:46:10 +00:00
Andrew Rybchenko
6e1ebbe9e2 sfxge(4): get actions MARK and FLAG support
Filter actions MARK and FLAG are supported on Medford2 by DPDK
firmware variant.

Submitted by:   Roman Zhukov <roman.zhukov at oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18238
2018-11-29 06:46:01 +00:00
Andrew Rybchenko
d222b61743 sfxge(4): add equal stride super-buffer prefix layout
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18237
2018-11-29 06:45:50 +00:00
Andrew Rybchenko
04381b5e29 sfxge(4): support equal stride super-buffer Rx mode
Equal stride super-buffer Rx mode is supported by DPDK firmware
variant. One Rx descriptor provides many Rx buffers to firmware.
Rx buffers follow each other with specified stride.
It also supports head-of-line blocking with a timeout to address
drops when no Rx descriptors are available, giving the driver extra
time to provide Rx descriptors before packets are dropped.

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18236
2018-11-29 06:45:38 +00:00
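
The "buffers follow each other with specified stride" layout amounts to simple address arithmetic. The helper below is a hypothetical illustration (es_buf_addr and its parameters are not taken from the driver) and assumes one descriptor covers nbufs equally spaced buffers starting at base.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address of buffer number index within the region described by a
 * single Rx descriptor: buffers are laid out back to back, stride
 * bytes apart, starting at base.
 */
static uint64_t
es_buf_addr(uint64_t base, uint32_t stride, uint32_t nbufs, uint32_t index)
{
	assert(index < nbufs);
	/* Widen before multiplying so large strides cannot overflow 32 bits. */
	return (base + (uint64_t)stride * index);
}
```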
Andrew Rybchenko
ceeff9b1a1 sfxge(4): detect equal stride super-buffer support
Equal stride super-buffer Rx mode is supported on Medford2 by
DPDK firmware variant.

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18235
2018-11-29 06:45:26 +00:00
Andrew Rybchenko
2a726a7f94 sfxge(4): make RxQ type data a union
The type is an internal interface. Single integer is insufficient
to carry RxQ type-specific information in the case of equal stride
super-buffer Rx mode (packet buffers per bucket, maximum DMA length,
packet stride, head of line block timeout).

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18234
2018-11-29 06:45:15 +00:00
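
A type tag paired with a union is one way to carry per-type RxQ parameters like those listed above. The sketch below is an assumption about the general shape only: every identifier in it is hypothetical and the real internal interface may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RxQ types; only the super-buffer type needs extra data here. */
enum rxq_type {
	RXQ_TYPE_DEFAULT,
	RXQ_TYPE_ES_SUPER_BUFFER
};

/* Parameters named in the commit message for equal stride super-buffer mode. */
struct rxq_es_super_buffer {
	uint32_t bufs_per_bucket;	/* packet buffers per bucket */
	uint32_t max_dma_len;		/* maximum DMA length */
	uint32_t stride;		/* packet stride */
	uint32_t hol_block_timeout;	/* head of line block timeout */
};

/* One member per RxQ type that needs type-specific information. */
union rxq_type_data {
	struct rxq_es_super_buffer es_super_buffer;
};

struct rxq_config {
	enum rxq_type type;		/* selects the valid union member */
	union rxq_type_data data;
};

/* Read the stride only when the tag says the super-buffer member is valid. */
static uint32_t
rxq_packet_stride(const struct rxq_config *cfg)
{
	assert(cfg != NULL);
	if (cfg->type != RXQ_TYPE_ES_SUPER_BUFFER)
		return (0);
	return (cfg->data.es_super_buffer.stride);
}
```

A single integer could not hold all four values, while the union lets further RxQ types add their own structure without growing the common part.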
Andrew Rybchenko
aed78107bb sfxge(4): update autogenerated MCDI and TLV headers
Equal stride super-buffer is the new name for the deprecated equal
stride packed stream, chosen to avoid confusion with the previous packed stream.

Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18233
2018-11-29 06:45:04 +00:00