Commit Graph

241912 Commits

Author SHA1 Message Date
Marcin Wojtas
32f63fa7f9 Split ENA reset routine into restore and destroy stages
For alignment with Linux driver and better handling ena_detach(), the
reset is now calling ena_device_restore() and ena_device_destroy().

The ena_device_destroy() is also being called on ena_detach(), so the
code will be more readable.

The watchdog is now being activated after reset only, if it was active
before.

There were added additional checks to ensure, that there is no race with
the link state change AENQ handler.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:39:25 +00:00
Marcin Wojtas
fd43fd2af0 Use bitfield for storing global ENA device states
As the ENA can have multiple states turned on/off, it is more convenient
to store them in single bitfield instead of multiple boolean variables.

The bitset FreeBSD API was used for the bitfield implementation, as it
provides flexible structure together with API which also supports atomic
bitfield operations.

For better readability basic macros from API were wrapped into custom
ENA_FLAG_* macros, which are filling up common parameters for all calls.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:37:15 +00:00
Marcin Wojtas
804402a54e Fix error handling when ENA reset fails
Before the patch, error handling was not releasing all resources and
was not issuing device reset if the reset task failed.

That could cause memory leak and fault of the device.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:35:43 +00:00
Marcin Wojtas
460212715f Fill bdf field of the host_info structure in ENA
The host info bdf field is the abbreviation for the bus, device,
function of the PCI on which the device is being attached to.

Now the driver is filling information about that using FreeBSD RID
resource.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:35:02 +00:00
Marcin Wojtas
af66d7d029 Add additional doorbells on ENA Tx path
The new ENA HAL is introducing API, which can determine on Tx path if
the doorbell is needed.

That way, it can tell the driver, that it should call an doorbell.
The old threshold value wasn't removed, as not all HW is supporting this
feature - so it was reworked to also work with the new API.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:33:31 +00:00
Marcin Wojtas
82f5a7921c Limit maximum size of Rx refill threshold in ENA
The Rx ring size can be as high as 8k. Because of that we want to limit
the cleanup threshold by maximum value of 256.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:31:35 +00:00
Marcin Wojtas
4fa9e02d9b Add support for the LLQv2 and WC in ENA
LLQ (Low Latency Queue) is the feature, that allows pushing header
directly to the device through PCI before even DMA is triggered.

It reduces latency, because device can start preparing packet before
payload is sent through DMA.

To speed up sending data through PCI, the Write Combining is enabled,
which allows hardware to buffer data before sending them on the PCI - it
allows to reduce number of PCI IO operations.

ENAv2 is using special descriptor for the negotiation of the LLQ.
Currently, only the default configuration is supported.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:30:52 +00:00
Marcin Wojtas
5cb9db0706 Lock optimization in ENA
Handle IO interrupts using filter routine. That way, the main cleanup
task could be moved to the separate thread using taskqueue.

The deferred Rx cleanup task was removed, and now the cleanup task is
begin called instead. That way, the Rx lock could be removed.

In addition, Queue management (wake up and stop TX ring) was added, so
the TX cleanup task can be performed mostly lockless.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:29:24 +00:00
Marcin Wojtas
6064f2899f Add tuneable drbr ring size and hw queues depth for ENA
The driver now supports per adapter tuning of buffer ring size and HW Rx
ring size.

It can be achieved using sysctl node dev.ena.X.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:28:03 +00:00
Marcin Wojtas
4e30699966 Fix error in validate_tx_req_id() in ENA
If the requested ID was out of range, the tx_info structure was NULL and
the function was trying to access the field of the NULL object.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:26:18 +00:00
Marcin Wojtas
c115a1e258 Change attach order to prevent crash upon failure in ENA
The if_detach was causing crash if the MSI-x configuration in the attach
failed. To prevent this issue, the ifnet is being configured at the end
of the attach function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:24:47 +00:00
Marcin Wojtas
9151c55d02 Change order of ifp release on ENA detach
In rare case, when the ifconfig is called just before kldunload, it is
possible, that ena_up routine will be called after queue locks are
released.

To prevent that, ifp is detached before the last ena_down is called and
further, the ifp is freed at the end of the function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:22:53 +00:00
Marcin Wojtas
2b5b60fe0d Check for number of MSI-x upon partial allocation in ENA
The ENA driver needs at least 2 MSI-x - one for admin queue, and one for
IO queues pair. If there were not enough resources to allocate more than
one MSI-x, the device should not be attached.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:22:12 +00:00
Marcin Wojtas
469a84079c Set error value when allocation of IO irq fails in ENA
bus_alloc_resource_any() is not returning error value in case of an
error.
If the function call fails, the error value was not passed to the
ena_up() function.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:20:42 +00:00
Marcin Wojtas
5b14f92e6c Set vaddr and paddr as NULL when DMA alloc fails in ENA
To prevent errors from assigning values from the DMA structure in case
of an error, zero the vaddr and paddr values upon failure.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:19:32 +00:00
Marcin Wojtas
e80737381f Fix DMA synchronization in the ENA driver Tx and Rx paths
The DMA in FreeBSD requires explicit synchronization. ENA driver was
only doing PREREAD and PREWRITE synchronizations. Missing
bus_dmamap_sync() calls were added.

It is also required to synchronize DMA engine before unloading DMA map.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:18:23 +00:00
Marcin Wojtas
d12f7bfc17 Check for missing MSI-x and Tx completions in ENA
If the first MSI-x won't be executed, then the timer service will detect
that and trigger device reset.

The checking for missing Tx completion was reworked, so it will also
check for missing interrupts. Checking number of missing Tx completions
can be performed after loop, instead of checking it every iteration.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:16:56 +00:00
Marcin Wojtas
8ece6b25de Fill number of CPUs field on ENA host_info structure
The new ena_com allows the number of CPUs to be passed to the device in
the host info structure as a hint.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:15:38 +00:00
Marcin Wojtas
e3cecf70c3 Print ENA Tx error conditionally
Information about Tx error should be only displayed, if packet
preparation failed due to error other than out of memory.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:14:58 +00:00
Marcin Wojtas
c9b099ec94 Trigger reset in ENA if there are too many Rx descriptors
Whenever the driver will receive too many descriptors from the device,
it should trigger the device reset, as it is indicating that the device
is in invalid state.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:13:15 +00:00
Marcin Wojtas
277f11c401 Remove RSS support in ENA
Receive Side Scaling is optional feature that could be enabled in kernel
configuration by defining flag RSS.

Kernel uses hash to store and find protocol control block which is
stored in hash tables.
Kernel and NIC hash functions must be consistent. Otherwise case lookup
fails.

To achieve this kernel provides API to set proper hash key to NIC.
As it is not possible to change key for virtual ENA NIC, this driver
cannot support RSS function.

ENA is designed to work in virtual environments so supporting hardware
version of this card is unnecessary.

Submitted by:  Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:12:14 +00:00
Marcin Wojtas
40621d71fd Add notification AENQ handler for ENA
Notification AENQ handler is responsible for handling requests from ENA
device. Missing Tx threshold, Tx timeout and keep alive timeout can be
set using hints from the aenq descriptor which can be delivered in the
ENA admin notification.

The queue suspending and resuming tasks are not supported by the
driver.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:09:53 +00:00
Marcin Wojtas
e6de9a8384 Print information when ENA admin error occurs
ENA_ADMIN_FATAL_ERROR and ENA_ADMIN_WARNING aenq groups were indicated
as supported, so the unimplemented_aenq_handler() will print out error
message, whenever an error will occur within the ENA admin context.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:08:00 +00:00
Marcin Wojtas
b8ca5dbe9e Do not specify active media type in ENA
As the ENA is working only in virtualized environment, the active media
is not specified. Instead, the active link type is set as unknown.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:06:07 +00:00
Marcin Wojtas
67ec48bb3a Adjust ENA driver to the new ena-com
Recent HAL change preparing to support ENAv2 required minor driver
modifications.

The ena_com_sq_empty_space() is not available in this ena-com, so it had
to be replaced with ena_com_free_desc().

Moreover, the ena_com_admin_init() is no longer using 3rd argument
indicating if the spin lock should be initialized, so it was removed.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:01:46 +00:00
Marcin Wojtas
b6ee6cf7ab Update ena-com for the ENAv2 version
The new ena-com is required for the ENAv2 release of the driver.

Genmask macro was fixed in the platform file.

The GENMASK macro is intended to work on 32-bit values. Before this
patch the, calling GENMASK(31, 0) would cause 32-bit value to be
shifted by 32. This could lead to unexpected behavior, as shifting by
the number greater or equal to word length is undefined operation.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 12:27:55 +00:00
Xin LI
12d62cc2d7 Unexpand be32dec().
MFC after:	2 weeks
2019-05-30 02:23:57 +00:00
Jayachandran C.
33e4f8169d arm64 gicv3_its: Fix a typo
Fix 'Cavium' spelling in errata description.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D20418
2019-05-30 01:39:07 +00:00
Jayachandran C.
55d9048856 gicv3_its: do LPI init only once per CPU
The initialization required for LPIs (setting up pending tables etc.)
has to be done just once per CPU, even in the case where there are
multiple ITS blocks associated with the CPU.

Add a flag lpi_enabled in the per-cpu distributor info for this and
use it to ensure that we call its_init_cpu_lpi() just once.

This enables us to support platforms where multiple GIC ITS blocks
can generate LPIs to a CPU.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19844
2019-05-30 01:32:00 +00:00
Jayachandran C.
db359ad3d6 gicv3_its: refactor LPI init into a new function
Move the per-cpu LPI intialization to a separate function. This is
in preparation for a commit that does LPI init only once for a CPU,
even when there are multiple ITS blocks associated with the CPU.

No functional changes in this commit.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19843
2019-05-30 01:24:47 +00:00
Jayachandran C.
a9b702ddde gic_v3: consolidate per-cpu redistributor information
Update 'struct gic_redists' to consolidate all per-cpu redistributor
information into a new 'struct redist_pcpu'. Provide a new interface
(GICV3_IVAR_REDIST) for the GIC driver, which can be used to retrieve
the per-cpu data.

This per-cpu redistributor struct will be later used to improve the
GIC ITS setup.

While there, remove some unused fields in gic_v3_var.h interface.
No functional changes.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19842
2019-05-30 01:21:08 +00:00
Ravi Pokala
ffac5d814c Add bits related to SANITIZE, SED, and form-factor to (struct ata_params)
Based on ATA-ACS-4, recognize several bit-fields related to the ATA SANITIZE
feature-set, Self-Encrypting Drives, and form-factor identification.

As part of this change, the name of word 48 of (struct ata_params) is being
changed. The previous name, "usedmovsd" does not appear to be related to the
previous definition of the word ("double-word IO supported"). The word was
defined that way in ATA-1 (1994), but it was marked "Reserved" (meaning
"unused, but might be used in the future") in ATA-2 (1996). It stayed that
way until ATA-8 (2008), which re-defined it as implemented in this change.
The field is not used in-tree.

Reviewed by:	mav
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D20455
2019-05-29 23:50:31 +00:00
Gleb Smirnoff
4a9f6ba75b In r343857 the referred comment moved to uma_vm_zone_stats(). 2019-05-29 22:33:37 +00:00
Eric Joyner
668d6dbb4c iflib: provide probe wrapper for vendor drivers
From Jake:
Vendor drivers that exist out-of-tree generally should return
BUS_PROBE_VENDOR from their device probe functions. This helps ensure
that a vendor replacement driver will supersede the in-kernel driver for
a given device.

Currently, if a vendor wants to implement a driver based on iflib, it
will always report BUS_PROBE_DEFAULT.

Add a wrapper function, iflib_device_probe_vendor() which can be used in
place of iflib_device_probe(). This function will just return
BUS_PROBE_VENDOR whenever iflib_device_probe() would return
BUS_PROBE_DEFAULT.

While vendor drivers can already implement such a wrapper themselves,
providing it in the iflib.h header makes it easier for the vendor driver
to do the right thing.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D20221
2019-05-29 22:24:10 +00:00
Allan Jude
ad579d984f Fix assertion in ZFS TRIM code
Due to an attempt to check two conditions at once in a macro not designed
as such, the assertion would always evaluate to true.

#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \
        const TYPE __left = (TYPE)(LEFT); \
        const TYPE __right = (TYPE)(RIGHT); \
        if (!(__left OP __right)) \
                assfail3(#LEFT " " #OP " " #RIGHT, \
                        (uintmax_t)__left, #OP, (uintmax_t)__right, \
                        __FILE__, __LINE__); \
_NOTE(CONSTCOND) } while (0)
#define ASSERT3U(x, y, z)       VERIFY3_IMPL(x, y, z, uint64_t)

Mean that we compared:
left = (type == ZIO_TYPE_FREE || psize)
OP = "<="
right = (SPA_MAXBLOCKSIZE)

If the type was not FREE, 0 is less than SPA_MAXBLOCKSIZE (16MB)
If the type is ZIO_TYPE_FREE, 1 is less than SPA_MAXBLOCKSIZE
The constraint on psize (physical size of the FREE operation) is never
checked against SPA_MAXBLOCKSIZE

Reported by:	Ka Ho Ng <khng300@gmail.com>
Reviewed by:	kevans
MFC after:	2 weeks
Sponsored by:	Klara Systems
2019-05-29 20:34:35 +00:00
Li-Wen Hsu
6c9e56b231 Add the likely missing braces in ips(4). This is found by gcc warning that
the code is not guarded by the if clause and has misleading indentation.

Approved by:	scottl
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20427
2019-05-29 18:11:17 +00:00
Ruslan Bukin
33da49cd2e Don't copy the data from bounce buffer back to the mbuf if channel does
not use bounce buffering.

Sponsored by:	DARPA, AFRL
2019-05-29 16:01:34 +00:00
Ruslan Bukin
7b4ec8d2fc Pass pci_base address instead of physical address to rman_manage_region().
This should had been part of r347930 ("pci: ecam: Correctly parse memory
and IO region").

Sponsored by:	DARPA, AFRL
2019-05-29 15:53:33 +00:00
Konstantin Belousov
bbc08c6116 Remove "All rights reserved." from FF copyright.
Requested by:	rgrimes
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-29 14:26:35 +00:00
Konstantin Belousov
ab74c84333 Do not go into sleep in sleepq_catch_signals() when SIGSTOP from
PT_ATTACH was consumed.

In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero
return code to EINTR.

Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but not
yet really sleep, and cursig() caused debugger attach, the thread
sleeps instead of returning to the userspace boundary with EINTR.

PR: 231445
Reported by:	Efi Weiss <valmarelox@gmail.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20381
2019-05-29 14:05:27 +00:00
Konstantin Belousov
21db5fcc9d Add posixshmcontrol(1) page.
Reviewed by:	emaste
With input by:	danfe
Sponsored by:	The FreeBSD Foundation (kib)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D20430
2019-05-29 13:51:18 +00:00
Andriy Gapon
fec2f12ebd revert r273728 and parts of r306589, iicbus no-stop by default feature
Since drm2 removal, there has not been any consumer of the feature in the
tree.  I am also unaware of any out-of-tree consumer.
More importantly, the feature has been broken from the very start, both
before and after r306589, because the ivar was set on a device that does
not support it and it was read from another device that also does not
support it.

A bus-wide no-stop flag cannot be implemented as an ivar as iicbus
attaches as a child of various drivers.  Implementing the ivar in each
and every I2C driver is just impractical.

If we ever want to implement this feature properly, then probably the
easiest way to do it would be via a flag in the softc of iicbus.
In fact, we might have to do that in the stable branches if we want to
fix the code for them.

Reported by:	ian (long time ago)
MFC after:	1 month (maybe)
X-MFC-note:	cannot just merge the change, must keep drm2 happy
2019-05-29 09:08:20 +00:00
Toomas Soome
93a2d4c92f loader: malloc+memset is calloc in spa_create
Replace malloc + memset pair with calloc.
2019-05-29 07:33:51 +00:00
Toomas Soome
f28f385b9c boot1.efi should also provide Calloc
boot1.efi does provide Malloc and Free, we also need Calloc.
2019-05-29 07:32:43 +00:00
Toomas Soome
51e5c6b89e loader: zfs_alloc and zfs_free should use panic
The zfs alloc and free code print out the error and get stuck in infinite loop; use panic() instead.
2019-05-29 07:24:10 +00:00
Gleb Smirnoff
bec2d7e9a2 The KVM code also needs a fix similar to r344269.
Reported by:	pho
2019-05-29 03:14:46 +00:00
Justin Hibbits
78473c580b Update __FreeBSD_version and Makefile check for r348347
libdwarf needs forcibly rebuilt after r348347.
2019-05-29 02:26:15 +00:00
Pedro F. Giffuni
ec845b07c6 typo: suppported. 2019-05-29 02:08:23 +00:00
Justin Hibbits
d9a48fc632 Add missing powerpc64 relocation support to libdwarf
Summary:
Due to missing relocation support in libdwarf for powerpc64, handling of dwarf
info on unlinked objects was bogus.

Examining raw dwarf data on objects compiled on ppc64 with a modern compiler
(in-tree gcc tends to hide the issue, since it only rarely generates relocations
in .debug_info and uses DW_FORM_str instead of DW_FORM_strp for everything), you
will find that the dwarf data appears corrupt, with repeated references to the
compiler version where things like types and function names should appear.

This happens because the 0 offset of .debug_str contains the compiler version,
and without applying the relocations, *all* indirect strings in .dwarf_info will
end up pointing to it.

This corruption then propogates to the CTF data, as ctfconvert relies on
libdwarf to read the dwarf info, for every compiled object (when building a
kernel.)

However, if you examine the dwarf data on a compiled executable, it will appear
correct, because during final link the relocations get applied and baked in by
the linker.

Submitted by:	Brandon Bergren
Reviewed By:	emaste
Differential Revision: https://reviews.freebsd.org/D20367
2019-05-29 02:02:56 +00:00
Kyle Evans
d8b985430c if_bridge(4): Complete bpf auditing of local traffic over the bridge
There were two remaining "gaps" in auditing local bridge traffic with
bpf(4):

Locally originated outbound traffic from a member interface is invisible to
the bridge's bpf(4) interface. Inbound traffic locally destined to a member
interface is invisible to the member's bpf(4) interface -- this traffic has
no chance after bridge_input to otherwise pass it over, and it wasn't
originally received on this interface.

I call these "gaps" because they don't affect conventional bridge setups.
Alas, being able to establish an audit trail of all locally destined traffic
for setups that can function like this is useful in some scenarios.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19757
2019-05-29 01:08:30 +00:00