In r349154, random device reads of size < 16 bytes (AES block size) were
accidentally broken to loop forever. Correct the loop condition for small
reads.
Reported by: pho
Reviewed by: delphij
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D20686
This adds ACPI device path on devinfo(8) output and
show value of _UPC(usb port capabilities), _PLD (physical location of device)
when hw.usb.debug >= 1 .
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D20630
Add experimental feature to increase concurrency in Fortuna. As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default. To enable it, set the
tunable kern.random.fortuna.concurrent_read="1". The rest of this commit
message describes the behavior when enabled.
Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock. This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.
It is somewhat of a deviation from FS&K. I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread). However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.
Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.
On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread. Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.
Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock. "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.
After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.
Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.
Reviewed by: markm
Approved by: secteam(delphij)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20313
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193
names. I.e., everything related to pwm now goes in /dev/pwm. This will
make it easier for userland tools to turn an unqualified name into a fully
qualified pathname, whether it's the base pwmcX.Y name or a label name.
At a basic level, remove assumptions about the underlying algorithm (such as
output block size and reseeding requirements) from the algorithm-independent
logic in randomdev.c. Chacha20 does not have many of the restrictions that
AES-ICM does as a PRF (Pseudo-Random Function), because it has a cipher
block size of 512 bits. The motivation is that by generalizing the API,
Chacha is not penalized by the limitations of AES.
In READ_RANDOM_UIO, first attempt to NOWAIT allocate a large enough buffer
for the entire user request, or the maximal input we'll accept between
signal checking, whichever is smaller. The idea is that the implementation
of any randomdev algorithm is then free to divide up large requests in
whatever fashion it sees fit.
As part of this, two responsibilities from the "algorithm-generic" randomdev
code are pushed down into the Fortuna ra_read implementation (and any other
future or out-of-tree ra_read implementations):
1. If an algorithm needs to rekey every N bytes, it is responsible for
handling that in ra_read(). (I.e., Fortuna's 1MB rekey interval for AES
block generation.)
2. If an algorithm uses a block cipher that doesn't tolerate partial-block
requests (again, e.g., AES), it is also responsible for handling that in
ra_read().
Several APIs are changed from u_int buffer length to the more canonical
size_t. Several APIs are changed from taking a blockcount to a bytecount,
to permit PRFs like Chacha20 to directly generate quantities of output that
are not multiples of RANDOM_BLOCKSIZE (AES block size).
The Fortuna algorithm is changed to NOT rekey every 1MiB when in Chacha20
mode (kern.random.use_chacha20_cipher="1"). This is explicitly supported by
the math in FS&K §9.4 (Ferguson, Schneier, and Kohno; "Cryptography
Engineering"), as well as by their conclusion: "If we had a block cipher
with a 256-bit [or greater] block size, then the collisions would not
have been an issue at all."
For now, continue to break up reads into PAGE_SIZE chunks, as they were
before. So, no functional change, mostly.
Reviewed by: markm
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D20312
Add some basic regression tests to verify behavior of both uint128
implementations at typical boundary conditions, to run on all architectures.
Test uint128 increment behavior of Chacha in keystream mode, as used by
'kern.random.use_chacha20_cipher=1' (r344913) to verify assumptions at edge
cases. These assumptions are critical to the safety of using Chacha as a
PRF in Fortuna (as implemented).
(Chacha's use in arc4random is safe regardless of these tests, as it is
limited to far less than 4 billion blocks of output in that API.)
Reviewed by: markm
Approved by: secteam(gordon)
Differential Revision: https://reviews.freebsd.org/D20392
Previously, there was a pwmc instance for each instance of pwm hardware
regardless of how many pwm channels that hardware supported. Now there
will be a pwmc instance for each channel when the hardware supports
multiple channels. With a separate instance for each channel, we can have
"named channels" in userland by making devfs alias entries in /dev/pwm.
These changes add support for ivars to pwmbus, and use an ivar to track the
channel number for each child. It also adds support for hinted children.
In pwmc, the driver checks for a label hint, and if present, it's used to
create an alias for the cdev in /dev/pwm. It's not anticipated that hints
will be heavily used, but it's easy to do and allows quick ad-hoc creation
of named channels from userland by using kenv to create hint.pwmc.N.label=
hints. Upcoming changes will add FDT support, and most labels will
probably be specified that way.
is nothing left in the file that related to pwmbus at all. It just contains
prototypes for the functions implemented in dev/pwm.ofw_pwm.c, so name it
accordingly and fix the include protect wrappers to match.
A new pwmbus.h will be coming along in a future commit.
The pwm and pwmbus interfaces were nearly identical, this merges them into a
single pwmbus interface. The pwmbus driver now implements the pwmbus
interface by simply passing all calls through to its parent (the hardware
driver). The channel_count method moves from pwm to pwmbus, and the
get_bus method is deleted (just no longer needed).
The net effect is that the interface for doing pwm stuff is now the same
regardless of whether you're a child of pwmbus, or some random driver
elsewhere in the hierarchy that is bypassing the pwmbus layer and is talking
directly to the hardware driver via cross-hierarchy connections established
using fdt data.
The pwmc driver is now a child of pwmbus, instead of being its sibling
(that's why the get_bus method is no longer needed; pwmc now gets the
device_t of the bus using device_get_parent()).
ioctl definitions and related datatypes that allow userland control of pwm
hardware via the pwmc device. The new name and location better reflects its
assocation with a single device driver.
part of ciss_detach. It's a left-over debug that isn't needed and also
discloses a kernel address. Only root could provoke as part of a
devctl or kldunload.
Submitted by: Fuqian Huang
MFC After: 1 week
asserted. Some development boards for example will reset on DTR,
and some radio interfaces will transmit on RTS.
This patch allows "stty -f /dev/ttyu9.init -rtsdtr" to prevent
RTS and DTR from being asserted on open(), allowing these devices
to be used without problems.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D20031
between each byte either sent or received). However, most transitions
actually complete in 2-3 microseconds.
By polling the status register with a delay of 4us with exponential
backoff, the performance of most IPMI operations is significantly
improved:
- A BMC update on a Supermicro x9 or x11 motherboard goes from ~1 hour
to ~6-8 minutes.
- An ipmitool sensor list time improves by a factor of 4.
Testing showed no significant improvements on a modern server by using
a lower delay.
The changes should also generally reduce the total amount of CPU or
I/O bandwidth used for a given IPMI operation.
Submitted by: Loic Prylli <lprylli@netflix.com>
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20527
detection pins to the Marvell Xenon SDHCI controller.
These features are enable by 'vqmmc-supply' and 'cd-gpios' properties in the
DTS.
This fixes the SD Card detection on espressobin.
Sponsored by: Rubicon Communications, LLC (Netgate)
Enable synaptics and elantech touchpads, as well as IBM/Lenovo TrackPoints
by default, instead of having users find and toggle a loader tunable.
This makes things like two finger scroll and other modern features work out
of the box with X. By enabling these settings by default, we get a better
desktop experience in X, since xserver and evdev can make use of the more
advanced synaptics and elantech features.
Reviewed by: imp, wulf, 0mp
Approved by: imp
Sponsored by: B3 Init (zeising)
Differential Revision: https://reviews.freebsd.org/D20507
Add strict checks for unused bit states in Elantech trackpoint packet
parser to filter out spurious events produces by some hardware which
are detected as trackpoint packets. See comment on r328191 for example.
Tested by: Andrey Kosachenko <andrey.kosachenko@gmail.com>
Sign bits for X and Y motion data were taken from wrong places.
PR: 238291
Reported by: Andrey Kosachenko <andrey.kosachenko@gmail.com>
Tested by: Andrey Kosachenko <andrey.kosachenko@gmail.com>
MFC after: 2 weeks
Add a CAM-Newbus SDIO support module. This works provides a newbus
infrastructure for device drivers wanting to use SDIO. On the lower end
while it is connected by newbus to SDHCI, it talks CAM using the MMCCAM
framework to get to it.
This also duplicates the usbdevs framework to equally create sdiodev
header files with #defines for "vendors" and "products".
Submitted by: kibab (initial work, see https://reviews.freebsd.org/D12467)
Reviewed by: kibab, imp (comments on earlier version)
MFC after: 6 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19749
Currently slot_printf() uses two printf() calls to print the
device-slot name, and actual message. When other printf()s are
ongoing in parallel this can lead to interleaved message on the console,
which is especially unhelpful for debugging or error messages.
Take a hit on the stack and vsnprintf() the message to the buffer.
This way it can be printed along with the device-slot name in one go
avoiding console gibberish.
Reviewed by: marius
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19747
Add cam_sim_alloc_dev() as a wrapper to cam_sim_alloc() which takes
a device_t instead of the unit_number (which we can derive from the
dev again).
Add device_t sim_dev to struct cam_sim. It will be used to pass through
the bus for cases when both sides of CAM speak newbus already and we want
to link them (yet make the calls through CAM for now).
SDIO will be the first consumer of this. For that make use of
cam_sim_alloc_dev() in sdhci under MMCCAM.
This will also allow people to start iterating more on the idea
to newbus-ify CAM without changing 50+ device drivers from the start.
Also to be clear there are callers to cam_sim_alloc() which do not
have a device_t (e.g., XPT) or provide their own unit number so we cannot
simply switch the KPI entirely.
Submitted by: kibab (original idea, see https://reviews.freebsd.org/D12467)
Reviewed by: imp, chuck
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19746
Differentiate between PCI Express Endpoint devices and Root Complex
Integrated Endpoints in the nda driver. The Link Status and Capability
registers are not valid for Integrated Endpoints and should not be
displayed. The bhyve emulated NVMe device will advertise as being an
Integrated Endpoint.
Reviewed by: imp
Approved byL imp (mentor)
Differential Revision: https://reviews.freebsd.org/D20282
These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470
This fixes a panic in Espressobin when gpioregulator fails to allocate the
GPIO pin (the GPIO controller is not there).
Sponsored by: Rubicon Communications, LLC (Netgate)
Provide the acpi handle path as the location string for the nvdimm
children of the nvdimm_root device.
Reviewed by: kib
Approved by: jhb (mentor)
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20528
Prior to this commit, if PCIEM_SLOT_STA_ABP and PCIEM_SLOT_STA_PDC are
asserted simultaneously, FreeBSD sets a 5 second "hardware going away" timer
and then processes the "presence detect" change. In the (physically
challenging) case that someone presses the "attention button" and inserts
a new PCIe device at exactly the same moment, this results in FreeBSD
recognizing that the device is present, attaching it, and then detaching it
5 seconds later.
On EC2 "bare metal" hardware this is the precise sequence of events which
takes place when a new EBS volume is attached; virtual machines have no
difficulty effecting physically implausible simultaneity.
This patch changes the handling of PCIEM_SLOT_STA_ABP to only detach a
device if the presence of a device was detected *before* the interrupt
which reports the Attention Button push.
Reported by: Matt Wilson
Reviewed by: jhb
MFC after: 1 week
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D20499
if a USB transfer is cancelled that we need to fake a completion event.
Implement missing support in ugen_fs_copy_out() to handle this.
This fixes issues with webcamd(8) and firefox.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit. Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type. The same
matching criteria are retained exactly, but is now also available to
devmatch(8).
Reviewed by: bryanv, markj; imp (earlier version)
Differential Revision: https://reviews.freebsd.org/D20407
Expose the same fields and widths from both vtio buses, even though they
don't quite line up; several virtio drivers can attach to both buses,
and sharing a PNP info table for both seems more convenient.
In practice, I doubt any virtio driver really needs to match on anything
other than bus and device_type (eliminating the unused entries for
vtmmio), and also in practice device_type is << 2^16 (so far, values
range from 1 to 20). So it might be fine to only expose a 16-bit
device_type for PNP purposes. On the other hand, I don't see much harm
in overkill here.
Reviewed by: bryanv, markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D20406
random(4) masks unregistered entropy sources. Prior to this revision,
virtio_random(4) did not correctly register a random_source and did not
function as a source of entropy.
Random source registration for loadable pure sources requires registering a
poll callback, which is invoked periodically by random(4)'s harvestq
kthread. The periodic poll makes virtio_random(4)'s periodic entropy
collection redundant, so this revision removes the callout.
The current random source API is somewhat limiting, so simply fail to attach
any virtio_random devices if one is already registered as a source. This
scenario is expected to be uncommon.
While here, handle the possibility of short reads from the hypervisor random
device gracefully / correctly. It is not clear why a hypervisor would
return a short read or if it is allowed by spec, but we may as well handle
it.
Reviewed by: bryanv (earlier version), markm
Security: yes (note: many other "pure" random sources remain broken)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20419
This change enables natural scrolling with two finger scroll enabled
and when user is using a trackpad (mouse and trackpoint are not affected).
Depending on trackpad model it can be activated with setting of
hw.psm.synaptics.natural_scroll or hw.psm.elantech.natural_scroll sysctl
values to 1.
Evdev protocol is not affected by this change too. Tune userland client
e.g. libinput to enable natural scrolling in that case.
Submitted by: nyan_myuji.xyz
Reviewed by: wulf
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20447
completions are not in a consistent state. Cope with the different
places the normal I/O completion polling thread can be interrupted and
then re-entered during a kernel panic + dump.
Reviewed by: jhb and markj (both prior versions)
Differential Revision: https://reviews.freebsd.org/D20478
When starting a command also print the opcode and flags.
More consitently print flags as hex.
Use slot_printf rather than printf in one case.
MFC after: 6 weeks
Reviewed by: marius, kibab, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19748
receive sockbuf's high water mark.
Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb. The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).
This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.
MFC after: 3 days
Sponsored by: Chelsio Communications
I introduced an obvious compiler error in r346282, so this change fixes
that.
Unfortunately, RANDOM_LOADABLE isn't covered by our existing tinderbox, and
it seems like there were existing latent linking problems. I believe these
were introduced on accident in r338324 during reduction of the boolean
expression(s) adjacent to randomdev.c and hash.c. It seems the
RANDOM_LOADABLE build breakage has gone unnoticed for nine months.
This change correctly annotates randomdev.c and hash.c with !random_loadable
to match the pre-r338324 logic; and additionally updates the HWRNG drivers
in MD 'files.*', which depend on random_device symbols, with
!random_loadable (it is invalid for the kernel to depend on symbols from a
module).
(The expression for both randomdev.c and hash.c was the same, prior to
r338324: "optional random random_yarrow | random !random_yarrow
!random_loadable". I.e., "random && (yarrow || !loadable)." When Yarrow
was removed ("yarrow := False"), the expression was incorrectly reduced to
"optional random" when it should have retained "random && !loadable".)
Additionally, I discovered that virtio_random was missing a MODULE_DEPEND on
random_device, which breaks kld load/link of the driver on RANDOM_LOADABLE
kernels. Address that issue as well.
PR: 238223
Reported by: Eir Nym <eirnym AT gmail.com>
Reviewed by: delphij, markm
Approved by: secteam(delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20466
misleading indentation. This is found by gcc -Wmisleading-indentation
Approved by: erj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20428
it hasn't been initialized.
This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.
MFC after: 3 days
Sponsored by: Chelsio Communications
ENAv2 introduces many new features, bug fixes and improvements.
Main new features are LLQ (Low Latency Queues) and independent queues
reconfiguration using sysctl commands.
The year in copyright notice was updated to 2019.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
For easier debugging, the reset is being triggered and the reset reason is
being set only in case it is done for the first time. Such approach will
ensure that the first reset reason is not going to be overwritten and
will make it easier for debugging.
Also, add a reset trigger upon invalid Tx requested ID.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the call to ena_up() in ena_restore_device() fails, next usage of
`ifconfig up` will cause NULL pointer dereference.
This patch adds additional checks to prevent that.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Some messages were missing new line character and traces were not having
unified behavior. To fix that, each trace and printout should add new
line character at the end of each string - that should improve
readability.
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the headers of the packets are split into multiple segments of the
mbuf chain, the previous version of ena_tx_csum which was assuming,
that all segments will lay in the first mbuf, will eventually fail to
map the headers properties to meta descriptor.
That will cause Tx checksum offload to do not work and was leading to
memory corruption. It could even cause the crash of the system.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
For alignment with Linux driver and better handling ena_detach(), the
reset is now calling ena_device_restore() and ena_device_destroy().
The ena_device_destroy() is also being called on ena_detach(), so the
code will be more readable.
The watchdog is now being activated after reset only, if it was active
before.
There were added additional checks to ensure, that there is no race with
the link state change AENQ handler.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the ENA can have multiple states turned on/off, it is more convenient
to store them in single bitfield instead of multiple boolean variables.
The bitset FreeBSD API was used for the bitfield implementation, as it
provides flexible structure together with API which also supports atomic
bitfield operations.
For better readability basic macros from API were wrapped into custom
ENA_FLAG_* macros, which are filling up common parameters for all calls.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Before the patch, error handling was not releasing all resources and
was not issuing device reset if the reset task failed.
That could cause memory leak and fault of the device.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The host info bdf field is the abbreviation for the bus, device,
function of the PCI on which the device is being attached to.
Now the driver is filling information about that using FreeBSD RID
resource.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The new ENA HAL is introducing API, which can determine on Tx path if
the doorbell is needed.
That way, it can tell the driver, that it should call an doorbell.
The old threshold value wasn't removed, as not all HW is supporting this
feature - so it was reworked to also work with the new API.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The Rx ring size can be as high as 8k. Because of that we want to limit
the cleanup threshold by maximum value of 256.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
LLQ (Low Latency Queue) is the feature, that allows pushing header
directly to the device through PCI before even DMA is triggered.
It reduces latency, because device can start preparing packet before
payload is sent through DMA.
To speed up sending data through PCI, the Write Combining is enabled,
which allows hardware to buffer data before sending them on the PCI - it
allows to reduce number of PCI IO operations.
ENAv2 is using special descriptor for the negotiation of the LLQ.
Currently, only the default configuration is supported.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Handle IO interrupts using filter routine. That way, the main cleanup
task could be moved to the separate thread using taskqueue.
The deferred Rx cleanup task was removed, and now the cleanup task is
begin called instead. That way, the Rx lock could be removed.
In addition, Queue management (wake up and stop TX ring) was added, so
the TX cleanup task can be performed mostly lockless.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The driver now supports per adapter tuning of buffer ring size and HW Rx
ring size.
It can be achieved using sysctl node dev.ena.X.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the requested ID was out of range, the tx_info structure was NULL and
the function was trying to access the field of the NULL object.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The if_detach was causing crash if the MSI-x configuration in the attach
failed. To prevent this issue, the ifnet is being configured at the end
of the attach function.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
In rare case, when the ifconfig is called just before kldunload, it is
possible, that ena_up routine will be called after queue locks are
released.
To prevent that, ifp is detached before the last ena_down is called and
further, the ifp is freed at the end of the function.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The ENA driver needs at least 2 MSI-x - one for admin queue, and one for
IO queues pair. If there were not enough resources to allocate more than
one MSI-x, the device should not be attached.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
bus_alloc_resource_any() is not returning error value in case of an
error.
If the function call fails, the error value was not passed to the
ena_up() function.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
To prevent errors from assigning values from the DMA structure in case
of an error, zero the vaddr and paddr values upon failure.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The DMA in FreeBSD requires explicit synchronization. ENA driver was
only doing PREREAD and PREWRITE synchronizations. Missing
bus_dmamap_sync() calls were added.
It is also required to synchronize DMA engine before unloading DMA map.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the first MSI-x won't be executed, then the timer service will detect
that and trigger device reset.
The checking for missing Tx completion was reworked, so it will also
check for missing interrupts. Checking number of missing Tx completions
can be performed after loop, instead of checking it every iteration.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The new ena_com allows the number of CPUs to be passed to the device in
the host info structure as a hint.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Information about Tx error should be only displayed, if packet
preparation failed due to error other than out of memory.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Whenever the driver will receive too many descriptors from the device,
it should trigger the device reset, as it is indicating that the device
is in invalid state.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Receive Side Scaling is optional feature that could be enabled in kernel
configuration by defining flag RSS.
Kernel uses hash to store and find protocol control block which is
stored in hash tables.
Kernel and NIC hash functions must be consistent. Otherwise case lookup
fails.
To achieve this kernel provides API to set proper hash key to NIC.
As it is not possible to change key for virtual ENA NIC, this driver
cannot support RSS function.
ENA is designed to work in virtual environments so supporting hardware
version of this card is unnecessary.
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Notification AENQ handler is responsible for handling requests from ENA
device. Missing Tx threshold, Tx timeout and keep alive timeout can be
set using hints from the aenq descriptor which can be delivered in the
ENA admin notification.
The queue suspending and resuming tasks are not supported by the
driver.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
ENA_ADMIN_FATAL_ERROR and ENA_ADMIN_WARNING aenq groups were indicated
as supported, so the unimplemented_aenq_handler() will print out error
message, whenever an error will occur within the ENA admin context.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the ENA is working only in virtualized environment, the active media
is not specified. Instead, the active link type is set as unknown.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Recent HAL change preparing to support ENAv2 required minor driver
modifications.
The ena_com_sq_empty_space() is not available in this ena-com, so it had
to be replaced with ena_com_free_desc().
Moreover, the ena_com_admin_init() is no longer using 3rd argument
indicating if the spin lock should be initialized, so it was removed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
the code is not guarded by the if clause and has misleading indentation.
Approved by: scottl
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20427
Since drm2 removal, there has not been any consumer of the feature in the
tree. I am also unaware of any out-of-tree consumer.
More importantly, the feature has been broken from the very start, both
before and after r306589, because the ivar was set on a device that does
not support it and it was read from another device that also does not
support it.
A bus-wide no-stop flag cannot be implemented as an ivar as iicbus
attaches as a child of various drivers. Implementing the ivar in each
and every I2C driver is just impractical.
If we ever want to implement this feature properly, then probably the
easiest way to do it would be via a flag in the softc of iicbus.
In fact, we might have to do that in the stable branches if we want to
fix the code for them.
Reported by: ian (long time ago)
MFC after: 1 month (maybe)
X-MFC-note: cannot just merge the change, must keep drm2 happy
As I see, different NICs in different configurations may have different
numbers of TX and RX queues. The code was assuming 1:1 mapping between
event queues (interrupts) and TX/RX queues. Since number of interrupts
is set to maximum of TX and RX queues, when those two are different, the
system is doomed.
I have no documentation or deep knowledge about this hardware, so this
change is based on general observations and code reading. If some of my
guesses are wrong, please do better. I just confirmed HP NC550SFP NICs
are working now.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Prior to this revision, vtpci's BUS_READ_IVAR method on VIRTIO_IVAR_SUBVENDOR
accidentally returned the PCI subdevice.
The typo seems to have been introduced with the original commit adding
VIRTIO_IVAR_{{SUB,}DEVICE,{SUB,}VENDOR} to virtio_pci. The commit log and code
strongly suggest that the ivar was intended to return the subvendor rather than
the subdevice; it was likely just a copy/paste mistake.
Go ahead and rectify that.
- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.
To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.
- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.
- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.
- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
Reviewed by: gallatin, hselasky, rgrimes, ae
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117
The point of r345008 was to reset the Command Reference Number (CRN)
in some situations where a device stayed in the topology, but had
changed somehow.
This can include moving from a switch connection to a direct
connection or vice versa, or a device that temporarily goes away
and comes back. (e.g. moving to a different switch port)
There were a couple of bugs in that change:
- We were reporting that a device had not changed whenever the
Establish Image Pair bit was not set. That is not quite correct.
Instead, if the Establish Image Pair bit stays the same (set or
not), the device hasn't changed in that way.
- We weren't setting PRLI Word0 in the port database when a new
device arrived, so comparisons with the old value for the
Establish Image Pair bit weren't really possible. So, make sure
PRLI Word0 is set in the port database for new devices.
- We were resetting the CRN whenever the Establish Image Pair bit
was set for a device, even when the device had stayed the same
and the value of the bit hadn't changed. Now, only reset the
CRN for devices that have changed, not devices that sayed the
same.
The result of all of this was that if we had a single FC device on
an FC port and it went away and came back, we would wind up
correctly resetting the CRN.
But, if we had multiple devices connected via a switch, and there
was any change in one or more of those devices, all of the devices
that stayed the same would also have their CRN values reset.
The result, from a user standpoint, is that the tape drives, etc.
would all start to time out commands and the initiator would send
aborts.
sys/dev/isp/isp.c:
In isp_pdb_add_update(), look at whether the Establish
Image Pair bit has changed as part of the check to
determine whether a device is still the same. This was
causing erroneous change notifications. Also, when
creating a new port database entry, initialize the
PRLI Word 0 values.
sys/dev/isp/isp_freebsd.c:
In isp_async(), in the changed/stayed case, instead of
looking at the Establish Image Pair bit to determine
whether to reset the CRN, look at the command value.
(Changed vs. Stayed.) Only reset the CRN for devices
that have changed.
Sponsored by: Spectra Logic
MFC after: 3 days
That made, for example, gpioc -l output quite hard to read and parse.
Also, fix formatting of a nearby statement with too long lines.
MFC after: 2 weeks
Pull the responsibility for zeroing events, which is general to any
conceivable implementation of a random device algorithm, out of the
algorithm-specific Fortuna code and into the callers. Most callers
indirect through random_fortuna_process_event(), so add the logic there.
Most callers already explicitly bzeroed the events they provided, so the
logic in Fortuna was mostly redundant.
Add one missing bzero in randomdev_accumulate(). Also, remove a redundant
bzero in the same function -- randomdev_hash_finish() is obliged to bzero
the hash state.
Reviewed by: delphij
Approved by: secteam(delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20318
This takes the SPCR code currently in uart_cpu_arm64.c, moves it into
a new uart_cpu_acpi.c (with some associated refactoring), and uses it
from both arm64 and x86.
An SPCR serial port address AccessWidth field value of 0 ("reserved")
is now treated as 1 ("byte access") in order to work around a buggy
SPCR table on Amazon EC2 i3.metal instances.
Reviewed by: manu, Greg V
MFC after: 3 days
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D20357
Pnpinfo is bus-specific and requires the bus name. The FDTCOMPAT_PNP_INFO()
macro makes it easier to define new FDT-based pnpinfo for busses other than
simplebus.
Differential Revision: https://reviews.freebsd.org/D20382
Many i2c slave drivers are in modules that can be unloaded. If they detach
while IO is in progress the bus would be hung forever. Conversely,
lower-layer drivers (iicbus and the hardware driver) also live in modules
and other kinds of bad things happen if they get detached while IO is in
progress. Because device_busy() propagates up to parents, marking the slave
device busy while it owns the bus solves both kinds of problems that come
with detaching i2c devices while IO is in progress.
It should be safer to flush controller and disk caches on the shutdown.
And to gracefully shut down the controller as well.
It seems that the Linux driver has been doing that for a long time.
Discussed with: scottl
Reviewed by: imp, Sumit Saxena <sumit.saxena@broadcom.com>
(both earlier version)
MFC after: 3 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D19817
hint.gpioled.%d.state determines the initial state of the LED when the
driver takes control over it:
0 - the LED is off
1 - the LED is on
-1 - the LED is kept as it was
While here, add a module version declaration.
MFC after: 2 weks
This is a curious small widget for which I might write a driver.
It is bridge between USB HID interface and I2C interface plus some
GPIO pins.
MFC after: 2 weeks
This fixes a regress introduced in r339754.
After that change the code required that there is a HPET device
in the ACPI namespace.
The problem has been noticed on an PC Engines apu2 system.
While here, fix a small formatting issue.
has been in the PR system for 5 months and then on reviews for another 5.
Nobody came with any cases where it fails, while many people cried for
it to be commited & merged.
PR: 209468
Submitted by: Prasad B M <prasad.munirathnam@microsemi.com>
Reported by: Steven Peterson <scp@mainstream.net>
Approved by: scottl
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D18408
Previously, if a system had multiple batteries, the remaining life
percentage was calculated as the average of each battery's percent
remaining. This results in rather incorrect values when you consider the
case of the Thinkpad X270 that has a small 3 cell internally battery, and
a hot-swappable 9 cell battery that is used first. Battery 0 is at 100%,
but battery 1 is at 10%, you do not infact have 55% of your capacity
remaining.
The new method calculates the percentage based on remaining capacity
out of total capacity, giving a much more accurate reading.
PR: 229818
Submitted by: Keegan Drake H.P. <kd-dev@pm.me>
MFC after: 2 weeks
Sponsored by: Klara Systems
Event: Waterloo Hackathon 2019
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.
Like r348026, these CUs did not show up in tinderbox as missing the include.
Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon
Using the latest NVIDIA driver, upon resuming from suspend with X
running the display remained blank. Additionally OpenGL applications
that were running triggered a number of error messages from the NVIDIA
driver.
This occurred because the vt efifb back-end did not signal the X server
to release the display before suspending (or to re-acquire it after
resuming). The NVIDIA driver includes code for smoothly shutting down
and re-initializing the GPU, which was not getting called.
Since the NVIDIA driver doesn't currently support framebuffer devices
and vt is forced to fall back to the efifb back-end, add vd_suspend and
vd_resume members to connect the suspend/resume path. This ensures the
X server is properly able to re-initialize the display.
PR: 237050
Submitted by: Erik Kurzinger <ekurzinger@nvidia.com>
Reviewed by: markj
MFC after: 2 weeks
Event: Waterloo Hackathon 2019
This was enumerated with exhaustive search for sys/eventhandler.h includes,
cross-referenced against EVENTHANDLER_* usage with the comm(1) utility. Manual
checking was performed to avoid redundant includes in some drivers where a
common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is
possible some of these are redundant with driver-specific headers in ways I
didn't notice.
(These CUs did not show up as missing eventhandler.h in tinderbox.)
X-MFC-With: r347984
These are obviously missing from the .c files, but don't show up in any
tinderbox configuration (due to latent header pollution of some kind). It
seems some configurations don't have this pollution, and the includes are
obviously missing, so go ahead and add them.
Reported by: Peter Jeremy <peter AT rulingia.com>
X-MFC-With: r347984
all-ones then carving out blocks of zeroes where specified values go, init
it to all-zeroes, put in ones where values need to be masked, then use it
as value &= ~sc_led_modes_mask. In addition to being more idiomatic, this
means everything related to FDT data is initialized to zero along with the
rest of the softc, and that allows removing some #ifdef FDT sections and
wrapping the whole muge_set_leds() function in a single ifdef block.
This also deletes the early-out from muge_set_leds() when an eeprom exists.
Even if there is an eeprom with led config in it, the fdt data (if present)
should override that, because the user is in control of the fdt data.
OTP registers (because the user is in control of the fdt data). Remove the
early returns from the code that tries to find a good mac address, so that
the execution always flows through the routine to get an address from FDT
data last, when on FDT-enabled systems.
the device instance, and to get the MAC address for the device instance.
The ad-hoc code this replaces could find the wrong instance if multiple
devices were present.
Also use LED mode settings from the FDT to set the PHY.
From v3 of the patch submitted in the PR.
I moved the sc_led_modes and sc_led_modes_mask default setting outside
of the #ifdef FDT case.
PR: 237325
Submitted by: Ralf <iz-rpi03@hs-karlsruhe.de>
Reviewed by: ian
MFC after: 2 weeks
MFC with: r348001
Event: Waterloo Hackathon 2019
Differential Revision: https://reviews.freebsd.org/D20325
Also apply some style(9) and remove the message about EEPROM configuration
(if there's an EEPROM the hardware handles LED configuration itself).
PR: 237325
Reviewed by: ian
MFC after: 2 weeks
Submitted by: Ralf <iz-rpi03@hs-karlsruhe.de>
Summary:
PowerPC kernels are fully position independent, just like kernel modules.
The same fixups that are done for modules therefore need to be done to the
kernel, else symbol resolution in, e.g., DTrace, cannot resolve the kernel
symbols, so only addresses in the kernel are printed, while kernel module
symbols are printed.
Test Plan:
Run lockstat on powerpc64. Note symbols are resolved for kernel and
modules.
Reviewed By: markj
Differential Revision: https://reviews.freebsd.org/D20316
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
FDT data is sometimes used to configure usb devices which are hardwired into
an embedded system. Because the devices are instantiated by the usb
enumeration process rather than by ofwbus iterating through the fdt data, it
is somewhat difficult for a usb driver to locate fdt data that belongs to
it. In the past, various ad-hoc methods have been used, which can lead to
errors such applying configuration that should apply only to a hardwired
device onto a similar device attached by the user at runtime. For example,
if the user adds an ethernet device that uses the same driver as the builtin
ethernet, both devices might end up with the same MAC address.
These changes add a new usb_fdt_get_node() helper function that a driver can
use to locate FDT data that belongs to a single unique instance of the
device. This function locates the proper FDT data using the mechanism
detailed in the standard "usb-device.txt" binding document [1].
There is also a new usb_fdt_get_mac_addr() function, used to retrieve the
mac address for a given device instance from the fdt data. It uses
usb_fdt_get_node() to locate the right node in the FDT data, and attempts to
obtain the mac-address or local-mac-address property (in that order, the
same as linux does it).
The existing if_smsc driver is modified to use the new functions, both as an
example and for testing the new functions. Rpi and rpi2 boards use this
driver and provide the mac address via the fdt data.
[1] https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/usb/usb-device.txt
Differential Revision: https://reviews.freebsd.org/D20262
We need to make the find_veriexec_file() function available publicly, so
rename it to mac_veriexec_metadata_find_file_info() and make it non-static.
Bump the version of the veriexec device interface so user space will know
the labelized version of fingerprint loading is available.
Approved by: sjg
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D20295
priv. This allows for MAC/veriexec to prevent apps that are not "trusted"
from using these commands.
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
When activating a resource do not compare the resource id to the adress.
Treat IO region as MEMORY region too.
Submitted by: Tuan Phan <tphan@amperecomputing.com> (Original Version)
Sponsored by: Ampere Computing, LLC
Differential Revision: https://reviews.freebsd.org/D20214
We cannot know the bus end number before parsing the MCFG table
so don't set the bus_end before that. If the MCFG table doesn't
exist we will set the configuration base address based on the _CBA
value and set the bus_end to the maximal number allowed by PCI.
Sponsored by: Ampere Computing, LLC
Differential Revision: https://reviews.freebsd.org/D20213
In all practical situations, the resolver visibility is static.
Requested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: so (emaste)
Differential revision: https://reviews.freebsd.org/D20281
In xdma_handle_mem_node(), vmem_size_t and vmem_addr_t pointers were passed to
an FDT API that emits u_long values to the output parameter pointer. This
broke on systems with both xdma and 32-bit vmem size/addr types (SOCFPGA).
Reported by: tinderbox
Sponsored by: Dell EMC Isilon
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure. An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).
Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security: CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security: FreeBSD-SA-19:07.mds
Reviewed by: jhb
Tested by: emaste, lwhsu
Approved by: so (gtetlow)
(1) We may have had sufficient entropy to consider Fortuna seeded, but the
random_fortuna_seeded() function would produce a false negative if
fs_counter was still zero. This condition could arise after
random_harvestq_prime() processed the /boot/entropy file and before any
read-type operation invoked "pre_read()." Fortuna's fs_counter variable is
only incremented (if certain conditions are met) by reseeding, which is
invoked by random_fortuna_pre_read().
is_random_seeded(9) was introduced in r346282, but the function was unused
prior to r346358, which introduced this regression. The regression broke
initial seeding of arc4random(9) and broke periodic reseeding[A], until something
other than arc4random(9) invoked read_random(9) or read_random_uio(9) directly.
(Such as userspace getrandom(2) or read(2) of /dev/random. By default,
/etc/rc.d/random does this during multiuser start-up.)
(2) The conditions under which Fortuna will reseed (including initial seeding)
are: (a) sufficient "entropy" (by sheer byte count; default 64) is collected
in the zeroth pool (of 32 pools), and (b) it has been at least 100ms since
the last reseed (to prevent trivial DoS; part of FS&K design). Prior to
this revision, initial seeding might have been prevented if the reseed
function was invoked during the first 100ms of boot.
This revision addresses both of these issues. If random_fortuna_seeded()
observes a zero fs_counter, it invokes random_fortuna_pre_read() and checks
again. This addresses the problem where entropy actually was sufficient,
but nothing had attempted a read -> pre_read yet.
The second change is to disable the 100ms reseed guard when Fortuna has
never been seeded yet (fs_lasttime == 0). The guard is intended to prevent
gratuitous subsequent reseeds, not initial seeding!
Machines running CURRENT between r346358 and this revision are encouraged to
refresh when possible. Keys generated by userspace with /dev/random or
getrandom(9) during this timeframe are safe, but any long-term session keys
generated by kernel arc4random consumers are potentially suspect.
[A]: Broken in the sense that is_random_seeded(9) false negatives would cause
arc4random(9) to (re-)seed with weak entropy (SHA256(cyclecount ||
FreeBSD_version)).
PR: 237869
Reported by: delphij, dim
Reviewed by: delphij
Approved by: secteam(delphij)
X-MFC-With: r346282, r346358 (if ever)
Security: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20239
Instead of precalculating the different speed, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fallback on the precomputed
values (This is likely do be the case on Marvell boards).
We do this for FDT systems but not for ACPI ones.
Check the presence of the _CCA attribute.
Sponsored by: Ampere Computing, LLC
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D20144
DTrace expects kernel function symbols of a non-zero size to have an
implementation, which is a reasonable invariant to preserve.
Reported and tested by: ler
Reviewed by: cem, kib
Approved by: so (delphij)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20218
When the module is unloaded, the tty devices are destroyed. That requires
implementing the tsw_free callback to avoid a panic. This driver requires
no particular cleanup to be done from the callback, but the module itself
must remain in memory until the deferred tsw_free callbacks are invoked.
These changes implement that by incrementing a reference count variable in
the detach routine, and decrementing it in the tsw_free callback. The
MOD_UNLOAD event handler doesn't return until the count drops to zero.
PR: 237758
Maintain symmetry with nvme_ctrlr_create_qpairs, making it easier to
match init/uninit scenarios.
Signed-off-by: John Meneghini <johnm@netapp.com>
Submitted by: Michael Hordijk <hordijk@netapp.com>
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D19781
loaded OS.
This should prevent at least some theoretical issues whith code
execution on HT sibling of the core where the update is loaded.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20201
dme(4) is the built-in NIC on a couple non-expandable mips platforms and
thus should remain. The FCP has been updated to reflect this fact.
Discussed with: imp
Ampere eMAG systems have XHCI just described in ACPI, not on PCI.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: andrew
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19986
Grab device reserved physical memory regions from FDT using standard
"memory-region" property and use vmem(9) to allocate buffers from it.
The same vmem could be used by DMA engine drivers to allocate memory for
DMA descriptors.
This is required for platforms that provide uncached memory region
reserved exclusively for DMA operations.
o Change sleepable sx(9) lock type to non-sleepable mutex(9) since
network drivers usually hold mutex during DMA operations. So we don't
take sleepable lock after non-sleepable.
Tested on U.S. Government Furnished Equipment (GFE) 64-bit RISC-V cores.
Sponsored by: DARPA, AFRL
Mjg@ reports that RDSEED (r347239) causes a lot of logspam from this printf,
and I don't feel that it is especially useful (even ratelimited). There are
many other quality/quantity checks we're not performing on entropy sources;
lack of high frequency availability does not disqualify a good entropy
source.
There is some discussion in the linked Differential about what logging might
be appropriate and/or polling policy for slower TRNG sources. Please feel
free to chime in if you have opinions.
Reported by: mjg
Reviewed by: markm, delphij
Approved by: secteam(delphij)
X-MFC-With: r347239
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20195
There is no reason to re-create the command workqueue during healthcare.
This also fixes an issue where a previous work struct may refer to a
destroyed workqueue.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Present code uses lock-less accesses to the dump data to prevent top
level ioctls from blocking bottom-level call to dump. Unfortunately, this
depends on the type stability of the dump data structure, which makes it
non-functional during driver teardown.
Switch to the mutex locking scheme where top levels use the mutex in the
bound regions, while copyouts and drain for completion utilize condvars.
The mutex lifetime is guaranteed to be strictly larger than the time
interval where driver can initiate dump, and most of the control fields
of the old struct mlx5_dump_data are directly embedded into struct
mlx5_core_dev.
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Avoid race for command completion when triggering a command completions event.
Serialize operation by queueing all commands on the same work queue.
This can happen when healthcare triggers.
MFC after: 3 days
Sponsored by: Mellanox Technologies
The command timeout is terribly long, whole two hours. Make it 60s so if
things do go wrong, the user gets feedback in relatively short time, so
they can take corrective actions and/or investigate using tools and such.
Linux commit:
6b6c07bdcdc97ccac2596063bfc32a5faddfe884
MFC after: 3 days
Sponsored by: Mellanox Technologies
Function 'mlx5e_alloc_rx_wqe' can never be inlined because it uses alloca
(override using the always_inline attribute)
MFC after: 3 days
Sponsored by: Mellanox Technologies
Allow more macro arguments and split the variable type and name into
separate arguments. This allows simple and powerful copy and extraction
of values from IFC based structures into SYSCTLs with the use of a single
macro.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Implement a watchdog as part of the healtcare subsystem which
reads the PCI power status during startup and upon the PCI
power status change event and store it into the core device
structure. This value is then exported to user-space via a
read-only SYSCTL. A dmesg print has been added to inform
the admin about the PCI power status.
MFC after: 3 days
Sponsored by: Mellanox Technologies
CM layer calls ib_modify_port() regardless of the link layer.
For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
Linux Commit:
ec2558796d25e6024071b6bcb8e11392538d57bf
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
mlx5en(4).
IFM_10G_LR and IFM_40G_ER4 media should be added only if the device
has the needed capability bit set for it.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
If generation is cleared due to hardware clock failure, check for it before
the divisor is used. Actually clear generation when failure occurs.
While there, stop doing the calculations inside the generation loop. Since
all members of mlx5e_clbr_point are used for calculations, get the
local copy of the structure and use it after generation stabilized.
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
This fixes counting issues when the firmware resets the counter during
allocation of the counter set where the counter belongs.
MFC after: 3 days
Sponsored by: Mellanox Technologies
The physical- and virtual- port counters might not reflect the amount
of data received after address filtering. Use the software counters
instead for rx_packets and rx_bytes to know exactly how much data
was received.
MFC after: 3 days
Sponsored by: Mellanox Technologies
mlx5_pci_disable_device is calling pci_disable_device which disables
bus master. No need to explicitly call pci_clear_master.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Add mlx5 implementation for the ones defined by the mlxfw
shared module to be used while flashing the device firmware.
The callbacks do their job through the MCQI, MCC and MCDA registers.
Linux commit:
62bd22cf326dc4ac5be673c11cef4602dc1f5e47
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
The current mapping of driver counters to netstat counters is wrong.
For example, a single jabber packet, will cause the Ierrs counter to
count three times.
The work for mapping the hardware and software counters to their right
place in netstat counters were already done in Linux, take that as is
to the FreeBSD driver.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
To be used by the mlx5 callbacks exposed to the mlxfw module.
Linux commit:
d2ad488b0073bd1a2c3f5d2ea50a7eb632103e5d
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Enhance MCAM to allow the driver to query which access regs are
supported. For now, expose the regs needed for FW flashing.
Linux commit:
0ab87743cc8c5bcd482daf71961ed5fc45349e01
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
MCC (Management Component Control) allows to control a firmware
component update.
MCDA (Management Component Data Access) allows to read and write
a firmware component.
MCQI (Management Component Query Information) allows to query
information about firmware components.
Linux commit:
4717628938423fcba0aa8fa889e9fed4eb6a655f
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.
Linux commit:
71862561f3a62015a11de16d1c306481e8415c08
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.
Linux commit:
c835ad64683bd3e2d1b31ed2cb1ff4366932edb1
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
PCAM: Ports capabilities mask register.
MCAM: Management capabilities mask register.
PCAM and MCAM registers will provide information regarding firmware
support for different features, in order to avoid cases where new driver
combined with old firmware results in syndromes (for ex. PCIe counters
before this patchset).
Linux commit:
cfdcbceaeffc669b70d904d80a2df9c86c232566
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been
scattered to memory
Linux commit:
fcd29ad17c6ff885dfae58f557e9323941e63ba2
MFC after: 3 days
Sponsored by: Mellanox Technologies
Else the SQs won't be properly released when closing rate-limited connections
leading to wrong state transitions on the SQ.
MFC after: 3 days
Sponsored by: Mellanox Technologies
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.
While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.
MFC after: 3 days
Sponsored by: Mellanox Technologies
The number of priorities is always 8, while the number of traffic classes
supported can vary. While at it convert the sysctl node into an array.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.
TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
While at it remove unused interface state bits. This also fixes and issue
during shutdown:
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf
MFC after: 3 days
Sponsored by: Mellanox Technologies