While very useful by itself, it also makes `nvmecontrol` not depend on
hardcoded device names parsing, that in its turn makes simple to take
nvdX (and potentially any other) device names as arguments.
Also added IOCTL bypass from nvdX to respective nvmeYnsZ makes them
interchangeable for management purposes.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
an updated rack depend on having access to the new
ratelimit api in this commit.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20953
with an eventual goal to convert all legacl zlib callers to the new zlib
version:
* Move generic zlib shims that are not specific to zlib 1.0.4 to
sys/dev/zlib.
* Connect new zlib (1.2.11) to the zlib kernel module, currently built
with Z_SOLO.
* Prefix the legacy zlib (1.0.4) with 'zlib104_' namespace.
* Convert sys/opencrypto/cryptodeflate.c to use new zlib.
* Remove bundled zlib 1.2.3 from ZFS and adapt it to new zlib and make
it depend on the zlib module.
* Fix Z_SOLO build of new zlib.
PR: 229763
Submitted by: Yoshihiro Ota <ota j email ne jp>
Reviewed by: markm (sys/dev/zlib/zlib_kmod.c)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D19706
Terasic DE10-Pro (an Intel Stratix 10 GX/SX FPGA Development Kit).
The Altera EMAC is an instance of Synopsys DesignWare Gigabit MAC.
This driver sets correct clock range for MDIO interface on Intel Stratix 10
platform.
This is required due to lack of support for clock manager device for
this platform that could tell us the clock frequency value for ethernet
clock domain.
Sponsored by: DARPA, AFRL
Substitute driver-defined IS_P2ALIGNED() with EFX_IS_P2ALIGNED()
defined in libefx.
Add type argument and cast value and alignment to one specified type.
Reported by: Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by: philip
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D21076
Substitute driver-defined P2ALIGN() with EFX_P2ALIGN() defined in
libefx.
Cast value and alignment to one specified type to guarantee result
correctness.
Reported by: Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by: philip
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D21075
Substitute driver-defined P2ROUNDUP() h with EFX_P2ROUNDUP()
defined in libefx.
Cast value and alignment to one specified type to guarantee result
correctness.
Reported by: Andrea Valsania <andrea.valsania at answervad.it>
Reviewed by: philip
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D21074
We want to allocate a contiguous memory block anywhere in memory, but
expressed this as having to be between 0 and 0xffffffff. This limits us
on 64-bit machines, and outright breaks on machines where memory is
mapped above that address range.
Allow the full address range to be used for this allocation.
Sponsored by: Axiado
The timeout field in the CAPS register is defined to be 8 bits, so its type was
uint8_t. We recently started adding 1 to it to cope with rogue devices that
listed 0 timeout time (which is impossible). However, in so doing, other devices
that list 0xff (for a 2 minute timeout) were broken when adding 1
overflowed. Widen the type to be uint32_t like its source register to avoid the
issue.
Reported by: bapt@
- Wrong order of casting and bit shift caused that enabling and disabling
queues didn't work properly for queues number larger than 32. Use literals
with right suffix instead.
- TX ring tail address was not updated during reinitiailzation of TX
structures. It could block sending traffic.
- Also remove unused variables 'eims' and 'active_queues'.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20826
o Add an experimental IOMMU support to xDMA framework
The BERI IOMMU device is the part of CHERI device-model project [1]. It
translates memory addresses for various BERI peripherals modelled in
software. It accepts FreeBSD/mips64 page directories format and manages
BERI TLB.
1. https://github.com/CTSRD-CHERI/device-model
Sponsored by: DARPA, AFRL
features offered by the chips.
For 2127 and 2129 chips, fix the detection of when chip-init is needed. The
chip config needs to be reset whenever power was lost, but the logic was
wrong for 212x chips (it only worked for 8523). Now the "oscillator
stopped" bit rather than the power manager mode is used to detect startup
after powerfail.
For all chips, disable the clock output pin.
For chips that have a timestamp/tamper-monitor feature, turn off monitoring
of the timestamp trigger pin.
The 8523, 2127, and 2129 chips have a "power manager" feature that offers
several options. We've been using the default mode which enables
everything. Now the code sets the power manager options to
- direct-switch (when Vdd < Vbat, without extra threshold check)
- no battery monitor
- no external powerfail monitor
This reduces the current draw while running on battery from 1930nA to 880nA,
which should roughly double the lifespan of the battery under load.
Because battery checking is a nice thing to have, the code now does a check
at startup, and then once a day after that, instead of checking continuously
(but only actually reporting at startup). The battery check is now done by
setting the power manager back to default mode, sleeping briefly while it
makes a voltage measurement, then switching back to power-saving mode.
While we print failure messages on the console, sometimes logs are lost or
overwhelmed. Keeping a count of how many times we've failed retriable commands
helps get a magnitude of the problem.
Retried commands can indicate a performance degredation of an nvme drive. Keep
track of the number of retries and report it out via sysctl, just like number of
commands an interrupts.
Also convert it to a bool. While the rest of the driver isn't yet bool clean,
this will help.
Reviewed by: cem@
Differential Revision: https://reviews.freebsd.org/D20988
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.
Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
These macros make places where we extract these easier to read. The shift and
mask stuff is also a bit tedious and error prone. Start with the CAP_LO and
CAP_HI registers since their scope is somewhat constrained. This is style
chagne only, no functional changes.
Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20979
This affects the detection of 24-hour vs AM/PM mode... the ampm bit is in a
different location on 2127 and 2129 chips compared to other nxp rtc chips.
I noticed the 2127 case wasn't being handled correctly when I accidentally
misconfiged my system by claiming my PCF2129 was a 2127.
with various laptops using hdaa(4) sound devices. We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.
PR: 200526
Differential Revision: https://reviews.freebsd.org/D17772
Neither the 1.3 or 1.4 standards say this number is 1's based, but adding 1
costs little and copes with those NVMe drives that report '0' in this field
cheaply. This is consistent with what the Linux driver does as well.
This fixes the following panic on powerpc:
pci_get_vendor failed for pcib1 on bus ofwbus0, error = 2
PR: 238730
Reported by: Dennis Clarke <dclarke@blastwave.org>
Tested by: Dennis Clarke <dclarke@blastwave.org>
MFC after: 2 weeks
on PCx2129 chips too.
The datasheet for the PCx2129 chips says that there is only a watchdog
timer, no countdown timer. It turns out the countdown timer hardware is
there and works just the same as it does on a PCx2127 chip, except that you
can't use it to trigger an interrupt or toggle an output pin. We don't need
interrupts or output pins, we only need to read the timer register to get
sub-second resolution. So start treating the 2129 chips the same as 2127.
An obscure footnote in the datasheets for the PCx2127, PCx2129, and
PCF8523 rtc chips states that the chips do not support i2c repeat-start
operations. When the driver was originally written and tested, the i2c
bus on that system also didn't support repeat-start and just quietly
turned repeat-start operations into a stop-then-start, making it appear
that the nxprtc driver was working properly.
The repeat-start situation only comes up on reads, so instead of using
the standard iicdev_readfrom(), use a local nxprtc_readfrom(), which is
just a cut-and-pasted copy of iicdev_readfrom(), modified to send two
separate start-data-stop sequences instead of using repeat-start.
The driver used to log any non-zero cause and when running with a single
line interrupt it would spam the console/logs with reports of interrupts
that are of no interest to anyone.
MFC after: 1 week
Sponsored by: Chelsio Communications
Enable this for the NovAtel OEMv2 GPS receiver.
Not fixed: The receiver shows up as "<Interface 0>" in the device
tree, because that is literally what the descriptor-string is.
Reviewed by: hselasky@
When a command is finished running, we must transition it from INQUEUE
to busy state. We were failing to do that, so we hit a panic when the
commands were freed. This only affects mpr, mps already did simmilar
things. Now both the polling and interrupt paths properly set BUSY as
appropriate.
This device cannot cross a 4GB boundary with DMA. Removing the
boundary in r346386 resulted in low frequency memory corruption on
machines with isci(4) controllers.
Submitted by: gallatin@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20910
Add VMBus protocol version 4.0. and 5.0 to support Windows 10 and newer HyperV hosts.
For VMBus 4.0 and newer HyperV, the netvsc gpadl teardown must be done after vmbus close.
Submitted by: whu
MFC after: 2 weeks
Sponsored by: Microsoft
r348164 added code to iicbus_request_bus/iicbus_release_bus to automatically
call device_busy()/device_unbusy() as part of aquiring exclusive use of the
bus (so modules can't be unloaded while the bus is exclusively owned and/or
IO is in progress). That broke the ability to do i2c IO from a slave device
probe method, because the slave isn't attached yet, so calling device_busy()
triggers a sanity-check panic for trying to busy a non-attached device.
Now we check whether the device status is < DS_ATTACHING, and if so we busy
the iicbus rather than the slave device. I think this leaves a small window
where a module could be unloaded while probing is in progress. But I think
that's true of all devices, and probably should be fixed by introducing a
DS_PROBING state for devices, and handling that at various points in the
newbus code.
Eliminate the TIMEDOUT state. This state really conveyed two different
concepts: I timed out during recovery (and my command got put on the
recovery queue), and I timed out diring discovery (which doesn't).
Separate those two concepts into two flags. Use the TIMEDOUT flag to
fail requests as timed out. Use the on queue flag to remove them from
the queue.
In mps_intr_locked for MPI2_RPY_DESCRIPT_FLAGS_ADDRESS_REPLY message
type, when completing commands, ignore the ones that are not in state
INQUEUE. They were already completed as part of the recovery
process. When we complete them twice, we wind up with entries on the
free queue that are marked as busy, trigging asserts.
Reviewed by: scottl (earlier version, just for mpr)
Differential Revision: https://reviews.freebsd.org/D20785
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
table VCTRL registers.
Unconditionally program the MSI-X vector control Mask field for MSI-X
table entries without regarud for Mask's previous value. Some devices
return all zeros on reads of the VCTRL registers, which would cause us
to skip disabling interrupts. This fixes the Samsung SM961/PM961 SSDs
which are return zero starting from offset 0x3084 within the memory
region specified by BAR0, even when they are active MSI-X vectors.
The Illumos kernel writes these unconditionally to 0 or 1. However,
section 6.8.2.9 of the PCI Local Bus 3.0 spec (dated Feb 3, 2004)
states for bits 31::01:
After reset, the state of these bits must be 0. However, for
potential future use, software must preserve the value of
these reserved bits when modifying the value of other Vector
Control bits. If software modifies the value of these reserved
bits, the result is undefined."
so we always set or clear the Mask bit, but otherwise preserves the
old value.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
Reviewed By: imp, jhb
Submitted by: Ka Ho Ng
MFC After: 1 week
Differential Revision: https://reviews.freebsd.org/D20873
While at it fix an invalid memory access issue when attaching external
USB HUBs, which are not mapped by ACPI, due to missing status check
when calling AcpiGetObjectInfo() from acpi_usb_hub_port_probe_cb().
Sponsored by: Mellanox Technologies
When the system has no graphical console, such as bhyve in common
configurations, ignore kern.vt.splash_cpu, instead of panicking
on INVARIANTS kernels.
Reviewed by: cem dumbbell
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20877
Print the adapter name rather than the address of the adapter
to avoid kernel address leakage.
PR: Bug 238642
Submitted by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed by: vmaffione
MFC after: 1 week
All MMCBR bridges have to implement all the MMCBR variables. This
implements them for everybody that currently doesn't.
A common routine for this should be written.
XCHAN_CAP_BOUNCE.
The only application that uses bounce buffering for now is the Government
Furnished Equipment (GFE) P2's dma core (AXIDMA) with its own dedicated
cacheless bounce buffer.
Sponsored by: DARPA, AFRL
Otherwise there is a window where they may be rescheduled. This
typically manifested as a page fault shortly after unloading if_iwm.ko.
Close the race by draining callouts after calling iwm_stop_device(),
which is also what Dragonfly does.
Change whitespace to reduce gratuitous diffs with Dragonfly.
Reported and tested by: seanc
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Previously the TOE code used its own custom unmapped mbufs via
EXT_FLAG_VENDOR1. The old version always wired the entire AIO request
buffer first for the duration of the AIO operation and constructed
multiple mbufs which used the wired buffer as an external buffer.
The new version determines how much room is available in the socket
buffer and only wires the pages needed for the available room building
chains of M_NOMAP mbufs. This means that a large AIO write will now
limit the amount of wired memory it uses to the size of the socket
buffer.
Reviewed by: gallatin, np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20839
that node is also compatible with syscon. For instance,
Rockchip RK3399's GRF (General Register Files) is compatible
with simple-mfd as well as syscon and has devices like
usb2-phy, emmc-phy and pcie-phy etc. under it.
Reviewed by: manu
This patch is the driver for NTB hardware in AMD SoCs (ported from Linux)
and enables the NTB infrastructure like Doorbells, Scratchpads and Memory
window in AMD SoC. This driver has been validated using ntb_transport and
if_ntb driver already available in FreeBSD.
Submitted by: Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D18774
otherwise we panic.
dwmmc don't handle VCCQ (voltage for the IO line of the SD/eMMC) or
TIMING.
Add the needed accessor in the {read,write}_ivar functions.
Reviewed by: imp (previous version)
This patch fixes 2 panics. The first one is due to the current VNET not
being set in the emulated adapter transmission path. The second one
is caused by the M_PKTHDR flag not being set when preallocated mbufs
are recycled in the transmit path.
Submitted by: aleksandr.fedorov@itglobal.com
Reviewed by: vmaffione
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20824
The goal of this driver is consolidate information about SuperIO chips
and to provide for peaceful coexistence of drivers that need to access
SuperIO configuration registers.
While SuperIO chips can host various functions most of them are
discoverable and accessible without any knowledge of the SuperIO.
Examples are: keyboard and mouse controllers, UARTs, floppy disk
controllers. SuperIO-s also provide non-standard functions such as
GPIO, watchdog timers and hardware monitoring. Such functions do
require drivers with a knowledge of a specific SuperIO.
At this time the driver supports a number of ITE and Nuvoton (fka
Winbond) SuperIO chips.
There is a single driver for all devices. So, I have not done the usual
split between the hardware driver and the bus functionality. Although,
superio does act as a bus for devices that represent known non-standard
functions of a SuperIO chip. The bus provides enumeration of child
devices based on the hardcoded knowledge of such functions. The
knowledge as extracted from datasheets and other drivers.
As there is a single driver, I have not defined a kobj interface for it.
So, its interface is currently made of simple functions.
I think that we can the flexibility (and complications) when we actually
need it.
I am planning to convert nctgpio and wbwd to superio bus very soon.
Also, I am working on itwd driver (watchdog in ITE SuperIO-s).
Additionally, there is ithwm driver based on the reverted sensors
import, but I am not sure how to integrate it given that we still lack
any sensors interface.
Discussed with: imp, jhb
MFC after: 7 weeks
Differential Revision: https://reviews.freebsd.org/D8175
That is, instead of the current GPIO00 - GPIO15 the names will be GPIO00
- GPIO07, GPIO10 - GPIO17. The first digit is a GPIO "bank" / group
number and the second one is a pin number within the bank. Alternative
view is that the pin names are changed from decimal numbering scheme to
octal one (as there are 8 pins per bank).
Discussed with: cem, gonzo
MFC after: 2 weeks
With more ports, some of the registers are shifted a bit to accommodate.
This switch also adds two high speed Serdes/SGMII interfaces (2.5 Gb/s).
Sponsored by: Rubicon Communications, LLC (Netgate)
Since cxgbe(4) uses sglist instead of bus_dma, this required updates
to the code that generates scatter/gather lists for packets. Also,
unmapped mbufs are always sent via DMA and never as immediate data in
the payload of a work request.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: np
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
address before returning it to the user. Some of the least significant
bits have special meaning and should be masked away.
Discussed with: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
handle_ddp_close.
This eliminates a bad race where an aio_ddp_requeue that happened to run
after handle_ddp_close could bump up the active count.
Discussed with: jhb@
MFC after: 3 days
Sponsored by: Chelsio Communications
t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps. t4_tom inadvertently went back to pre-r293284 semantics
in r332506.
Sponsored by: Chelsio Communications
This fixes (userspace) console on the Marvell MACCHIATObin in ACPI mode with
latest TianoCore EDK2 firmware.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, bcran
Differential Revision: https://reviews.freebsd.org/D20765
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket. However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty. In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.
Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20539
The format to use depends on hardware configuration (synthesis-time),
so make it compile-time kernel option.
Extended format allows DMA engine to operate with 64-bit memory addresses.
Sponsored by: DARPA, AFRL
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
Reviewed by: mizhka
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20459
Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.
This is a step towards an implementation of an atomic reference counter
for each physical page structure.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20758
"fdt" is removed from the driver module name as the driver does not
require FDT and can work very well on hints based systems.
A module dependency is added for gpiobus. Without that owc cannot
resolve symbols in gpiobus if both are loaded as kernel modules.
Finally, a driver module module version is added.
Reviewed by: imp
MFC after: 11 days
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.
Numerous posts to arch@ and other locations have found no actual users
for this software.
Relnotes: Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
Since SES specs do not define mechanism to map enclosure slots to SATA
disks, AHCI EM code I written many years ago appeared quite useless,
that always bugged me. I was thinking whether it was a good idea, but
if LSI HBAs do that, why I shouldn't?
This change introduces simple non-standard mechanism for the mapping
into both AHCI EM and SES code, that makes AHCI EM on capable controllers
(most of Intel's) a first-class SES citizen, allowing it to report disk
physical path to GEOM, show devices inserted into each enclosure slot in
`sesutil map` and `getencstat`, control locate and fault LEDs for specific
devices with `sesutil locate adaX on` and `sesutil fault adaX on`, etc.
I've successfully tested this on Supermicro X10DRH-i motherboard connected
with sideband cable of its S-SATA Mini-SAS connector to SAS815TQ backplane.
It can indicate with LEDs Locate, Fault and Rebuild/Remap SES statuses for
each disk identical to real SES of Supermicro SAS2 backplanes.
MFC after: 2 weeks