The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
Reviewed by: np
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7599
pmap_early_io_map()/pmap_early_io_unmap(), if used in pairs, should be used in
the form:
pmap_early_io_map()
..do stuff..
pmap_early_io_unmap()
Without other allocations in the middle. Without reclaiming memory this can
leave large holes in the device space.
While here, make a simple change to the unmap loop which now permits it to unmap
multiple TLB entries in the range.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
Summary:
1) Attach problem - mpc85xx_probe() relies on fact that 0xfff0 mask matches all
QorIQ CPUs what is not true since e6500. This shall be reworked to match against
all supported CPUs.
2) There is no any reason for operating system to re-program or anyhow else
touch the LAWs programmed by firmware (u-boot). Right now mpc85xx_attach()
removes all LaW entries except for DRAM. This causes MCE to be generated when
later any of driver maps DTB-provided hardware addresses which do not exist
anymore because corresponding LaWs were removed.
Submitted by: Ivan Krivonos <int0dster_AT_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7663
Summary:
First time BSS is cleared in booke_init(), Second time it's cleared in
powerpc_init(). Any variable initialized between two those guys gets wiped out
what is wrong. In particular it wipes tlb1_entries initialized by tlb1_init(),
which was fine when tlb1_init() was called a second time, but this was removed
in r304656.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7638
Summary:
Kernel maps only one page of FDT. When FDT is more than one page in size, data
TLB miss occurs on memmove() when FDT is moved to kernel storage
(sys/powerpc/booke/booke_machdep.c, booke_init())
This introduces a pmap_early_io_unmap() to complement pmap_early_io_map(), which
can be used for any early I/O mapping, but currently is only used when mapping
the fdt.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7605
Summary:
There is no need to call tlb1_init() twice. Now it is called first time from
booke_init() and second time from powerpc_init() (where it is under BOOKE
switch). Although this does not cause immediate problems in the mainline kernel,
this can lead to undesirable side effects like two TLB entries with the same VA
in the TLB1. Presence of two TLB entries with the same VA can hang CPU.
Test Plan:
Add initial mapping for UART to the tlb1_init(), build and boot the kernel,
ensure that mapping presents only once (most convinient way - through Lauterbah
or similar hardware debugger)
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7607
Summary: Current booke/pmap code ignores mas7 and mas8 on e6500 CPU.
Submitted by: Ivan Krivonos <int0dster_gmail.com>
Differential Revision: https://reviews.freebsd.org/D7606
mpc85xx_map_dcsr() returns a vm_offset_t, not an error code.
mpc85xx_fix_errata() will gracefully exit if mpc85xx_map_dcsr() returns 0, as
that indicates an error (NULL pointer).
__syncicache() only syncs the icache on the current CPU, it doesn't touch the
cache on any other core. Replace the call with cpu_flush_dcache() instead.
Since bp_kernload is not touched again by the boot CPU in this code path, dcbf
is no less efficient than the dcbst from __syncicache() by invalidating the
cache line.
Summary:
There is often a need at the debugger to print arbitrary special
purpose registers (SPRs) on PowerPC. Using a rewritable asm stub, print any SPR
provided on the command line.
Note, as there is no checking in this, attempting to print a nonexistent SPR
may cause a Program exception (illegal instruction, or boundedly undefined).
Note also that this relies on the kernel text pages being writable. If in the
future this is made not the case, this will need to be reworked.
Test Plan:
Printing the Processor Version Register (PVR, SPR 287):
db> show spr 11f
SPR 287(11f): 80240012
Differential Revision: https://reviews.freebsd.org/D7403
Summary:
u-boot, following the ePAPR specification, puts secondary cores into a
spinloop at boot, rather than leaving them shut off. It then relies on the host
OS to write the correct values to a special spin table, located in coherent
memory (on newer implementations), or noncoherent memory (older
implementations).
This supports both implementations of ePAPR, as well as continuing to support
non-ePAPR booting, by first attempting to use the spintable, and falling back to
expecting non-started CPUs.
Test Plan:
Booted on a P5020 board. Tested before and after the changes.
Before the changes, prints the error "SMP: CPU 1 already out of hold-off state!"
and panics shortly thereafter. After the changes, same boot method lets it
complete boot.
Reviewed by: nwhitehorn
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D7494
Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.
The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.
Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447
Without enabling this bit, tlbre and tlbsx don't update the MAS7 register,
resulting in garbage in the register after a read (rather, the previous setting
of it for a tlbwe). This can result in mmu_booke_mapdev_attr() thinking
mappings that should match actually don't, because tlb1_read_entry() can't
determine the correct address of a given entry.
MFC after: 11-RELEASE
bouncing of unmapped buffers. Also treat userspace buffers as unmapped, to
avoid borrowing the UVA for copies. This allows sync'ing userspace buffers
outside the context of the owning process, and sync'ing bounced maps in
non-sleepable contexts.
This change is equivalent to r286787 for x86.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D3989
Summary:
MPC85XX and QorIQ are very similar. When the DPAA dTSEC driver was
added, QORIQ_DPAA was brought in as a config option to support the differences
in hardware register settings between QorIQ (e500mc-, e5500- based) SoCs and
QUICC (e500v1/e500v2-based) SoCs, particularly in the Local Access Window (LAW)
target settings.
Unify these settings using macros to hide details and ease porting, and use a
new function (mpc85xx_is_qoriq()) to distinguish between QorIQ and QUICC SoCs at
runtime.
An alternative to using the function could be to use a variable initialized at
platform attach time, which may incur less overhead at runtime. Since it's not
in the critical path once booted, this optimization doesn't seem necessary at
first pass.
Reviewed by: nwhitehorn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D7294
Though the chances of the code in these sections changing are low, future-proof
the sections and use label math.
Renumber the surrounding areas to avoid duplicate label numbers.
L3 cache is not defined by Book-E, so is platform specific. Since it was
already moved for e500-based devices into mpc85xx in r292903, just eliminate it
altogether. Any device that supports L3 cache should have its own platform
means to enable it.
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.
There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.
PR: kern/210106
Reviewed by: jhb
Approved by: re (glebius)
Remove the use of fdt_data_to_res(), and instead construct the resources
manually. Additionally, avoid the 32-bit size limitation of fdt_data_get(), by
building physical addresses manually from the lbc ranges property.
Approved by: re@(gjb)
late boot: enable it explicitly after installing the page tables. If booting
from an FDT, also make sure to escape the firmware's MMU context early
before overwriting firmware page tables.
Approved by: re (gjb)
Most of the effect of setting MSR[SF] is that the CPU will stop ignoring
the high 32 bits of registers containing addresses in load/store
instructions. As such, the kernel was setting it only when it began to
need access to high memory. MSR[SF] also affects the operation of some
conditional instructions, however, and so setting it at late times could
subtly break code at very early times. This fixes use of the FDT mode in
loader, and FDT boot more generally, on 64-bit PowerPC systems.
Hardware provided by: IBM LTC
Approved by: re (kib)
threads, to make it less confusing and using modern kernel terms.
Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
cpu_set_upcall -> cpu_copy_thread (for forks)
cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
Differential revision: https://reviews.freebsd.org/D6731
After r285994, sysctl(8) was fixed to use 273.15 instead of 273.20 as 0C
reference and as result, the temperature read in sysctl(8) now exibits a
+0.1C difference.
This commit fix the kernel references to match the reference value used in
sysctl(8) after r285994.
Sponsored by: Rubicon Communications (Netgate)
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
This allows the PCI-PCI bridge driver to save a reference to the child
device in its softc.
Note that this required moving the "pci" device creation out of
acpi_pcib_attach(). Instead, acpi_pcib_attach() is renamed to
acpi_pcib_fetch_prt() as it's sole action now is to fetch the PCI
interrupt routing table.
Differential Revision: https://reviews.freebsd.org/D6021
Rescanning a PCI bus uses the following steps:
- Fetch the current set of child devices and save it in the 'devlist'
array.
- Allocate a parallel array 'unchanged' initalized with NULL pointers.
- Scan the bus checking each slot (and each function on slots with a
multifunction device).
- If a valid function is found, look for a matching device in the 'devlist'
array. If a device is found, save the pointer in the 'unchanged' array.
If a device is not found, add a new device.
- After the scan has finished, walk the 'devlist' array deleting any
devices that do not have a matching pointer in the 'unchanged' array.
- Finally, fetch an updated set of child devices and explicitly attach any
devices that are not present in the 'unchanged' array.
This builds on the previous changes to move subclass data management into
pci_alloc_devinfo(), pci_child_added(), and bus_child_deleted().
Subclasses of the PCI bus use custom rescan logic explicitly override the
rescan method to disable rescans.
Differential Revision: https://reviews.freebsd.org/D6018
When ORing in a register_t to a wider integer (vm_paddr_t), it gets sign
extended, so high addresses overwrite the upper word with all 0xf. Cast to the
unsigned form (u_register_t), to avoid this problem, and get correct addresses
printed.
With this, a static environment can be compiled in via config(5). This allows,
among other things, the use of a compiled-in debug console (hw.uart.dbgport) for
kgdb.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.