Some early PCIe chipsets are explicitly listed in the white-list to
enable use of the MMIO config space accesses, perhaps because ACPI
tables were not reliable source of the base MCFG address at that time.
For that chipsets, MCFG base was read from the known chipset MCFGbase
config register.
During very early stage of boot, when access to the PCI config space
is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
registers because the method used with pre-boot pmap overflows initial
kernel page tables.
Move fallback to read MCFGbase to the attachment method of the
x86/legacy device, which removes code duplication, and results in the
use of io accesses until MCFG is parsed or legacy attach called.
For amd64, pre-initialize cfgmech with CFGMECH_1, right now we
dynamically assign CFGMECH_1 to it anyway, and remove checks for
CFGMECH_NONE.
There is a mention in the Intel documentation for corresponding
chipsets that OS must use either io port or MMIO access method, but we
already break this rule by reading MCFGbase register, so one more
access seems to be innocent.
Reported by: longwitz@incore.de
PR: 236838
Reviewed by: avg (other version), jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19833
Apparently AMD machines cannot tolerate this. This was uncovered by
r339386, where cache flush started really flushing the requested range.
Introduce pmap_mapdev_pciecfg(), which simply does not flush cache
comparing with pmap_mapdev(). It assumes that the MCFG region was
never accessed through the cacheable mapping, which is most likely
true for machine to boot at all.
Note that i386 does not need the change, since the architecture
handles access per-page due to the KVA shortage, and page remapping
already does not flush the cache.
Reported and tested by: mjg, Mike Tancsa <mike@sentex.net>
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D17612
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
shifts into the sign bit. Instead use (1U << 31) which gets the
expected result.
This fix is not ideal as it assumes a 32 bit int, but does fix the issue
for most cases.
A similar change was made in OpenBSD.
Discussed with: -arch, rdivacky
Reviewed by: cperciva
AMD BKDG for CPU families 10h and later requires that the memory
mapped config is always read into or written from al/ax/eax register.
Discussed with: kib, alc
Reviewed by: kib (earlier version)
MFC after: 25 days
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch
Reviewed by: jhb
MFC after: 1 week
the requested PCI bus falls outside of the bus range given in the ACPI
MCFG table. Several BIOSes seem to not include all of the PCI busses in
systems in their MCFG tables. It maybe that the BIOS is simply buggy and
does support all the busses, but it is more conservative to just fall back
to the old method unless it is certain that memory accesses will work.
memory-mapped config access. Add a workaround for these systems by
checking the first function of each slot on bus 0 using both the
memory-mapped config access and the older type 1 I/O port config access.
If we find a slot that is only visible via the type 1 I/O port config
access, we flag that slot. Future PCI config transactions to flagged
slots on bus 0 use type 1 I/O port config access rather than memory mapped
config access.
- Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the
rest of the kernel. It now also accepts parameters via function
arguments rather than global variables.
- Add a notion of minimum and maximum bus numbers and reject requests for
an out of range bus.
- Add more range checks on slot/func/reg/bytes parameters to the cfg reg
read/write routines. Don't panic on any invalid parameters, just fail
the request (writes do nothing, reads return -1). This matches the
behavior of the other cfg mechanisms.
- Port the memory mapped configuration space access to amd64. On amd64
we simply use the direct map (via pmap_mapdev()) for the memory mapped
window.
- During acpi_attach() just after loading the ACPI tables, check for a
MCFG table. If it exists, call pciereg_cfgopen() on each subtable
(memory mapped window). For now we only support windows for domain 0
that start with bus 0. This removes the need for more chipset-specific
quirks in the MD code.
- Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets
since these machines should all have MCFG tables via ACPI.
- Updated pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen()
earlier.
MFC after: 2 weeks
- On amd64, just assume type #1 is always used. PCI 2.0 mandated
deprecated type #2 and required type #1 for all future bridges which
was well before amd64 existed.
- For i386, ignore whatever value was in 0xcf8 before testing for type #1
and instead rely on the other tests to determine if type #1 works. Some
newer machines leave garbage in 0xcf8 during boot and as a result the
kernel doesn't find PCI at all (which greatly confuses ACPI which expects
PCI to exist when PCI busses are in the namespace).
MFC after: 3 days
Discussed with: scottl
other OSes (Solaris, Linux, VxWorks). It's not necessary to write a 0
to the config address register when using config mechanism 1 to turn
off config access. In fact, it can be downright troublesome, since it
seems to confuse the PCI-PCI bridge in the AMD8111 chipset and cause
it to sporadically botch reads from some devices. This is the cause
of the missing USP ports problem I was experiencing with my Sun Opteron
system.
Also correct the case for mechanism 2: it's only necessary to write
a 0 to the ENABLE port.
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from. There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
a variation of the PAE code in order to avoid having to worry about 4
levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
i386 loader. This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
not been translated yet, and some that I cheated and wrote dumb C
versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
passing, as is native on the amd64 ABI), and the 'syscall' instruction
for syscalls. int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
of places (eg: to find which register they use to replace the trashed
%rcx register in the syscall instruction). As a result, there is not a
lot of similarity. I did look at NetBSD a few times while debugging to
get some ideas about what I might have done wrong in my first attempt.
#if'ed out for a while. Complete the deed and tidy up some other bits.
We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space. Making
the bios code mpsafe was just too hairy. We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers. We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]
Briefly glanced at by: imp
o It turns out that we always need to try to route the interrupts for
the case where the $PIR tells us there can be only one. Some machines
require this, while others fail when we try to do this (bogusly, imho).
Since we have no apriori way of knowing which is which, we always try to
do the routing and hope for the best if things fail.
o Add some additional comments that state the obvious, but amplify it in
non-obvious ways (judging from the questions I've gotten).
This should un-break older laptops that still have to use PCIBIOS to route
interrupts.
Tested by: sam
Use exact width types, since this is a MD file and won't be used elsewhere.
Fix a couple of resulting printf breakages
Bug found by: phk using Flexlint
there are some strange machines that seem to need this.
o delete bogus comment.
o don't use the the bios for read/writing config space. They interact badly
with SMP and being called from ISR. This brings -current in line with
-stable.
# make the latter #ifdef on USE_PCI_BIOS_FOR_READ_WRITE in case we
# need to go back in a hurry.
IRQ for an entry in a PCIBIOS interrupt routing ($PIR) table.
- Change pci_cfgintr() to except the current IRQ of a device as a fourth
argument and to use that IRQ for the device if it is valid.
- If an intpin entry in a $PIR entry has a link of 0, it means that that
intpin isn't connected to anything that can trigger an interrupt. Thus,
test the link against 0 to find invalid entries in the table instead of
implicitly relying on the irqs field to be zero. In the machines I have
looked at, intpin entries with a link of 0 often have the bits for all
possible interrupts for PCI devices set.
not the 'entry' member. The entry point is formed from both a base and
a relative entry point. 'entry' is that relative offset. It is perfectly
valid to have an entry point with a relative offset of 0. PCIbios.ventry
is the virtual address of the entry point that takes both 'base' and
'entry' into account, thus it is the proper variable to test to see if we
have an entry point or not.
Don't require pin be non-zero before we map bogus intlines, always do it.
This fixes a number of problems on HP Omnibook computers.
Tested/Reviewed by: Brooks Davis
2, but that's not the case. This fixes the case where there were slots
in the PIR table that had no bits set, but we assumed they did and used
strange results as a result.
o Map invalid INTLINE registers to 255 in pci_cfgreg.c. This should allow
us to remove the bogus checks in MI code for non-255 values.
I put these changes out for review a while ago, but no one responded
to them, so into current they go.
This should help us work better on machines that don't route
interrupts in the traditional way.
MFC After: 4286 millifortnights
older PCI BIOSes hate this and this leads to panics when it is done. Also,
assume that a uniquely routed interrupt is already routed. This also
seems to help some older laptops with feable BIOSes cope.
This typo keeps us from properly routing an interrupt for CardBus
bridges on this machine. So, now we look for $PIR and then _PIR to
cope. With these changes, the Libretto L1 now works properly.
Evidentally, the idea comes from patch that the Japanese version of
RedHat (or against a Japanese version of Red Hat), but my Japanese
isn't good enough to to know for sure.
Reported by: Hiroyuki Aizu-san <eyes@navi.org>
# This may be an MFC candidate, but I'm not yet sure.
Merge in the irq 0 detection. Add comment about why.
If we have irq 0, ignore it like we do irq 255. Some BIOS writers aren't
careful like they should be.
multiple times, others do. The last strategy, which was to assume
that already routed interrupts were good and just return them doesn't
work for some laptops. So, instead, we have a new strategy: we notice
that we have an interrupt that's already routed. We go ahead and try
to route it, none the less. We will assume that it is correctly
routed, even if the route fails. We still assume that other failures
in the bios32 call are because the interrupt is NOT routed.
Note: some laptops do not support the bios32 interface to PCI BIOS and
we need to call it via the INT 2A interface. That is another windmill
to till at later.
Also correct a minor typo and minor whitespace nits.
Strong MFC candidate.