Pollution from counter.h made __pcpu visible in amd64/pmap.c. Delete
the existing extern decl of __pcpu in amd64/pmap.c and avoid referring
to that symbol, instead accessing the pcpu region via PCPU_SET macros.
Also delete an unused extern decl of __pcpu from mp_x86.c.
Reviewed by: kib
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11666
The mutex protecting access to the registered realtime clock should not be
overloaded to protect access to the atrtc hardware, which might not even be
the registered rtc. More importantly, the resettodr mutex needs to be
eliminated to remove locking/sleeping restrictions on clock drivers, and
that can't happen if MD code for amd64 depends on it. This change moves the
protection into what's really being protected: access to the atrtc date and
time registers.
This change also adds protection when the clock is accessed from
xentimer_settime(), which bypasses the resettodr locking.
Differential Revision: https://reviews.freebsd.org/D11483
--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.
--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.
--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h
--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)
Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729
Do not queue dmar_map_entries with zeroed gseq to
dmar_qi_invalidate_locked(). Zero gseq stops the processing in the qi
task. Do not assign possibly uninitialized on-stack gseq to map
entries when requeuing them on unit tlb_flush queue. Random garbage
in gsec is interpreted as too high invalidation sequence number and
again stop the processing in the task.
Make the sequence numbers generation completely contained in
dmar_qi_invalidate_locked() and dmar_qi_emit_wait_seq(). Upper code
directly passes boolean requesting emiting wait command instead of
trying to provide hint to avoid it by passing NULL gseq pointer.
Microoptimize the requeueing to tlb_flush queue by doing it for the
whole queue.
Diagnosed and tested by: Brett Gutstein <bgutstein@rice.edu>
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
All interrupts are routed to the sole CPU in that case implicitly.
This is a regression in EARLY_AP_STARTUP. Previously the 'assign_cpu'
variable was only set when a multi-CPU system finished booting, so
it's value both meant that interrupts could be assigned and that
there was more than one CPU.
PR: 219882
Reported by: ota@j.email.ne.jp
MFC after: 3 days
Do not try to set LMA bit while CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD's
(over-)react with #GP.
Reported and tested by: danfe
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
An extra copy of the system call gate was added to the default LDT back
in 1996 (r18513 / r18514). However, the ability to run BSD/OS 2.1
i386 binaries under FreeBSD's native ABI is most likely no longer
needed.
Discussed with: kib
This patch improves the boundary checks in busdma to allow more cases
using the regular page based kernel memory allocator. Especially in
the case of having a non-zero boundary in the parent DMA tag. For
example AMD64 based platforms set the PCI DMA tag boundary to
PCI_DMA_BOUNDARY, 4GB, which before this patch caused contiguous
memory allocations to be preferred when allocating more than PAGE_SIZE
bytes. Even if the required alignment was less than PAGE_SIZE bytes.
This patch also fixes the nsegments check for using kmem_alloc_attr()
when the maximum segment size is less than PAGE_SIZE bytes.
Updated some comments describing the code in question.
Differential Revision: https://reviews.freebsd.org/D10645
Reviewed by: kib, jhb, gallatin, scottl
MFC after: 1 week
Sponsored by: Mellanox Technologies
operation after processor is configured to allow all required
features.
In particular, NX must be enabled in EFER, otherwise load of page
table element with nx bit set causes reserved bit page fault. Since
malloc uses direct mapping for small allocations, in particular for
the suspension pcbs, and DMAP is nx after r316767, this commit tripped
fault on resume path.
Restore complete state of EFER while wakeup code is still executing
with custom page table, before calling resumectx, instead of trying to
guess which features might be needed before resumectx restored EFER on
its own.
Bisected and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
In exceptional circumstances, an MCA exception will trigger when the
freelist is exhausted. In such a case, no error will be logged on the list
and 'mca_count' will not be incremented.
Prior to this patch, all CPUs that received the exception would spin
forever.
With this change, the CPU that detects the error but finds the freelist
empty will proceed to panic the machine, ending the deadlock.
A follow-up to r260457.
Reported by: Ryan Libby <rlibby at gmail.com>
Reviewed by: jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10536
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.
Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.
This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html
Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
This is applicable only to the older processors that do not have the AMD
Topology extension.
Opteron 6100-series "Magny-Cours" processors had multiple nodes within a
package and didn't have the Topology extension. Without this change
FreeBSD would assume that those processors have a single L3 cache shared
by all cores while, in fact, each node has its own L3 cache.
Many thanks to Freddie Cash <fjwcash@gmail.com> for providing valuable
hardware information.
MFC after: 2 weeks
The change introduced a dependency between genassym.c and header files
generated from .m files, but that dependency is not specified in the
make files.
Also, the change could be not as useful as I thought it was.
Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others
The change seems to be more in the nomenclature than in the way the
topology is advertised by the hardware.
Tested by: truckman (earlier version of the change)
MFC after: 2 weeks
Implement timeouts for register-based DMAR commands. Tunable/sysctl
hw.dmar.timeout specifies the timeout in nanoseconds, set it to zero
to allow infinite wait. Default is 1ms.
Runtime modification of the sysctl is not safe, it is allowed for
debugging.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Kernel environment variable hw.busdma.default can take values 'bounce'
and 'dmar' and selects corresponding busdma backend as default.
Per-device environment variable hw.busdma.pci<domain>.<bus>.<slot>.<func>
takes the same values and overrides hw.busdma.default for the given device.
Note that even with hw.busdma.default=bounce, DMA translation engines
are still started if DMARs are enabled, to disable them use
hw.dmar.dma tunable, as before.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The change is more intrusive than I would like because the feature
requires that a vector number is written to a special register.
Thus, now the vector number has to be provided to lapic_eoi().
It was readily available in the IO-APIC and MSI cases, but the IPI
handlers required more work.
Also, we now store the VMM IPI number in a global variable, so that it
is available to the justreturn handler for the same reason.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
As noted in the comment, nothing special needs to be done to destroy
the unneeded context after the allocation race, but the context memory
itself still should to be freed.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It does not make sense since identity mapping already provides the
required mapping for RMRR ranges. More, since identity page tables do
not reflect content of map entries for id domains, creating RMRR
entries makes domain data inconsistent.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Ignore them like it's done in the MADT parser. This allows booting on a box
with SRAT and APIC IDs > 255.
Reported by: Wei Liu <wei.liu2@citrix.com>
Tested by: Wei Liu <wei.liu2@citrix.com>
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Citrix Systems R&D
1. There a was a typo in one place where the processor family is
checked (16 vs 0x16). Now the checks are consolidated in a single
function.
2. Instead of an array of struct amd_et_state objects the code allocated
an array of pointers. That was no problem on amd64 where the sizes
are the same, but could be a problem on i386.
Reported by: tuexen and others
Tested by: tuexen (earlier version of the fix)
Pointyhat to: avg
MFC after: 5 days
X-MFC with: r314636
Currently the feature is implemented only for a subset of errors
reported via Bank 4. The subset includes only DRAM-related errors.
The new code builds upon and reuses the Intel CMC (Correctable MCE
Counters) support code. However, the AMD feature is quite different
and, unfortunately, much less regular.
For references please see AMD BKDGs for models 10h - 16h.
Specifically, see MSR0000_0413 NB Machine Check Misc (Thresholding)
Register (MC4_MISC0).
http://developer.amd.com/resources/developer-guides-manuals/
Reviewed by: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9613
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
The extended LVT entries can be used to configure interrupt delivery
for various events that are internal to a processor and can use this
feature.
All current processors that support the feature have four of such entries.
The entries are all masked upon the processor reset, but it's possible
that firmware may use some of them.
BIOS and Kernel Developer's Guides for some processor models do not assign
any particular names to the extended LVTs, while other BKDGs provide names
and suggested usage for them.
However, there is no fixed mapping between the LVTs and the processor
events in any processor model that supports the feature. Any entry can be
assigned to any event. The assignment is done by programming an offset
of an entry into configuration bits corresponding to an event.
This change does not expose the flexibility that the feature offers.
The change adds just a single method to configure a hardcoded extended LVT
entry to deliver APIC_CMC_INT. The method is designed to be used with
Machine Check Error Thresholding mechanism on supported processor models.
For references please see BKDGs for families 10h - 16h and specifically
descriptions of APIC30, APIC400, APIC[530:500] registers.
For a description of the Error Thresholding mechanism see, for example,
BKDG for family 10h, section 2.12.1.6.
http://developer.amd.com/resources/developer-guides-manuals/
Thanks to jhb and kib for their suggestions.
Reviewed by: kib
Discussed with: jhb
MFC after: 5 weeks
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D9612
The fixed is used only to fix up buggy MPTable information and the
trigger mode is probably ignored for the relevant interrupt types
anyway. Still, it's better to be standards compliant and have the code
do what it says it does.
Discussed with: jhb
MFC after: 5 days
The ifdefs were '#if !defined(__i386__) || !defined(PC98)' previously,
so cpu_idle_acpi was enabled both i386 and amd64 except PC98.
I was obfuscated by '#if !defined(__i386__)' condition.
Submitted by: bde
Reported by: bde
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host pci connections to allowing all valid feature
requests.
Sponsored by: Netflix
We have an original panic. Then, instead of writing the core to the dump
device, the kernel has a second panic: "smp_targeted_tlb_shootdown:
interrupts disabled". This change is an attempt to fix that second panic.
When the other CPUs are stopped, we can't notify them of the TLB shootdown,
so we skip that operation. However, when the CPUs come back up, we
invalidate the TLB to ensure they correctly observe any changes to the
page mappings.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9786
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC. This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set. Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.
There is no reason to require that ACPI reported C2 and deeper states
to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from
init_TSC_tc() condition. And since this is the only use of the
variable, remove it at all.
Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com>
Suggested by: jhb
MFC after: 2 weeks
Use large enough type for calculation of mtrr physmask. Typical
cpu_maxphyaddr is 36 or larger.
Reported and tested by: sbruno
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
and there is no reason to check cpu family or vendor.
Noted by: royger
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9657
code. Also fix cast and remove unneeded XXX in comment.
Noted and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9657
compile options. Remove doxygen pointers to now deleted files. Remove
EISA and VME as examples in bus_space.9.
Retained EISA mode code for IO PIC and MPTABLES because that's not
EISA bus, per se, and some people have abused EISA to mean "EISA-like
behavior as opposed to ISA" rather than using it for EISA add-in
cards.
Relnotes: yes
machines, only a few 486 machines that used it, and those haven't had
enough memory to run FreeBSD for quite some time (often limited to
16MB).
Not to be confused with the Machine Check Architecture, which is still
very much alive and used (and untouched by this commit).
No Objection From: arch@