pmap_invalidate_cache_range() when self-snoop is apparently not reported
in CPU features. We get a reserved trap when clflushing the APIC register
window.
XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.
Tested by: csjp
Reviewed by: alc
MFC after: 3 days
are currently two problems with the use of 1GB page mappings by the direct
map. First, at least one device driver uses pmap_extract() rather than
DMAP_TO_PHYS() to translate a direct map address to a physical address.
Unfortunately, neither pmap_extract() nor pmap_kextract() yet supports 1GB
page mappings. Second, pmap_bootstrap() needs to interrogate the MTRRs to
ensure that a 1GB page mapping doesn't span two MTRRs of different types.
Reported and tested by: Daniel O'Connor
MFC after: 3 days
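The second problem can be illustrated with a small userland sketch. It is
not the pmap_bootstrap() code: struct mtrr_range, the default-type argument
and both helper names are hypothetical, and real MTRR overlap and precedence
rules are ignored. It only shows the check of whether one 1GB-aligned region
sees a single memory type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GB_PAGE_SIZE	(UINT64_C(1) << 30)

/* Hypothetical, simplified MTRR: [base, base + len) has memory type 'type'. */
struct mtrr_range {
	uint64_t base;
	uint64_t len;
	int type;
};

/* Memory type at 'pa', or 'deftype' when no range covers it. */
static int
mtrr_type_at(const struct mtrr_range *r, size_t n, int deftype, uint64_t pa)
{
	for (size_t i = 0; i < n; i++)
		if (pa >= r[i].base && pa - r[i].base < r[i].len)
			return (r[i].type);
	return (deftype);
}

/*
 * Return true if the 1GB region starting at 'pa' has one memory type.
 * The type can only change at the start or end of a range, so it is
 * enough to sample 'pa' and every range edge strictly inside the region.
 */
static bool
gb_page_type_uniform(const struct mtrr_range *r, size_t n, int deftype,
    uint64_t pa)
{
	int type;

	assert((pa & (GB_PAGE_SIZE - 1)) == 0);
	type = mtrr_type_at(r, n, deftype, pa);
	for (size_t i = 0; i < n; i++) {
		uint64_t edges[2] = { r[i].base, r[i].base + r[i].len };

		for (int j = 0; j < 2; j++) {
			if (edges[j] > pa && edges[j] - pa < GB_PAGE_SIZE &&
			    mtrr_type_at(r, n, deftype, edges[j]) != type)
				return (false);
		}
	}
	return (true);
}
```

A caller would fall back to 2MB mappings for any region where the check
returns false.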
devices that we also support, just not by default (thus only LINT or
module builds by default).
While currently there is only "/dev/full" [2], we are planning to see more
in the future. We may decide to change the module/dependency logic in the
future should the list grow too long.
This is not part of linux.ko because non-Linux binaries, such as kFreeBSD
userland or ports, can make use of this as well.
Suggested by: rwatson [1] (name)
Submitted by: ed [2]
Discussed with: markm, ed, rwatson, kib (weeks ago)
Reviewed by: rwatson, brueffer (prev. version)
PR: kern/68961
MFC after: 6 weeks
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch
Reviewed by: jhb
MFC after: 1 week
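The unsigned-cast idiom from the list above can be sketched as follows.
The function name is hypothetical and the two limits are assumed values
written out for illustration; the kernel headers remain authoritative.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define PCI_SLOTMAX	31
#define PCIE_REGMAX	4095

static_assert(PCI_SLOTMAX == 0x1f, "slot number is a 5-bit field");

/*
 * Hypothetical validity check. Casting a signed argument to unsigned
 * turns any negative value into a very large one, so one comparison
 * against the maximum rejects both out-of-range ends.
 */
static bool
pcie_cfg_args_valid(int slot, int reg)
{
	return ((unsigned)slot <= PCI_SLOTMAX &&
	    (unsigned)reg <= PCIE_REGMAX);
}
```

Without the cast, a check such as reg <= PCIE_REGMAX would accept -1.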
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.
PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days
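A minimal sketch of the sorted insertion, assuming physmap[] is a flat
array of (base, end) pairs. The function name, the capacity and the
return convention are hypothetical, and unlike real SMAP parsing this
version does not merge adjacent entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHYSMAP_SIZE	8	/* hypothetical capacity: 4 (base, end) pairs */

/*
 * Insert [base, base + len) into 'map', which holds *idx values laid
 * out as sorted (base, end) pairs. Returns 1 on success and 0 when the
 * entry overlaps an existing one or the array is full, which is the
 * point where a caller would stop parsing.
 */
static int
physmap_insert(uint64_t *map, size_t *idx, uint64_t base, uint64_t len)
{
	uint64_t end = base + len;
	size_t i, pos;

	assert(*idx % 2 == 0 && *idx <= PHYSMAP_SIZE);
	if (len == 0)
		return (1);		/* nothing to record */
	for (pos = 0; pos < *idx; pos += 2) {
		if (base < map[pos + 1] && map[pos] < end)
			return (0);	/* overlaps an existing entry */
		if (end <= map[pos])
			break;		/* new entry sorts before this pair */
	}
	if (*idx + 2 > PHYSMAP_SIZE)
		return (0);
	/* Shift the tail up by one pair, then store the new pair. */
	for (i = *idx; i > pos; i--)
		map[i + 1] = map[i - 1];
	map[pos] = base;
	map[pos + 1] = end;
	*idx += 2;
	return (1);
}
```

With an append-only scheme the second call in a sequence such as
(1MB, 1MB) followed by (0, 636KB) would leave the array unsorted; here
the earlier pair is shifted up instead.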
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in a panic; for non-debugging kernels the behaviour is
undefined.
Panic regardless of the execution mode at the moment of the trap.
Reviewed by: jhb
MFC after: 1 month
- Add vesa kernel options for amd64.
- Connect libvgl library and splash kernel modules to amd64 build.
- Connect manual page dpms(4) to amd64 build.
- Remove old vesa/dpms files.
Submitted by: paradox <ddkprog yahoo com> [1], swell k at gmail.com
(with some minor tweaks)
based Intel Macs. Since r189055, these platforms started freezing for an
unknown reason while ACPI is being initialized. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.
Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.
MFC after: 3 days
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.
Discussed with: alc
MFC after: 3 days
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statement.
To do this, mark the Debian GNU/kFreeBSD brandinfo as requiring
a .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.
Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
forcing native FreeBSD to come first as well.
The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.
Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days
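The name-plus-first-word check described above can be sketched like this.
It is not the kernel's imgact_elf code: the struct is a simplified
in-memory stand-in for a packed ELF note, all identifiers are hypothetical,
and the OS values 0 (Linux) and 3 (kFreeBSD) are assumed from the glibc
abi-tags convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NOTE_GNU_ABI_TAG	1	/* note type of the GNU ABI tag */
#define GNU_ABI_LINUX		0	/* assumed desc[0] value for Linux */
#define GNU_ABI_KFREEBSD	3	/* assumed desc[0] value for kFreeBSD */

enum brand { BRAND_UNKNOWN, BRAND_LINUX, BRAND_KFREEBSD };

/* Simplified note; a real one is a padded byte stream in the ELF image. */
struct abi_note {
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
	char name[4];
	uint32_t desc[4];
};

/*
 * Both Linux and kFreeBSD binaries carry a note named "GNU", so the
 * name alone cannot tell them apart; the first word of desc can.
 */
static enum brand
classify_gnu_note(const struct abi_note *n)
{
	assert(n != NULL);
	if (n->type != NOTE_GNU_ABI_TAG || n->namesz != sizeof(n->name) ||
	    memcmp(n->name, "GNU", sizeof(n->name)) != 0)
		return (BRAND_UNKNOWN);
	if (n->descsz < sizeof(n->desc[0]))
		return (BRAND_UNKNOWN);
	switch (n->desc[0]) {
	case GNU_ABI_LINUX:
		return (BRAND_LINUX);
	case GNU_ABI_KFREEBSD:
		return (BRAND_KFREEBSD);
	default:
		return (BRAND_UNKNOWN);
	}
}
```

Matching on the name only is what caused kFreeBSD binaries to be
detected as GNU(/Linux).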
less than "avail memory", fall back to Maxmem to avoid user confusion.
We have used SMBIOS information to display "real memory" since r190599, but
some broken SMBIOS implementations report only half of the actual memory.
Tested by: bz
Approved by: re (kib)
when pmap_demote_pde() allocates a page table page to implement a
user-space demotion, it must increment the pmap's resident page count.
Not doing so can lead to an underflow during address space termination
that causes pmap_remove() to exit prematurely, before it has destroyed
all of the mappings within the specified range. The ultimate effect or
symptom of this error is an assertion failure in vm_page_free_toq()
because the page being freed is still mapped.
This error is only possible when superpage promotion is enabled. Thus,
it only affects FreeBSD versions greater than 7.2.
Tested by: pho, alc
Reviewed by: alc
Approved by: re (rwatson)
MFC after: 1 week
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is successfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.
Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks
has proven to have a good effect when entering KDB by using an NMI,
but it completely violates all the good rules about interrupts being
disabled while holding a spinlock on other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when available. In other cases it just behaves like a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64
architectures, while on the others it has the effect of a normal IPI_STOP.
It is the responsibility of maintainers to eventually implement a hard stop
when necessary and possible.
* Use the new IPI facility to implement a new user-facing SMP kernel
function called stop_cpus_hard(). It mirrors stop_cpus() but
uses the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and added comments
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file by removing the STOP_NMI
option.
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
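How an offset into such an object could be resolved through a
scatter/gather list can be sketched in userland as below. struct sg_seg
and sg_offset_to_paddr() are hypothetical stand-ins, not the sglist(9)
API or the pager's actual lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment, loosely modeled on a scatter/gather entry. */
struct sg_seg {
	uint64_t paddr;		/* physical start of the segment */
	uint64_t len;		/* length in bytes */
};

/*
 * Translate an object offset into a physical address by walking the
 * segments in order and subtracting each length until the offset
 * falls inside one. Returns false when the offset is past the end.
 */
static bool
sg_offset_to_paddr(const struct sg_seg *segs, size_t nsegs, uint64_t offset,
    uint64_t *paddr)
{
	assert(paddr != NULL);
	for (size_t i = 0; i < nsegs; i++) {
		if (offset < segs[i].len) {
			*paddr = segs[i].paddr + offset;
			return (true);
		}
		offset -= segs[i].len;
	}
	return (false);
}
```

The lookup needs no callback into a driver, which is the difference from
the d_mmap() path noted above.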
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If they do not, but the clflush instruction is
implemented on the CPU, a series of clflush instructions can be used on the
mapping region. Otherwise, we have to flush the whole cache. The latter
operation is very expensive, and AMD-made CPUs do not have self-snooping.
Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.
Proposed and reviewed by: alc
Approved by: re (kensmith)
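The three-way decision described above can be sketched as plain C. This is
a model of the policy only: the enum, both function names and the feature
flags passed as booleans are hypothetical, and no cache instruction is
actually issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum flush_method { FLUSH_NONE, FLUSH_CLFLUSH, FLUSH_WBINVD };

/* Pick the cheapest method that the CPU features allow. */
static enum flush_method
choose_flush(bool self_snoop, bool has_clflush)
{
	if (self_snoop)
		return (FLUSH_NONE);	/* hardware keeps caches coherent */
	if (has_clflush)
		return (FLUSH_CLFLUSH);	/* flush only the remapped range */
	return (FLUSH_WBINVD);		/* whole-cache flush, very expensive */
}

/*
 * Number of clflush instructions needed to cover [sva, eva): one per
 * cache line touched, starting from the line that contains sva.
 */
static size_t
clflush_lines(uintptr_t sva, uintptr_t eva, size_t line_size)
{
	size_t n = 0;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	if (sva >= eva)
		return (0);
	for (uintptr_t va = sva & ~(uintptr_t)(line_size - 1); va < eva;
	    va += line_size)
		n++;
	return (n);
}
```

For a 4KB page and 64-byte lines the range costs 64 clflush
instructions, which is still far cheaper than writing back and
invalidating every cache line on the CPU.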
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().
The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.
Add pmap_page_set_memattr() to the Xen pmap.
Approved by: re (kib)
configuring machine-dependent memory attributes...":
Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.
Eliminate pointless code from pmap_cache_bits() on amd64.
Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.
Approved by: re (kib)
dependent memory attributes:
Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.
Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.
Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:
kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.
vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.
Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.
Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.
In collaboration with: jhb
Approved by: re (kib)
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actual routing process on the mesh network.
HWMP is the routing protocol mandated by the mesh standard, but
others, such as RA-OLSR, can be implemented.
Authentication and encryption are not implemented.
There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to set up a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).
A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.
Drivers that support mesh networks right now are: ath, ral and mwl.
More information at: http://wiki.freebsd.org/WifiMesh
Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.
Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.
Reviewed by: sam
Approved by: re (kensmith)
Obtained from: projects/mesh11s
usermode, it generates a GPF, which is mirrored to user mode as SIGSEGV.
The offending register in mcontext should contain the value whose load
generated the GPF, and it is so on i386. On amd64, we currently
report the segment descriptor in tf_err, while the segment register contains
the corrected value loaded by the trap handler.
Fix the issue by behaving like i386, reloading the segment register in the
trap frame after the signal frame is pushed onto the user stack.
Noted and tested by: pho
Approved by: re (kensmith)
return path only when the thread was not context switched while
executing syscall code and the syscall did not explicitly modify LDT or MSRs.
Save segment registers in trap handlers before interrupts are enabled,
so that context switches cannot happen before the registers are saved.
Use a separate byte in pcb to indicate fast/full return, since
pcb_flags are not synchronized with context switches.
The change restores syscall microbenchmark numbers that had slowed
down after the commit adding LDT support on amd64.
Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)
to simultaneously change the PAT setting for the same pages within the
direct map region. This may require the demotion of a 2MB page mapping and
the allocation of a page table page. This revision gives the highest
possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that
pmap_change_attr() is less likely to fail. (In general, kernel page table
page allocations have the highest priority, so this is not creating a new
precedent.)
(Demotion of 1GB page mappings within the direct map already specifies
VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be
changed.)
Approved by: re (kib)
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.
Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.
Approved by: re (kensmith)
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent
Reviewed by: bde, imp, marcel
Approved by: re (kensmith)
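The macro these bullets describe is not named in the surviving text, so
the sketch below uses hypothetical names. It contrasts a strict-alignment
form with the constant "1" form mentioned for amd64 and i386, and shows a
typical caller; the mask-based definition is an assumption, not taken from
the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical strict form: true when 'p' is aligned for type 't'. */
#define ALIGNED_POINTER_STRICT(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Hypothetical relaxed form for platforms with no alignment restriction. */
#define ALIGNED_POINTER_RELAXED(p, t)	1

static_assert((sizeof(uint32_t) & (sizeof(uint32_t) - 1)) == 0,
    "the mask form needs a power-of-two type size");

/*
 * Typical use: take the direct load when the pointer is suitably
 * aligned, and fall back to a byte-wise copy otherwise.
 */
static uint32_t
load32(const void *p)
{
	uint32_t v;

	if (ALIGNED_POINTER_STRICT(p, uint32_t))
		return (*(const uint32_t *)p);
	memcpy(&v, p, sizeof(v));
	return (v);
}
```

With the relaxed form the compiler can drop the fallback branch entirely.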
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.
Proposed by: miwi
Patch by: Florian Smeets <flo kasimir com>
Approved by: re (kib)
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicking.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.
Approved by: re (kib)
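The "return an error instead of panicking" behaviour from the list above
can be modeled in userland as follows. The table sizes, struct intsrc and
both functions are hypothetical; the real code works on IDT vectors and
interrupt source methods rather than a boolean array.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NCPU		2	/* hypothetical sizes, tiny on purpose */
#define NVECTORS	2

struct intsrc {
	int cpu;
	int vector;
};

static bool vec_busy[NCPU][NVECTORS];

/* Return a free vector on 'cpu', or -1 when none is left. */
static int
alloc_vector(int cpu)
{
	for (int v = 0; v < NVECTORS; v++) {
		if (!vec_busy[cpu][v]) {
			vec_busy[cpu][v] = true;
			return (v);
		}
	}
	return (-1);
}

/*
 * Move an interrupt source to 'cpu'. On vector exhaustion the source
 * is left exactly as it was and ENOSPC is returned to the caller.
 */
static int
intsrc_assign_cpu(struct intsrc *is, int cpu)
{
	int vec;

	assert(cpu >= 0 && cpu < NCPU);
	if (is->cpu == cpu)
		return (0);
	vec = alloc_vector(cpu);
	if (vec < 0)
		return (ENOSPC);
	vec_busy[is->cpu][is->vector] = false;	/* release the old vector */
	is->cpu = cpu;
	is->vector = vec;
	return (0);
}
```

Because the old vector is released only after the new one is obtained,
a failed request needs no rollback in the source itself; the caller only
has to restore its own cpuset mask.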