sys/vmmeter.h: warning: shadowed declaration is here
machine/cpufunc.h: In function 'insw':
machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
..snip..
This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.
Purge d_mmap2().
All driver modules will need to be rebuilt since D_VERSION is also
bumped.
Reviewed by: jhb@
MFC after: Not in this lifetime...
- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
and separation of vendor-specific quirk functions
Reviewed by: jhb
MFC after: 1 week
- directly print mca information in case we fail to allocate memory
for a record
- include bank number into mca record
- print raw mca status value for extended information
Reviewed by: jhb
MFC after: 10 days
The hardware is compliant with WDRT specification, so I originally
considered including generic WDRT watchdog support, but decided
against it, because I couldn't find anyone to the code for me.
WDRT seems to be not very popular.
Besides, generic WDRT porbably requires a slightly different driver
approach.
Reviewed by: des, gavin, rpaulo
MFC after: 3 weeks
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].
Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.
I have found that it is not only desktop CPUs problem. but mobile also.
Probably AP on laptops just started initially at lower frequency, hiding
the problem.
Disable frequency validation by default, for systems with more then one CPU,
until we can implement it properly. It looks like making more harm now then
benefits. Add 'hw.est.strict' loader tunable to control it.
Now my iXsystems Invincibook is able to run at 800MHz lowest frequency,
instead of 1200MHz before, when 800MHz was incorrectly reported invalid.
CPU core, only pair of them. As result, both cores are running on highest
one of requested frequencies, and that is reported by status register.
Such behavior confuses frequency validation logic, as it runs on only
one core, as SMP is not yet launched, making EIST completely unusable.
To workaround this, add check for validation result. If we haven't found
at least two usable frequencies, then probably we are looking bad and have
to trust data provided by BIOS as-is.
functions are selfcontained (ie. they touch only isa_dma.c static variables
and hardware) so a private lock is sufficient to prevent races. This changes
only i386/amd64 while there are also isa_dma functions for ia64/sparc64.
Sparc64 are ones empty stubs and ia64 ones are unused as ia64 does not
have isa (says marcel).
This patch removes explicit locking of Giant from a few drivers (there
are some that requires this but lack ones - this patch fixes this) and
also removes the need for implicit locking of Giant from attach routines
where it's provided by newbus.
Approved by: ed (mentor, implicit)
Reviewed by: jhb, attilio (glanced by)
Tested by: Giovanni Trematerra <giovanni.trematerra gmail com>
IA64 clue: marcel
map_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1. -1 is same as
current behavior, which automatically disable CLFLUSH on Intel CPUs
without CPUID_SS (should be occured on Xen only). You can specify 1
when this panic happened on non-Intel CPUs (such as AMD's). Because
disabling CLFLUSH may reduce performance, you can try with setting 0
on Intel CPUs without SS to use CLFLUSH feature.
Reviewed by: kib
Reported by: karl, kuriyama
Related to: kern/138863
ocassions, memory barriers semantic is not honoured by the hardware
itself. As a result, some random breakage can happen in uninvestigable
ways (for further explanation see at the content of the commit itself).
As long as just a specific familly is bugged of an entire architecture
is broken, a complete fix-up is impratical without harming to some
extents the other correct cases.
Considering that (and considering the frequency of the bug exposure)
just print out a warning message if the affected machine is identified.
Pointed out by: Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by: jeff
MFC: 3 days
There is no need to use the lower 4 bits of the unit number to store the
device type number. Just use 0 and 1 to distinguish them. devfs also
guarantees that there can never be an open call on a device that has a
unit number different to 0 and 1, so there is no need to check for this
in open().
partially fixed on amd64 earlier. Rather than forcing linux_mmap_common()
to use a 32-bit offset, have it accept a 64-bit file offset. This offset
is then passed to the real mmap() call. Rather than inventing a structure
to hold the normal linux_mmap args that has a 64-bit offset, just pass
each of the arguments individually to linux_mmap_common() since that more
closes matches the existing style of various kern_foo() functions.
Submitted by: Christian Zander @ Nvidia
MFC after: 1 week
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.
Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().
Requested by: many
Reviewed by: scottl
MFC after: 2 weeks
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.
Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time
specify their own version of atomic_cmpset_* which could have been
different than the membar version.
Right now, however, FreeBSD is bound mostly to GCC-like compilers and
it is desired to add new support and compat shim mostly when there is
a real necessity, in order to avoid too much compatibility bloats.
In this optic, bring back atomic_cmpset_{acq, rel}_* to be the same as
atomic_cmpset_* and unwind the atomic_cmpset_barr_* introduction.
Requested by: jhb
Reviewed by: jhb
Tested by: Giovanni Trematerra <giovanni dot trematerra at
gmail dot com>
not defined through macros or similar) in order to be later compiled in
the kernel and offer this way the support for modules (and
compatibility among the UP case and SMP case).
Fix this for the newly introduced atomic_cmpset_barr_* cases by defining
and specifying a template. Note that the new DEFINE_CMPSET_GEN()
template save more typing on amd64 than the current code. [1]
- Fix the style for memory barriers on amd64.
[1] Reported by: Paul B. Mahol <onemda at gmail dot com>
memory barriers should also ensure that the compiler doesn't reorder paths
where they are used. GCC, however, does that aggressively, even in
presence of volatile operands. The most reliable way GCC offers for avoid
instructions reordering is clobbering "memory" even if that is
theoretically an heavy-weight operation, flushing the content of all
the registers and forcing reload of them (We could rely, however, on
gcc DTRT by just understanding the purpose as this is a well-known
pattern for many modern operating-systems).
Not all our memory barriers, right now, clobber memory for GCC-like
compilers. The most notable cases are IA32 and amd64 where the memory
barrier are treacted the same as normal atomic instructions.
Fix this by offering the possibility to implement atomic instructions
with memory barriers separately from the normal version and implement
the GCC-like specific one using memory clobbering.
Thanks to Chris Lattner (@apple) for his discussion on llvm specifics.
Reported by: jhb
Reviewed by: jhb
Tested by: rdivacky, Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.
The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.
Reviewed by: kib
MFC after: 1 month
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.
XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.
Tested by: csjp
Reviewed by: alc
MFC after: 3 days
devices that we also support, just not by default (thus only LINT or
module builds by default).
While currently there is only "/dev/full" [2], we are planning to see more
in the future. We may decide to change the module/dependency logic in the
future should the list grow too long.
This is not part of linux.ko as also non-linux binaries like kFreeBSD
userland or ports can make use of this as well.
Suggested by: rwatson [1] (name)
Submitted by: ed [2]
Discussed with: markm, ed, rwatson, kib (weeks ago)
Reviewed by: rwatson, brueffer (prev. version)
PR: kern/68961
MFC after: 6 weeks
o introduce PCIE_REGMAX and use it instead of ad-hoc constant
o where 'reg' parameter/variable is not already unsigned, cast it to
unsigned before comparison with maximum value to cut off negative
values
o use PCI_SLOTMAX in several places where 31 or 32 were explicitly used
o drop redundant check of 'bytes' in i386 pciereg_cfgread() - valid
values are already checked in the subsequent switch
Reviewed by: jhb
MFC after: 1 week
startup and genericize it so it can be reused to map other tables as well:
- Add a routine to walk a list of ACPI subtables such as those used in the
APIC and SRAT tables in the MI acpi(4) driver.
- Move the routines for mapping and unmapping an ACPI table as well as
mapping the RSDT or XSDT and searching for a table with a given signature
out into acpica_machdep.c for both amd64 and i386.
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.
PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.
Do panic regardeless of execution mode at the moment of trap.
Reviewed by: jhb
MFC after: 1 month
- Add vesa kernel options for amd64.
- Connect libvgl library and splash kernel modules to amd64 build.
- Connect manual page dpms(4) to amd64 build.
- Remove old vesa/dpms files.
Submitted by: paradox <ddkprog yahoo com> [1], swell k at gmail.com
(with some minor tweaks)
The macros for PCPU can be slightly simplified, which makes the
resulting tangle qa lot easier to understand when trying to read them.
MFC after: 4 weeks
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.
Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.
MFC after: 3 days
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.
Discussed with: alc
MFC after: 3 days
page into 4KB pages as needed. This should be fairly rare in practice
on i386. This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
similar to the support in the initial version of pmap_change_attr() on
amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
- Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2/4MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2/4MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().
Reviewed by: alc
MFC after: 3 days
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.
Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.
The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.
Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.
Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.
Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].
In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].
These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).
PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.
Tested by: bz
Approved by: re (kib)
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is succesfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.
Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file with the STOP_NMI
option removal
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
The newbus lock is responsible for protecting newbus internIal structures,
device states and devclass flags. It is necessary to hold it when all
such datas are accessed. For the other operations, softc locking should
ensure enough protection to avoid races.
Newbus lock is automatically held when virtual operations on the device
and bus are invoked when loading the driver or when the suspend/resume
take place. For other 'spourious' operations trying to access/modify
the newbus topology, newbus lock needs to be automatically acquired and
dropped.
For the moment Giant is also acquired in some key point (modules subsystem)
in order to avoid problems before the 8.0 release as module handlers could
make assumptions about it. This Giant locking should go just after
the release happens.
Please keep in mind that the public interface can be expanded in order
to provide more support, if there are really necessities at some point
and also some bugs could arise as long as the patch needs a bit of
further testing.
Bump __FreeBSD_version in order to reflect the newbus lock introduction.
Reviewed by: ed, hps, jhb, imp, mav, scottl
No answer by: ariff, thompsa, yongari
Tested by: pho,
G. Trematerra <giovanni dot trematerra at gmail dot com>,
Brandon Gooch <jamesbrandongooch at gmail dot com>
Sponsored by: Yahoo! Incorporated
Approved by: re (ksmith)
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.
Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.
Proposed and reviewed by: alc
Approved by: re (kensmith)
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().
The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.
Add pmap_page_set_memattr() to the Xen pmap.
Approved by: re (kib)
configuring machine-dependent memory attributes...":
Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.
Eliminate pointless code from pmap_cache_bits() on amd64.
Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.
Approved by: re (kib)
dependent memory attributes:
Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.
Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.
Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:
kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.
vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.
Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.
Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.
In collaboration with: jhb
Approved by: re (kib)
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actually routing process on the mesh network.
HWMP is the mandatory routing protocol on by the mesh standard, but
others, such as RA-OLSR, can be implemented.
Authentication and encryption are not implemented.
There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to setup a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).
A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.
Drivers that support mesh networks right now are: ath, ral and mwl.
More information at: http://wiki.freebsd.org/WifiMesh
Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.
Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.
Reviewed by: sam
Approved by: re (kensmith)
Obtained from: projects/mesh11s
if the new file mode is the same as it was before; however, this
optimization must be disabled for filesystems that support NFSv4 ACLs.
Chmod uses pathconf(2) to determine whether this is the case - however,
pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't.
This change adds lpathconf(3) to make it possible to solve that problem
in a clean way.
Reviewed by: rwatson (earlier version)
Approved by: re (kib)
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.
Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.
Approved by: re (kensmith)
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.
The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.
Approved by: re (kib)
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent
Reviewed by: bde, imp, marcel
Approved by: re (kensmith)
More applications (including Firefox) seem to depend on this nowadays,
so not having this enabled by default is a bad idea.
Proposed by: miwi
Patch by: Florian Smeets <flo kasimir com>
Approved by: re (kib)
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.
Approved by: re (kib)
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).
In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.
Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.
In collaboration with: jhb
This is mostly important for the multiple MSI message case where the
IDT vectors for the entire group need to be allocated together. This
also restores the assumptions made by the PCI bus code that it could
invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
assigned to interrupt sources on activation. Instead of assigning the
CPU via pic_assign_cpu() before calling enable_intr(), allow the
different interrupt source drivers to ask the MD interrupt code which
CPU to use when they allocate an IDT vector. I/O APIC interrupt pins
do this in their pic_enable_intr() routines giving the same behavior as
before. MSI sources do it when the IDT vectors are allocated during
msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.
Tested by: rnoland
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/incldue/compat.h. [1]
PR: kern/16195 kern/113218 bin/129855
Reviewed by: arch@, rwatson
Discussed with: kan, kib [1]
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.
Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
While general idea of patch was good, it was not working properly due the way
it was implemented. When we are using same timer interrupt for several of
hard/prof/stat purposes we should not send several IPIs same time to other
CPUs. Sending several IPIs same time leads to terrible accounting/profiling
results due to strong synchronization effect, when the second interrupt
handler accounts processing of the first one.
Interlink timer events in a such way, that no more then one IPI is sent for
any original timer interrupt.
* Driver for ACPI HP extra functionations, which required
ACPI WMI driver.
Submitted by: Michael <freebsdusb at bindone.de>
Approved by: re
MFC after: 2 weeks
getgroups for ibcs emulation. It seems vanishingly likely any
programs will actually be affected since they probably assume a much
lower value and use a static array size.
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)
The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.
Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.
Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.
Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.
Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better then just broken, as it was.
No objections: current@, arch@, jhb@
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.
To my knowledge, there have been no reports of these bugs, hence,
their persistance. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)
Discussed with: jhb@
In the past there have been some reports of PRINTF_BUFR_SIZE not
functioning correctly. Instead of having garbled console messages, we
should just see whether the issues are still there and analyze them.
Approved by: re
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.
Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.
Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.
Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.
Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland
controller. These controllers are also known as L1C(AR8131) and
L2C(AR8132) respectively. These controllers resembles the first
generation controller L1 but usage of different descriptor format
and new register mappings over L1 register space requires a new
driver. There are a couple of registers I still don't understand
but the driver seems to have no critical issues for performance and
stability. Currently alc(4) supports the following hardware
features.
o MSI
o TCP Segmentation offload
o Hardware VLAN tag insertion/stripping
o Tx/Rx interrupt moderation
o Hardware statistics counters(dev.alc.%d.stats)
o Jumbo frame
o WOL
AR8131/AR8132 also supports Tx checksum offloading but I disabled
it due to stability issues. I'm not sure this comes from broken
sample boards or hardware bugs. If you know your controller works
without problems you can still enable it. The controller has a
silicon bug for Rx checksum offloading, so the feature was not
implemented.
I'd like to say big thanks to Atheros. Atheros kindly sent sample
boards to me and answered several questions I had.
HW donated by: Atheros Communications, Inc.
- Interpolate stat/prof clock using clkintr() in a similar fashion to
local APIC timer, since statclock usually run slower.
- Liberate hardclockintr() from taking the burden of handling both stat
and prof clock interrupt. Instead, send IPIs within clkintr() to handle
those.
only during initial booting process, while there are laptops/BIOSes that
tend to act 'smarter' by force enabling C1E if the main power adapter
being pulled out, rendering previous workaround ineffective. Given the
fact that we still rely on local APIC to drive timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
its on battery power or not.
URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.htmlhttp://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html
Tested by: Peter Jeremy <peterjeremy at optushome d com d au>
the "Get Scan Line Length" function fails, as it does in Parallels
(in Version 2.2, Build 2112 at least).
PR: i386/127367
Obtained from: DragonFly
Submitted by: Pedro Giffuni
MFC after: 1 month
It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.
Requested by: rwatson