HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skip them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, and setting it to all 0s would disable
far more than the one error in question. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, the contents of the
MSR_MC0_CTL{,2} registers don't seem to be publicly documented, either in the
Intel 64 and IA-32 Architectures Software Developer's Manual or in the
Haswell datasheets.
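The skip-in-mca_log() approach can be sketched as a predicate evaluated before a record is logged. This is a minimal sketch under stated assumptions: `struct mca_record_sketch`, `mca_skip_hsd131()` and the `HSD131_STATUS_MASK`/`HSD131_STATUS_VALUE` pair are hypothetical placeholders, not the signature used by the real code. Only the bank 0 restriction, the "corrected errors only" condition and the "filter unless logging is enabled" default come from the text; the `VAL` and `UC` bit positions are the architectural MCi_STATUS ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits. */
#define MCI_STATUS_VAL (1ULL << 63) /* register contents valid */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */

/*
 * HYPOTHETICAL signature of the spurious HSD131 errors; the real
 * mask/value must come from the erratum, not from this sketch.
 */
#define HSD131_STATUS_MASK  0x00000000ffffffffULL
#define HSD131_STATUS_VALUE 0x00000000000f0005ULL

/* Simplified stand-in for an MCA record (hypothetical). */
struct mca_record_sketch {
	int      bank;   /* MCA bank that reported the error */
	uint64_t status; /* raw MCi_STATUS value */
};

/*
 * Return true if the record should be skipped rather than logged:
 * a corrected bank 0 error matching the HSD131 signature on an
 * affected CPU, with logging of such errors not enabled.
 */
bool
mca_skip_hsd131(const struct mca_record_sketch *rec, bool affected_cpu,
    bool log_enabled)
{
	assert(rec != NULL);
	if (log_enabled || !affected_cpu)
		return (false);
	if (rec->bank != 0 || (rec->status & MCI_STATUS_UC) != 0)
		return (false);
	return ((rec->status & HSD131_STATUS_MASK) == HSD131_STATUS_VALUE);
}
```

Keeping the check in software rather than in MSR_MC0_CTL leaves every other bank 0 error reportable, which is the trade-off the text argues for.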
Note that while, as of revision 014 of the desktop 4th Generation Intel Core
processor family specification update, HSD131 only applies to the C0
stepping, these corrected errors have also been observed with the D0
stepping, aka "Haswell Refresh".
1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
Reviewed by: jhb
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
involves updating the corresponding page tables followed by accesses to the
pages in question. This sequence is subject to exactly the situation described
in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore,
issuing the INVLPG right after modifying the PTE bits is crucial (see also
r269050).
For the amd64 PMAP code, the order of instructions was already correct. The
above fact is still worth documenting, though.
1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
Reviewed by: alc
Sponsored by: Bally Wulff Games & Entertainment GmbH
corresponding page tables followed by accesses to the pages in question.
This sequence is subject to exactly the situation described in the "AMD64
Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23,
"7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing
the INVLPG right after modifying the PTE bits is crucial.
For pmap_copy_page(), this was broken in r124956, and the breakage was later
carried over to pmap_copy_pages(), which was derived from the former, while
all other places in the i386 PMAP code use the correct order of instructions
in this regard. Fixing this breakage solves the problem of data corruption
seen with unmapped I/O enabled, at least when running on bare metal on AMD
R-268D APUs. It might also fix similar corruption reported for virtualized
environments.
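Why the order matters can be illustrated with a purely software model of a TLB; this is not kernel code and does not touch real page tables. Everything in it (`phys`, `pte`, `tlb`, `invlpg()`, `window_va()`, `copy_page_model()`) is hypothetical: two mapping windows stand in for the source and destination mappings, and a cached translation is used as long as it has not been invalidated, mimicking a stale TLB entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES    4
#define PAGE_SIZE 16 /* tiny pages keep the model small */

uint8_t phys[NPAGES][PAGE_SIZE]; /* "physical" page frames */
static int pte[2];               /* two mapping windows: source, destination */
static int tlb[2] = { -1, -1 };  /* cached translations, -1 = none */

/* Model of INVLPG: drop the cached translation for one window. */
static void
invlpg(int win)
{
	tlb[win] = -1;
}

/* Translate a window; a cached (possibly stale) entry wins over the PTE. */
static uint8_t *
window_va(int win)
{
	if (tlb[win] < 0)
		tlb[win] = pte[win]; /* TLB miss: walk the "page table" */
	return (phys[tlb[win]]);
}

/* Correct order: update each PTE, invalidate it, and only then access. */
void
copy_page_model(int src, int dst)
{
	assert(src >= 0 && src < NPAGES && dst >= 0 && dst < NPAGES);
	pte[0] = src;
	invlpg(0);
	pte[1] = dst;
	invlpg(1);
	memcpy(window_va(1), window_va(0), PAGE_SIZE);
}
```

Dropping the two invlpg() calls makes a second copy reuse the translations cached by the first one: it would copy page 0 to page 2 again while the intended destination is never written.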
- In pmap_copy_pages(), correctly set the cache bits on the source page being
copied. This change is believed to be a no-op in practice, though. [2]
1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
Submitted by: kib [2]
Reviewed by: alc, kib
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.
A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in the kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.
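Both points can be shown in a small sketch. Only `inst_length` and the `void *` typing come from the text; `struct vm_exit_sketch`, `struct fault_log`, `inject_fault()` and `finish_exit()` are hypothetical simplifications, not bhyve's actual types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the VM-exit state (hypothetical). */
struct vm_exit_sketch {
	uint64_t rip;         /* guest instruction pointer */
	int      inst_length; /* length of the emulated instruction */
};

/* Stand-in for 'struct vm' (kernel) or 'struct vmctx' (userspace). */
struct fault_log {
	int vector; /* last injected exception vector */
	int count;  /* number of injected exceptions */
};

/* Shared code only sees an opaque 'void *'. */
static void
inject_fault(void *vmarg, int vcpu, int vector)
{
	struct fault_log *fl = vmarg;

	(void)vcpu;
	fl->vector = vector;
	fl->count++;
}

/*
 * Finish handling an exit. If emulation raised a fault (vector >= 0),
 * inject it and zero inst_length so the faulting instruction restarts
 * once the guest's handler returns; otherwise skip past the instruction.
 */
void
finish_exit(void *vmarg, int vcpu, struct vm_exit_sketch *vme, int vector)
{
	assert(vmarg != NULL && vme != NULL);
	if (vector >= 0) {
		inject_fault(vmarg, vcpu, vector);
		vme->inst_length = 0;
	}
	vme->rip += (uint64_t)vme->inst_length;
}
```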
custom free routine (rxb_free) in the driver. Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet. This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.
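The unload check amounts to counting clusters that are still out on loan. The following is a minimal sketch with hypothetical names (`clusters_outstanding`, `cluster_handed_up()`, `rxb_free_model()`, `mod_unload_model()`). Only rxb_free, MOD_UNLOAD and EBUSY come from the text. The sketch does not stop new clusters from being handed up while an unload is in progress, which a real driver would also have to handle.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Clusters handed up to the stack whose custom free routine hasn't run yet. */
static atomic_uint clusters_outstanding;

/* Called when a cluster with the custom free routine is passed up. */
void
cluster_handed_up(void)
{
	atomic_fetch_add(&clusters_outstanding, 1);
}

/* Model of the driver's rxb_free routine: the stack freed the cluster. */
void
rxb_free_model(void)
{
	unsigned int prev = atomic_fetch_sub(&clusters_outstanding, 1);

	assert(prev > 0);
	(void)prev;
}

/* MOD_UNLOAD handling: refuse while any cluster is still outstanding. */
int
mod_unload_model(void)
{
	if (atomic_load(&clusters_outstanding) != 0)
		return (EBUSY);
	return (0);
}
```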
MFC after: 1 week
With PAE enabled or in long mode, the PSE bit is completely ignored, so
setting it has no effect; don't set it.
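As a sketch, the decision reduces to choosing CR4 bits by paging mode. `CR4_PSE` (bit 4) and `CR4_PAE` (bit 5) are the architectural bit positions. The helper `cr4_paging_bits()` is hypothetical, and so is the assumption that legacy 32-bit paging wants 4 MB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PSE (1u << 4) /* Page Size Extensions */
#define CR4_PAE (1u << 5) /* Physical Address Extension */

/*
 * Hypothetical helper: pick the CR4 paging bits for the requested mode.
 * PSE only matters for legacy 32-bit paging; with PAE (and therefore in
 * long mode, which requires PAE) it is ignored, so it is left clear.
 */
uint32_t
cr4_paging_bits(bool pae, bool long_mode)
{
	assert(!long_mode || pae); /* long mode requires PAE paging */
	if (pae)
		return (CR4_PAE);
	return (CR4_PSE); /* legacy paging, assuming 4 MB pages are wanted */
}
```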
Sponsored by: Citrix Systems R&D
Reviewed by: kib
the assumption that consumers would respect bio_completed and/or
bio_resid to detect short reads. This assumption proved false and
file corruption was the result.
Create as many bios as we need to satisfy the original request.
Check the cached chunk every time we need to do I/O to increase the
hit rate.
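Creating as many bios as needed can be sketched as a splitting loop. The per-bio limit `MAX_BIO_LEN`, `struct bio_sketch` and `split_request()` are hypothetical. The real code splits along its own boundaries rather than a fixed size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BIO_LEN 4096u /* hypothetical per-bio transfer limit */

struct bio_sketch {
	uint64_t offset; /* byte offset of this piece */
	size_t   length; /* bytes covered by this piece */
};

/*
 * Split the request [offset, offset + length) into as many bios as
 * needed instead of issuing a single short one. Returns the number of
 * bios filled in, or 0 if 'out' (with room for 'max') is too small.
 */
size_t
split_request(uint64_t offset, size_t length, struct bio_sketch *out,
    size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	while (length > 0) {
		size_t chunk = length < MAX_BIO_LEN ? length : MAX_BIO_LEN;

		if (n == max)
			return (0);
		out[n].offset = offset;
		out[n].length = chunk;
		offset += chunk;
		length -= chunk;
		n++;
	}
	return (n);
}
```

For example, a 10000-byte request at offset 0 becomes three bios of 4096, 4096 and 1808 bytes, so no consumer has to notice a short read.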
Obtained from: Juniper Networks, Inc.
MFC after: 1 week
This change is a bit ugly, but so is the coupling between the i915
driver and syscons. It isn't worth developing a more elegant solution
only to support the legacy syscons console.
The hcreate(3) implementation and related functions we inherited
from NetBSD used to free() the key values, something the standard
interface does not do.
This would cause a segmentation fault when attempting to run
the examples from the Open Group and Linux manual pages. NetBSD
has added non-standard calls to provide the previous
behaviour, but hdestroy() is not very commonly used, so at this
time it seems excessive to bring those to FreeBSD.
Bump the __FreeBSD_version as this is an ABI change.
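A sketch of the usage pattern that used to crash follows. It uses the standard <search.h> calls in the style of the Open Group examples, with keys in static storage owned by the caller. Under the old behaviour, hdestroy() would have passed those keys to free(). The function `hsearch_demo()` and the key names are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* hcreate(3) family is part of XSI */
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Keys in static storage, owned by the caller, never passed to free(). */
static char key_apple[] = "apple";
static char key_pear[] = "pear";

/*
 * Build a small table, look up 'name', and tear the table down again.
 * Returns the stored value, or -1 if 'name' is absent or on error.
 */
intptr_t
hsearch_demo(const char *name)
{
	ENTRY item, *found;
	intptr_t result = -1;

	assert(name != NULL);
	if (hcreate(16) == 0)
		return (-1);
	item.key = key_apple;
	item.data = (void *)(intptr_t)1;
	if (hsearch(item, ENTER) == NULL)
		goto out;
	item.key = key_pear;
	item.data = (void *)(intptr_t)2;
	if (hsearch(item, ENTER) == NULL)
		goto out;

	item.key = (char *)name; /* FIND only reads the key */
	item.data = NULL;
	found = hsearch(item, FIND);
	if (found != NULL)
		result = (intptr_t)found->data;
out:
	hdestroy(); /* frees the table only; the keys stay valid */
	return (result);
}
```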
Reference:
http://bugs.dragonflybsd.org/issues/1398
MFC after: 2 weeks
If RSS is enabled, ixgbe(4) will query the RSS API for the types of hashes
which should be used. It'll then only enable hashes that are exposed via
the RSS layer.
This way it won't try to do things like enable UDP hashing if RSS explicitly
states that it isn't supported in lookups.
Tested:
* 82599EB ixgbe(4) NIC
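The driver side of this can be sketched as a table that maps each advertised hash type to a NIC hash-field enable. All bit values below are hypothetical stand-ins. They are neither the RSS layer's real hash-type constants nor the 82599's real register layout. `nic_hash_fields()` is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack-side hash types (not the real RSS constants). */
#define HASHTYPE_IPV4     (1u << 0)
#define HASHTYPE_TCP_IPV4 (1u << 1)
#define HASHTYPE_UDP_IPV4 (1u << 2)
#define HASHTYPE_IPV6     (1u << 3)
#define HASHTYPE_TCP_IPV6 (1u << 4)
#define HASHTYPE_UDP_IPV6 (1u << 5)

/* Hypothetical NIC hash-field enables (not the real 82599 layout). */
#define NIC_FIELD_IPV4     (1u << 16)
#define NIC_FIELD_TCP_IPV4 (1u << 17)
#define NIC_FIELD_UDP_IPV4 (1u << 18)
#define NIC_FIELD_IPV6     (1u << 19)
#define NIC_FIELD_TCP_IPV6 (1u << 20)
#define NIC_FIELD_UDP_IPV6 (1u << 21)

/* Enable only the NIC hash fields whose hash type the stack advertises. */
uint32_t
nic_hash_fields(uint32_t stack_hashconfig)
{
	static const struct {
		uint32_t stack; /* hash type advertised by the RSS layer */
		uint32_t nic;   /* matching NIC hash-field enable */
	} map[] = {
		{ HASHTYPE_IPV4,     NIC_FIELD_IPV4 },
		{ HASHTYPE_TCP_IPV4, NIC_FIELD_TCP_IPV4 },
		{ HASHTYPE_UDP_IPV4, NIC_FIELD_UDP_IPV4 },
		{ HASHTYPE_IPV6,     NIC_FIELD_IPV6 },
		{ HASHTYPE_TCP_IPV6, NIC_FIELD_TCP_IPV6 },
		{ HASHTYPE_UDP_IPV6, NIC_FIELD_UDP_IPV6 },
	};
	uint32_t fields = 0;

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if ((stack_hashconfig & map[i].stack) != 0)
			fields |= map[i].nic;
	return (fields);
}
```

If the stack does not advertise a UDP hash type, the corresponding NIC field simply stays off.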
A mix of fragmented and non-fragmented UDP in a single stream will end up
being hashed differently, resulting in out-of-order behaviour in the receive
path.
This was done in the Linux e1000 driver in 2011.
Discussed with: jfv
by the stack.
Right now the stack isn't really set up for RSS with 4-tuple UDP hashing
for either IPv4 or IPv6.
The specifics:
* The UDP init paths udp_init() and udplite_init() specify the hash as
2-tuple, so the PCBGROUPS code only tries a 2-tuple check.
* The PCBGROUPS and RSS code doesn't know about the UDP hash types
just yet, so they're never treated as valid hashes.
* For correctness, 4-tuple hashing can't be enabled in the general case,
because UDP datagrams may be fragmented at the IP layer and only the first
fragment carries the UDP header with the port numbers.
Strictly speaking, TCP segments may also be fragmented, and this could
cause issues with PCBGROUPS/RSS until the IP reassembly path grows some
code to recalculate the RSS hash.
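The fragmentation problem can be illustrated with a toy hash. `mix()` is a simple stand-in mixing function, not the Toeplitz hash used for RSS. `struct pkt_sketch` and both hash helpers are hypothetical. The point is that a 2-tuple hash uses only addresses, which every fragment carries, while a 4-tuple hash needs ports that non-first fragments lack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pkt_sketch {
	uint32_t src, dst;     /* IPv4 addresses, present in every fragment */
	uint16_t sport, dport; /* UDP ports, valid only if has_l4 */
	bool     has_l4;       /* false for non-first IP fragments */
};

/* Stand-in mixing function; real RSS uses a Toeplitz hash. */
static uint32_t
mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return (h ^ (h >> 16));
}

/* 2-tuple: addresses only, identical for all fragments of a datagram. */
uint32_t
rss_hash_2tuple(const struct pkt_sketch *p)
{
	return (mix(mix(0, p->src), p->dst));
}

/* 4-tuple when the ports are visible, 2-tuple fallback otherwise. */
uint32_t
rss_hash_4tuple_or_fallback(const struct pkt_sketch *p)
{
	if (!p->has_l4)
		return (rss_hash_2tuple(p));
	return (mix(rss_hash_2tuple(p),
	    ((uint32_t)p->sport << 16) | p->dport));
}
```

With the 4-tuple variant, the first fragment of a datagram is hashed with its ports while the remaining fragments fall back to the address-only hash. The pieces of one datagram can therefore be steered to different queues, which is the inconsistency that keeps UDP 4-tuple hashing disabled for now.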
I'll follow up this commit with awareness of the UDP 4-tuple for those
who wish to configure it, but for now it stays disabled.
No drivers (yet) know to use this function when RSS is enabled.
handling. For statically linked apps this uses the __exidx_start/end
symbols set up by the linker. For dynamically linked apps it finds the
shared object that contains the given address and returns the location and
size of the exidx section in that shared object.
The dl_unwind_find_exidx() name is used by other BSD projects and Android,
and is mentioned in clang 3.5 comments as "the BSD interface" for finding
exidx data. GCC (in libgcc_s) expects the exact same API and functionality
to be provided by a function named __gnu_Unwind_Find_exidx(), so we provide
that with an alias ("strong reference").
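The lookup for the dynamically linked case can be sketched with dl_iterate_phdr(3). This is an illustration, not the libc implementation. `find_exidx_sketch()` and its helpers are hypothetical. The `ElfW()` macro is assumed to come from <link.h> as on glibc, and each exidx entry is taken to be 8 bytes (two 32-bit words) as in the ARM EHABI. On a non-ARM host, no loaded object has a PT_ARM_EXIDX segment, so the sketch returns NULL with a count of 0.

```c
#define _GNU_SOURCE /* dl_iterate_phdr() on glibc */
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

#ifndef PT_ARM_EXIDX
#define PT_ARM_EXIDX 0x70000001 /* PT_LOPROC + 1 */
#endif

struct exidx_query {
	uintptr_t pc;    /* address to look up */
	void     *exidx; /* start of the object's exidx table, if any */
	int       count; /* number of 8-byte exidx entries */
};

/* dl_iterate_phdr() callback: stop at the object whose PT_LOAD covers pc. */
static int
exidx_cb(struct dl_phdr_info *info, size_t size, void *arg)
{
	struct exidx_query *q = arg;
	const ElfW(Phdr) *exidx = NULL;
	int covers_pc = 0;

	(void)size;
	for (int i = 0; i < info->dlpi_phnum; i++) {
		const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
		uintptr_t start = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);

		if (ph->p_type == PT_LOAD && q->pc >= start &&
		    q->pc < start + ph->p_memsz)
			covers_pc = 1;
		else if (ph->p_type == PT_ARM_EXIDX)
			exidx = ph;
	}
	if (!covers_pc)
		return (0); /* keep iterating */
	if (exidx != NULL) {
		q->exidx = (void *)(uintptr_t)(info->dlpi_addr + exidx->p_vaddr);
		q->count = (int)(exidx->p_memsz / 8);
	}
	return (1); /* found the containing object; stop */
}

/* Hypothetical name; returns NULL and *pcount == 0 if nothing is found. */
void *
find_exidx_sketch(const void *pc, int *pcount)
{
	struct exidx_query q = { (uintptr_t)pc, NULL, 0 };

	assert(pcount != NULL);
	dl_iterate_phdr(exidx_cb, &q);
	*pcount = q.count;
	return (q.exidx);
}
```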
Reviewed by: kib@
MFC after: 1 week
A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.
vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.
vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.
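The disposition rules can be sketched with the exception classes (benign, contributory, page fault) from the description of the double-fault exception (#DF) in the Intel SDM, Volume 3. This is a sketch over raw vector numbers, not bhyve's vm_entry_intinfo() code. The names `classify()` and `nested_exception()` are hypothetical. Newer vectors such as #VE or #CP are not classified here and fall through as benign.

```c
#include <assert.h>

/* Exception vectors used below (architectural numbers). */
#define EXC_DE 0  /* divide error */
#define EXC_DF 8  /* double fault */
#define EXC_TS 10 /* invalid TSS */
#define EXC_NP 11 /* segment not present */
#define EXC_SS 12 /* stack-segment fault */
#define EXC_GP 13 /* general protection */
#define EXC_PF 14 /* page fault */

enum exc_class { CLASS_BENIGN, CLASS_CONTRIBUTORY, CLASS_PAGE_FAULT };
enum nested_disposition {
	NESTED_SERIAL, NESTED_DOUBLE_FAULT, NESTED_TRIPLE_FAULT
};

static enum exc_class
classify(int vector)
{
	switch (vector) {
	case EXC_DE: case EXC_TS: case EXC_NP: case EXC_SS: case EXC_GP:
		return (CLASS_CONTRIBUTORY);
	case EXC_PF:
		return (CLASS_PAGE_FAULT);
	default:
		return (CLASS_BENIGN);
	}
}

/* Decide what happens when 'second' is raised while delivering 'first'. */
enum nested_disposition
nested_exception(int first, int second)
{
	enum exc_class c1, c2;

	assert(first >= 0 && first < 32 && second >= 0 && second < 32);
	if (first == EXC_DF)
		return (NESTED_TRIPLE_FAULT); /* fault while delivering #DF */
	c1 = classify(first);
	c2 = classify(second);
	if (c1 == CLASS_CONTRIBUTORY && c2 == CLASS_CONTRIBUTORY)
		return (NESTED_DOUBLE_FAULT);
	if (c1 == CLASS_PAGE_FAULT && c2 != CLASS_BENIGN)
		return (NESTED_DOUBLE_FAULT);
	return (NESTED_SERIAL); /* handle the second exception normally */
}
```

For instance, a #GP raised while delivering a #GP becomes a double fault, a #PF raised while delivering a #GP is handled serially, and anything raised while delivering a #DF shuts the virtual machine down.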
calling mmap on /dev/mem and add a handler for the possible userland
machine checks that may result. Remove some pointless and wrong copy/paste
that has been in here for a decade as well.
This results in a /dev/mem with identical semantics to the x86 version.
MFC after: 1 week
read workload by splitting the single teardown rrw lock into
RRM_NUM_LOCKS (17) of them.
Read acquisitions are pseudo-randomly distributed among these locks based
on the curthread pointer. Write acquisitions take all of the locks, which
should be rare given how this type of lock is used.
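The scheme can be sketched in userland with POSIX rwlocks. Only RRM_NUM_LOCKS (17) and the "readers pick one lock by thread, writers take all" policy come from the text. `struct rrm_sketch`, the `rrm_*` functions and the use of a thread-local object's address as a stand-in for curthread are hypothetical, not the ZFS code.

```c
#define _POSIX_C_SOURCE 200809L /* pthread rwlocks */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RRM_NUM_LOCKS 17 /* number of rrw locks, as in the text */

struct rrm_sketch {
	pthread_rwlock_t locks[RRM_NUM_LOCKS];
};

/* Per-thread object whose address stands in for curthread. */
static _Thread_local char thread_anchor;

/* Same thread, same slot, so enter and exit always use one lock. */
static unsigned int
rrm_slot(void)
{
	return ((unsigned int)((uintptr_t)&thread_anchor % RRM_NUM_LOCKS));
}

int
rrm_init(struct rrm_sketch *rrl)
{
	for (int i = 0; i < RRM_NUM_LOCKS; i++) {
		int error = pthread_rwlock_init(&rrl->locks[i], NULL);

		if (error != 0) {
			while (i-- > 0)
				pthread_rwlock_destroy(&rrl->locks[i]);
			return (error);
		}
	}
	return (0);
}

/* Readers only contend with readers that hash to the same slot. */
void
rrm_enter_read(struct rrm_sketch *rrl)
{
	int error = pthread_rwlock_rdlock(&rrl->locks[rrm_slot()]);

	assert(error == 0);
	(void)error;
}

void
rrm_exit_read(struct rrm_sketch *rrl)
{
	pthread_rwlock_unlock(&rrl->locks[rrm_slot()]);
}

/* Writers take every lock in a fixed order, so writers cannot deadlock. */
void
rrm_enter_write(struct rrm_sketch *rrl)
{
	for (int i = 0; i < RRM_NUM_LOCKS; i++)
		pthread_rwlock_wrlock(&rrl->locks[i]);
}

void
rrm_exit_write(struct rrm_sketch *rrl)
{
	for (int i = RRM_NUM_LOCKS; i-- > 0;)
		pthread_rwlock_unlock(&rrl->locks[i]);
}
```

The trade-off is visible in the code: a read touches one of 17 locks, while a write touches all 17, which only pays off because writes are rare.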
Illumos issue:
5008 lock contention (rrw_exit) while running a read only load
MFC after: 2 weeks