hardware. It is possible to turn this feature off and fall back to software
emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0.
We now start handling two new types of VM-exits:
APIC-access: This is a fault-like VM-exit and is triggered when the APIC
register access is not accelerated (e.g. apic timer CCR). In response to
this we do emulate the instruction that triggered the APIC-access exit.
APIC-write: This is a trap-like VM-exit which does not require any instruction
emulation but it does require the hypervisor to emulate the access to the
specified register (e.g. icrlo register).
Introduce 'vlapic_ops' which are function pointers to vector the various
vlapic operations into processor-dependent code. The 'Virtual Interrupt
Delivery' feature installs 'ops' for setting the IRR bits in the virtual
APIC page and to return whether any interrupts are pending for this vcpu.
Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.
drivers and their firmware were under active development, but those days
have passed. The firmware now exists in pre-compiled form, no longer
dependent on it's sources or on aicasm. If you wish to rebuild the
firmware from source, the glue still exists under the 'make firmware'
target in sys/modules/aic7xxx.
This also fixes the problem introduced with r257777 et al with building
kernels the old fashioned way in sys/$arch/compile/$CONFIG when the
ahc/ahd drivers were included.
Keep a copy of the 'rip' and the 'exit_reason' and use that when calling
vmx_exit_trace(). This is because both the 'rip' and 'exit_reason' can
be changed by 'vmx_exit_process()' and can lead to very misleading traces.
Remove old bits of data concat for 'ascii' field.
Remove special SIOCGIFSTATUS handling from if.c (which Coverity yells at).
Reported by: Coverity
Coverity CID: 1147174
MFC after: 2 weeks
when PMC-soft feature is not used the check will be false.
Sponsored by: EMC / Isilon storage division
Submitted by: Anton Rang <anton.rang@isilon.com>
The uboot mapping is only 128KiB (0x20000) and not 2MiB (0x200000).
Dynamically adjust kernel and rootfs mappings based on the
geom_uncompress(4) magic.
This makes the built images more reliable by accepting changes on kernel
size transparently and matches the images built with zrouter and
freebsd-wifi-build.
Tested by: gjb
Approved by: adrian (mentor)
Obtained from: Zrouter
WARNING: icl_pdu_check_data_digest: data digest check failed; got 0xf23b,
should be 0xdb7f23b
Tested by: Darcy Birkbeck
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
value. The "Intel Lynx Point" XHCI controller found in the MBP2013 has
been observed to not always set the event interrupt bit while there
are events to consume in the event ring.
MFC after: 1 week
Tested by: Huang Wen Hui <huanghwh@gmail.com>
the vcpu should be kicked to process a pending interrupt. This will be useful
in the implementation of the Posted Interrupt APICv feature.
Change the return value of 'vlapic_pending_intr()' to indicate whether or not
an interrupt is available to be delivered to the vcpu depending on the value
of the PPR.
Add KTR tracepoints to debug guest IPI delivery.
the spoofed identify data into the user buffer rather than issuing the
command to the controller, since Chatham IDENTIFY data is always spoofed.
While here, fix a bug in the spoofed data for Chatham submission and
completion queue entry sizes.
Sponsored by: Intel
MFC after: 3 days
'vmx_vminit()' that does customization.
This makes it easier to turn on optional features (e.g. APICv) without
having to keep adding new parameters to 'vmcs_set_defaults()'.
Reviewed by: grehan@
every arm system must have some static mappings to work correctly (although
currently they all do), so remove some panic() calls (which would never
been seen anyway, because they would happen before a console is available).
Most relevant features:
- netmap emulation on any NIC, even those without native netmap support.
On the ixgbe we have measured about 4Mpps/core/queue in this mode,
which is still a lot more than with sockets/bpf.
- seamless interconnection of VALE switch, NICs and host stack.
If you disable accelerations on your NIC (say em0)
ifconfig em0 -txcsum -txcsum
you can use the VALE switch to connect the NIC and the host stack:
vale-ctl -h valeXX:em0
allowing sharing the NIC with other netmap clients.
- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
instead of pointers/count as before). This was unavoidable to support,
in the future, multiple threads operating on the same rings.
Netmap clients require very small source code changes to compile again.
On the plus side, the new API should be easier to understand
and the internals are a lot simpler.
The manual page has been updated extensively to reflect the current
features and give some examples.
This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.
freeing them.
The current code would walk the list and call the buffer free, which
didn't remove it from any lists before pushing it back on the free list.
Tested: AR9485, STA mode
Noticed by: dillon@apollo.dragonflybsd.org
related to setting up static device mappings. Since it was only used by
arm/mv/mv_pci.c, it's now just static functions within that file, plus
one public function that gets called only from arm/mv/mv_machdep.c.
in the dts source, and adding the right devices to the kernel config. Also
generally bring the kernel config into line with what we have for other
Marvell/Kirkwood systems (add lots of useful devices and options).
One particularly notable addition amongst the kernel config changes is
USB_HOST_ALIGN=32, which may help eliminate data corruption on USB drives.
PR: kern/181975 arm/162159
obsolete. This involves the following pieces:
- Remove it entirely on PowerPC, where it is not used by MD code either
- Remove all references to machine/fdt.h in non-architecture-specific code
(aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat
non-arch-specific).
- Fix code relying on header pollution from machine/fdt.h includes
- Legacy fdtbus.c (still used on x86 FDT systems) now passes resource
requests to its parent (nexus). This allows x86 FDT devices to allocate
both memory and IO requests and removes the last notionally MI use of
fdtbus_bs_tag.
- On those architectures that retain a machine/fdt.h, unused bits like
FDT_MAP_IRQ and FDT_INTR_MAX have been removed.
static device mappings.
This SoC relied heavily on the fact that all devices were static-mapped
at a fixed address, and it (rather bogusly) used bus_space read and write
calls passing hard-coded virtual addresses instead of proper bus handles,
relying on the fact that the virtual addresses of the mappings were known
at compile time, and relying on the implementation details of arm
bus_space never changing. All such usage was replaced with calls to
bus_space_map() to obtain a proper bus handle for the read/write calls.
This required adjusting some of the #define values that map out hardware
registers, and some of them were renamed in the process to make it clear
which were defining absolute physical addresses and which were defining
offsets. (The ones that just define offsets don't appear to be referenced
and probably serve no value other than perhaps documentation.)
it performs exact match search, regardless of netmask existance.
This simplifies most of rnh_lookup() consumers.
Fix panic triggered by deleting non-existent host route.
PR: kern/185092
Submitted by: Nikolay Denev <ndenev at gmail.com>
MFC after: 1 month
and add static mappings that cover most of the on-chip peripherals with
1MB section mappings. This adds about 220MB or so available kva space
by not using a hard-coded 0xF0000000 as the mapping address.
Keep only one uint64_t spare for further cap_rights_t expension.
Add a comment clarifying that if the size of this structure changes,
a new sysctl MIB has to be allocate for it and the old structure has
to be returned by the old sysctl MIB.
Requested by: re
MFC after: 3 days
when running on FDT systems. Unmap memory in nexus_deactivate_resource().
Also, call rman_activate_resource() before mapping device memory, and only
do the mapping if it returns success.
Reviewed by: nwhitehorn
New algorithm does not create additional lock congestion, while some races
it includes should not be a problem. Those races may keep requests in DRC
cache for some more time by returning ACK position smaller then actual,
but it still should be able to drop thems when proper ACK finally read.
Races of the original algorithm based on TCP seq number were worse because
they happened when reply sequence number were recorded. After that even
correctly read ACKs could not clean DRC sometimes.
searching. If you didn't configure a timer capture pin you'd get a data
abort as it wandered into the weeds, now you get a nice warning message
about your config, as originally intended.
guest disables the HPET.
The HPET timer interrupt is triggered from the callout handler associated with
the timer. It is possible for the callout handler to be delayed before it gets
a chance to execute. If the guest disables the HPET during this window then the
handler never gets a chance to execute and the timer interrupt is lost.
This is now fixed by injecting a timer interrupt into the guest if the callout
time is detected to be in the past when the HPET is disabled.
the LED specification was just misplaced). The rather odd memory mappings
that were in place used an undocumented attribute value (0x0f) that caused
problems with the system.
Submitted by: Markus Pfeiffer <markus.pfeiffer@morphism.de>
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.
MFC after: 3 days
- Introduce additional hash to group requests by hash of sockref. This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
- Indroduce additional callbacks to notify application layer about sockets
disconnection. Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
- Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache. This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
- Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.
Together this limits NFS DRC growth only to situations of real connectivity
problems. If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load. Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.
As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.
Sponsored by: iXsystems, Inc.
times [1]. Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.
Submitted by: alc [1]
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.
Reviewed by: glebius, adrian
MFC after: 1 week
Sponsored by: Yandex LLC
When we encounter an I/O error on a piece of metadata while deleting
a file system or zvol, we don't update the bptree_entry_phys_t's
bookmark. This would lead to double free of bp's which will lead to
space map corruption.
Instead of tolerating and allowing the corruption, panic immediately.
See Illumos #4390 for more details.
4391 panic system rather than corrupting pool if we hit bug 4390
Illumos/illumos-gate@8b36997aa2
MFC after: 2 weeks
The operation was documented and implemented partially (both from a
type and architecture perspective) on 2013-08-21 and got used in
ZFS with revision 260150 (zfeature.c) and since ZFS is supported on
ia64, the lack of having atomic_swap became problem.
- Use simple ++ for rare events.
- Use uma_zone_get_cur() to get knowledge about space left in cache.
- Convert many fields of struct ng_netflow_info to 64 bit.
Tested by: Viktor Velichkin <avisom yandex.ru>
Sponsored by: Nginx, Inc.
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.
Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.
Suggested by: grehan@
Fix race condition in DELAY function: sc->tc was not initialized yet when
time_counter pointer was set, what resulted in NULL pointer dereference.
Export sysfreq to dts.
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Using unmapped BIOs causes failure inside bus_dmamap_sync, since
this function requires valid MVA address, which is not present
if mapping is not set up.
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
Add suport for setting triggering level and polarity in GIC.
New function pointer was added to nexus which corresponds
to the function which sets level/sense in the hardware (GIC).
Submitted by: Wojciech Macek <wma@semihalf.com>
Obtained from: Semihalf
to lla_lookup().
This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.
Tested:
* parallel IPv6 TCP bulk data exchange, 8192 sockets
MFC after: 1 week
Sponsored by: Netflix, Inc.
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47
NOTE: Make sure the boot code is updated if a zpool upgrade is
done on boot zpool.
MFC after: 2 weeks
(Note: this change is not applicable to FreeBSD and the file
is not included in build. It's integrated for completeness).
4128 disks in zpools never go away when pulled
illumos/illumos-gate@39cddb10a3
MFC after: 2 weeks