Prior to this change the cached value of 'pm_eptgen' was tracked per-vcpu
and per-hostcpu. In the degenerate case where 'N' vcpus were sharing
a single hostcpu this could result in 'N - 1' unnecessary TLB invalidations.
Since an 'invept' invalidates mappings for all VPIDs the first 'invept'
is sufficient.
Fix this by moving the 'eptgen[MAXCPU]' array from 'vmxctx' to 'struct vmx'.
If it is known that an 'invept' is going to be done before entering the
guest then it is safe to skip the 'invvpid'. The stat VPU_INVVPID_SAVED
counts the number of 'invvpid' invalidations that were avoided because
they were subsumed by an 'invept'.
Discussed with: grehan
the output buffer wasn't being cleared between the inflate() calls,
producing zeroed output after the first inflate() call.
This fixes the read of mkuzip(8) images with geom_uncompress(4).
Reviewed by: ray
Approved by: adrian (mentor)
the old way was to store pcpu in a register, and get curthread from pcpu,
which is not very atomic, and led to issues if the thread was migrated
to another core between the time we got the pcpu address and the time we
got curthread.
Instead, we now store curthread where pcpu used to be store, and we
calculate the pcpu address based on the cpu id.
map a fraction of the pages that were fetched by vm_pager_get_pages() from
secondary storage. Now, we map them all in order to avoid future soft
faults. This effect is most evident when a memory-mapped file is accessed
sequentially. Previously, there were 6 soft faults for every hard fault.
Now, these soft faults are eliminated.
Sponsored by: EMC / Isilon Storage Division
to check the status property in their probe routines.
Simplebus used to only instantiate its children whose status="okay"
but that was improper behavior, fixed in r261352. Now that it doesn't
check anymore and probes all its children; the children all have to
do the check because really only the children know how to properly
interpret their status property strings.
Right now all existing drivers only understand "okay" versus something-
that's-not-okay, so they all use the new ofw_bus_status_okay() helper.
process "status" properties of OF nodes.
I've avoided adding new KOBJ methods here so that we don't have to modify
every ofw_bus in the tree. Since 100% of implementations of ofw_bus use
only ofw_bus_gen_*(), it might be worth garbage-collecting the other
methods as well.
The sglist segment array has grown to a bit over 512 bytes (on
64-bit system) which is more than ideally should be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.
Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.
Also only allocate the maximum number of Tx segments if TSO was
negotiated.
It turns out the version of gas we're using interprets the old '_all' mask
as 'fc' instead of 'fsxc'. That is, "all" doesn't really mean "all".
This was the cause of the "wrong-endian register restore" bug that's
been causing problems with some cortex-a9 chips. The 'endian' bit in the
spsr register would never get changed (it falls into the 'x' mask group)
and the first return-from-exception would fail if the chip had powered on
with garbage in the spsr register that included the big-endian bit. It's
unknown why this affected only certain cortex-a9 chips.
CAPABILITIES stuff required to make ssh work.
Hopefully, Book-E can eventually be added to GENERIC, which would avoid
this kind of issue with bitrot. That will require figuring out how to link
Book-E and AIM kernels at the same address, however...
get the Routerboard 800 up and running with the vendor device tree. This
does not implement some BERI-specific features (which hopefully won't be
necessary soon), so move the old code to mips/beri, with a higher attach
priority when built, until MIPS interrupt domain support is rearranged.
strings and include arbitrary information (IRQ line/domain/sense). When the
ofw_bus_map_intr() API was introduced, it assumed that, as on most systems,
these were either 1 cell, containing an interrupt line, or 2, containing
a line number plus a sense code. It turns out a non-negligible number of
ARM systems use 3 (or even 4!) cells for interrupts, so make this more
general.
This also fixes asserts on removal of the module for the mpc74xx.
The PowerPC 970 processors have two different types of events: direct events
and indirect events. Thus far only direct events are supported. I included
some documentation in the driver on how indirect events work, but support is
for the future.
MFC after: 1 month
r261266:
Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
devices. This is a nop, except for what's reported by atmelbus for the
resources.
It would be nice if we could dymanically allocated these things, but
the pmap_mapdev panics if we don't keep the static mappings, so we
still need to play the carefully allocate VA space between all
supported SoC game.
User's with their own devices may need to make adjustments.
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.
Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.
Submitted by: netchild
MFC after: 1 week
- Use system provided functions for HID report requests.
- Nice the mode setting, because the USB hardware does appear to
handle the commands right away.
MFC after: 1 week
swcr_newsession can change the pointer for swcr_sessions which races with
swcr_process which is looking up entries in this array.
Add a rwlock that protects changes to the array pointer so that
swcr_newsession and swcr_process no longer race.
Original patch by: Steve O'Hara-Smith <Steve.OHaraSmith@isilon.com>
Reviewed by: jmg
Sponsored by: EMC / Isilon Storage Division
By definition, the very first FIN is not a duplicate. Process it normally
and don't feed it to congestion control as though it were a dupe. Don't
prevent CC from seeing later dupe acks while in a half close state.
the memory ranges that they decode for downstream devices rather than
creating ResourceProducer range resource entries. The result is that
we allocate the full range to the PCI root bridge device causing
allocations in child devices to all fail.
As a workaround, ignore any standard memory resources on a PCI root
bridge device. It is normal for a PCI root bridge to allocate an I/O
resource for the I/O ports used for PCI config access, but I have not
seen any PCI root bridges that legitimately allocate a memory resource.
Reviewed by: jkim
MFC after: 1 week
the INP_INFO lock from tcp_usr_accept. As the PR/patch states
this was following the advice already in the code.
See the PR below for a full disucssion of this change and its
measured effects.
PR: 183659
Submitted by: Julian Charbon
Reviewed by: jhb
The ext4 inode flags do not have equivalents for chflags (1)
and hold information that is private to the implementation.
The i_flag field in the inode is a better place to hold the Ext4
inode flags as it saves us from masking flags while setting or
getting attributes. It should also make things cleaner if we
implement write support for Ext4.
Suggested by: bde
Tested by: Mike Ma
MFC after: 3 days
that all pressed keys are released before completing the USB keyboard
detach. This will prevent so-called "ghost-keys" from appearing after
that the USB device generating the key event(s) has been detached.
MFC after: 1 week
It's common for multi-threaded processes to create a thread for
the purpose of synchronously processing signals. Allow such processes to
utilize a capabilities sandbox.
Discussed with: rwatson, pjd
MFC after: 2 weeks
when activating an I/O or memory window on the CardBus bridge.
Tested by: Olivier Cochard-Labbe <olivier@cochard.me>
Reviewed by: imp
MFC after: 3 days
Note that this commit hasn't been compile tested because these files
are not hooked up to the build...
PR: 186129
Submitted by: Takanori Sawada
Approved by: rpaulo
The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector.
If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger
an EOI-induced VM-exit when it is doing EOI virtualization.
The EOI-induced VM-exit results in the EOI being forwarded to the vioapic
so that level triggered interrupts can be properly handled.
Tested by: Anish Gupta (akgupt3@gmail.com)
The IN_* flags should be set in i_flag instead of corrupting
i_flags [1].
Re-enable HTree dirindex as the last series of bug fixes
seems to have fixed the issues.
Reported by: bde [1]
Tested by: kevlo
MFC after: 1 week
macro was changed. Since copying 64-bit srr1 into 32-bit srr1 drops the upper
32 bits, any bits set in the context were dropped, meaning the context check
fails. Since 32-bit set_context can't change those bits anyway, copy the ones
from the current context (td->td_frame) before calling set_context().
MFC after: 3 weeks
geom_alloc_copyin() can't return ENOMEM, so describe its fail as bad
control request. Add check for NULL pointer in gctl_dump(), since it
can be NULL when geom_alloc_copyin() failed.
MFC after: 1 week
Add to the gctl_error() an ability to specify error description even
if numeric error code is already specified. Also by default set
error code to EINVAL.
PR: 185852
MFC after: 1 week
injected into the vcpu but the VM-entry interruption information field
already has the valid bit set.
Pointed out by: David Reed (david.reed@tidalscale.com)
The latest draft of SBC-3 tells: "A MAXIMUM UNMAP LBA COUNT field set to
a non-zero value indicates the maximum number of LBAs that may be unmapped
by an UNMAP command." To me it does not sound like that limit is set per
single descriptor, but rather per all command. And I have at least one
device that behaves exactly that way. This patch fixes the problem there.
MFC after: 1 week
Make comments match parameters
Add options for early printf so we get regression build testing on it.
Add preview of options for FDT support coming soon (I hope)
paper trail now, this patch is similar to one posted for one of the
preliminary versions of a new armv6 port. I took them and made them
more generic. Option not enabled by default since each board/port has
to provide its own eputc, and possibly do other things as well...
Use the bitwise negation instead of bogus boolean negation and move
the flag manipulation with the assignment.
Fix some grammatical errors introduced in the same change.
Reported by: bde
MFC after: 3 days
pim_input() properly.
While here, remove extra variable and incorrect condition
before m_pullup().
Reported by: Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by: Nginx, Inc.
a timeout value of a single tick is given. With FreeBSD-10 and newer
the current system time is used as a starting point, and the minimum
callout time of a single tick will be guaranteed. This patch mostly
affect the DMA delay timeouts, which are typically in the range from
0.125 to 2ms.
MFC after: 1 week
via a software interrupt.
This is safe to do because the logical processor is already cognizant of the
NMI and further NMIs are blocked until the host's NMI handler executes "iret".
r260545 cleared the inode flags to fix corruption problems but
we still need to pass some EXT4 flags for the ext4 read-only
mode. None of these attributes has an equivalent in FreeBSD and
are uninteresting for the system utilities so they should be
innaccessible in ext2_getattrib().
Note: we also use EXT4_HUGE_FILE but we use it directly from the
dinode structure so it is not necessary to translate it,
Suggested by: bde
MFC after: 3 days
This is the first step needed to get the snapper codec working on those
machines.
The second step is to enable the corresponding I2S device and its clock.
Tested on machines where the snapper codec was already working, a G4 PowerBook
and a PowerMac9,1 with a Shasta based macio.
The PowerMac7,2/7,3 with a K2 based macio can now also play sound.
MFC after: 1 month
- Store the length of each read-only VPD value since not all values are
guaranteed to be ASCII values (though most are).
- Add a new pciio ioctl to fetch VPD for a single PCI device. The values
are returned as a list of variable length records, one for the device
name and each keyword.
- Add a new -V flag to pciconf's list mode which displays VPD data for
each device.
MFC after: 1 week
fence. Under system load, the CPU has been found to change the order
by which the stores are made visible. When the tag is made visible
before the other TLB values, other CPUs may use the invalid TLB values
and do bad things.
While here (i.e. not a fix) don't return errors from pmap_remove_vhpt()
to callers of pmap_remove_pte(). Those callers don't check the return
value and as such don't do what is needed to keep a consistent state.
More importantly, pmap_remove_vhpt() can't really have an error without
it indicating something unintended. Using KASSERT is therefore better.
PR: 182999, 183227
console, it calls the grab functions. These functions should turn off
the RX interrupts, and any others that interfere. This makes mountroot
prompt work again. If there's more generalized need other than
prompting, many of these routines should be expanded to do those new
things.
Should have been part of r260889, but waasn't due to command line typo.
Reviewed by: bde (with reservations)
console, it calls the grab functions. These functions should turn off
the RX interrupts, and any others that interfere. This makes mountroot
prompt work again. If there's more generalized need other than
prompting, many of these routines should be expanded to do those new
things.
Reviewed by: bde (with reservations)
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
in6_ifaddloop -> nd6_add_ifa_lle
in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
ia64 for the very first time. Only 9 years in the making...
Note that the vt/vga driver does not actually make sure there's
VGA hardware at the standard/legacy VGA I/O port and memory I/O
addresses. This can cause machine checks if the H/W does not have
a VGA controller.
of a syncache connection, copy it into the inp_flowid field.
Without this, an incoming TCP connection won't have an inp_flowid marked
until some data comes in, and this means that things like the per-CPU
TCP timer option will choose a different CPU for the timer work.
(It also means that if one grabbed the flowid via an ioctl from userland,
it won't be available until some data has been received.)
Sponsored by: Netflix, Inc.
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
with unmodified if_resolvemulti() callers).
MFC after: 2 weeks
the Guest Interruptibility-state field. However, there isn't any way to
figure out which processors have this requirement.
So, inject a pending NMI only if NMI_BLOCKING, MOVSS_BLOCKING, STI_BLOCKING
are all clear. If any of these bits are set then enable "NMI window exiting"
and inject the NMI in the VM-exit handler.
1. Be consistent in the style of "act_delta" manipulations between the
inactive and active queue scans.
2. Explicitly compare to zero.
3. The deactivation of a page is based is based on its recent history
and not just the current call to vm_pageout_scan(). The variable
"act_delta" represents the current state of the page, and not its
history. Avoid possible confusion by not (ab)using "act_delta" for
the making the deactivation decision.
Submitted by: kib [1]
Reviewed by: kib [2,3]
with USB device detach when using character device handles. This also
includes LibUSB. It turns out that "usb_close()" cannot always get a
reference to clean up its USB transfers and such, if called during the
kernel USB device detach.
Analysis by: hselasky @
Reported by: Juergen Lock <nox@jelal.kn-bremen.de>
MFC after: 1 week
This is done to ensure that visited object IDs are always increasing.
Also, pass correct object ID to prefetch_dnode_metadata for
os_groupused_dnode.
Without this change we would hit an assert if traversal was paused on
a GROUPUSED object, which is unlikely but possible.
Apparently the same change was independently developed by Deplhix.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
Sponsored by: HybridCluster
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.
Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.
This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use. Note
it doesn't mark the region as free overall - only free from this
transaction. The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.
TODO:
* documentation, as always
Sponsored by: Netflix, Inc.
This is still under a bit of flux, as the final API hasn't been nailed
down. It's also unclear whether we should define the two new types in the
header or not - it may allow bad code to compile that shouldn't (ie,
since uintX's are defined, the developer may not include sys/types.h.)
Reviewed by: peter, imp, bde
Sponsored by: Netflix, Inc.
in the Guest Interruptibility-state VMCS field.
If we fail to do this then a subsequent VM-entry will fail because it is an
error to inject an NMI into the guest while "NMI Blocking" is turned on. This
is described in "Checks on Guest Non-Register State" in the Intel SDM.
Submitted by: David Reed (david.reed@tidalscale.com)