via a software interrupt.
This is safe to do because the logical processor is already cognizant of the
NMI and further NMIs are blocked until the host's NMI handler executes "iret".
r260545 cleared the inode flags to fix corruption problems but
we still need to pass some EXT4 flags for the ext4 read-only
mode. None of these attributes has an equivalent in FreeBSD and
are uninteresting for the system utilities so they should be
innaccessible in ext2_getattrib().
Note: we also use EXT4_HUGE_FILE but we use it directly from the
dinode structure so it is not necessary to translate it,
Suggested by: bde
MFC after: 3 days
This is the first step needed to get the snapper codec working on those
machines.
The second step is to enable the corresponding I2S device and its clock.
Tested on machines where the snapper codec was already working, a G4 PowerBook
and a PowerMac9,1 with a Shasta based macio.
The PowerMac7,2/7,3 with a K2 based macio can now also play sound.
MFC after: 1 month
- Store the length of each read-only VPD value since not all values are
guaranteed to be ASCII values (though most are).
- Add a new pciio ioctl to fetch VPD for a single PCI device. The values
are returned as a list of variable length records, one for the device
name and each keyword.
- Add a new -V flag to pciconf's list mode which displays VPD data for
each device.
MFC after: 1 week
fence. Under system load, the CPU has been found to change the order
by which the stores are made visible. When the tag is made visible
before the other TLB values, other CPUs may use the invalid TLB values
and do bad things.
While here (i.e. not a fix) don't return errors from pmap_remove_vhpt()
to callers of pmap_remove_pte(). Those callers don't check the return
value and as such don't do what is needed to keep a consistent state.
More importantly, pmap_remove_vhpt() can't really have an error without
it indicating something unintended. Using KASSERT is therefore better.
PR: 182999, 183227
console, it calls the grab functions. These functions should turn off
the RX interrupts, and any others that interfere. This makes mountroot
prompt work again. If there's more generalized need other than
prompting, many of these routines should be expanded to do those new
things.
Should have been part of r260889, but waasn't due to command line typo.
Reviewed by: bde (with reservations)
console, it calls the grab functions. These functions should turn off
the RX interrupts, and any others that interfere. This makes mountroot
prompt work again. If there's more generalized need other than
prompting, many of these routines should be expanded to do those new
things.
Reviewed by: bde (with reservations)
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
* Do some renamings:
in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
in6_ifaddloop -> nd6_add_ifa_lle
in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so
there is no reason to call SIOCSIFADDR often.
ia64 for the very first time. Only 9 years in the making...
Note that the vt/vga driver does not actually make sure there's
VGA hardware at the standard/legacy VGA I/O port and memory I/O
addresses. This can cause machine checks if the H/W does not have
a VGA controller.
of a syncache connection, copy it into the inp_flowid field.
Without this, an incoming TCP connection won't have an inp_flowid marked
until some data comes in, and this means that things like the per-CPU
TCP timer option will choose a different CPU for the timer work.
(It also means that if one grabbed the flowid via an ioctl from userland,
it won't be available until some data has been received.)
Sponsored by: Netflix, Inc.
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
with unmodified if_resolvemulti() callers).
MFC after: 2 weeks
the Guest Interruptibility-state field. However, there isn't any way to
figure out which processors have this requirement.
So, inject a pending NMI only if NMI_BLOCKING, MOVSS_BLOCKING, STI_BLOCKING
are all clear. If any of these bits are set then enable "NMI window exiting"
and inject the NMI in the VM-exit handler.
1. Be consistent in the style of "act_delta" manipulations between the
inactive and active queue scans.
2. Explicitly compare to zero.
3. The deactivation of a page is based is based on its recent history
and not just the current call to vm_pageout_scan(). The variable
"act_delta" represents the current state of the page, and not its
history. Avoid possible confusion by not (ab)using "act_delta" for
the making the deactivation decision.
Submitted by: kib [1]
Reviewed by: kib [2,3]
with USB device detach when using character device handles. This also
includes LibUSB. It turns out that "usb_close()" cannot always get a
reference to clean up its USB transfers and such, if called during the
kernel USB device detach.
Analysis by: hselasky @
Reported by: Juergen Lock <nox@jelal.kn-bremen.de>
MFC after: 1 week
This is done to ensure that visited object IDs are always increasing.
Also, pass correct object ID to prefetch_dnode_metadata for
os_groupused_dnode.
Without this change we would hit an assert if traversal was paused on
a GROUPUSED object, which is unlikely but possible.
Apparently the same change was independently developed by Deplhix.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
Sponsored by: HybridCluster
This fires off a kqueue note (of type sendfile) to the configured kqfd
when the sendfile transaction has completed and the relevant memory
backing the transaction is no longer in use by this transaction.
This is analogous to SF_SYNC waiting for the mbufs to complete -
except now you don't have to wait.
Both SF_SYNC and SF_KQUEUE should work together, even if it
doesn't necessarily make any practical sense.
This is designed for use by applications which use backing cache/store
files (eg Varnish) or POSIX shared memory (not sure anything is using
it yet!) to know when a region of memory is free for re-use. Note
it doesn't mark the region as free overall - only free from this
transaction. The application developer still needs to track which
ranges are in the process of being recycled and wait until all
pending transactions are completed.
TODO:
* documentation, as always
Sponsored by: Netflix, Inc.
This is still under a bit of flux, as the final API hasn't been nailed
down. It's also unclear whether we should define the two new types in the
header or not - it may allow bad code to compile that shouldn't (ie,
since uintX's are defined, the developer may not include sys/types.h.)
Reviewed by: peter, imp, bde
Sponsored by: Netflix, Inc.
in the Guest Interruptibility-state VMCS field.
If we fail to do this then a subsequent VM-entry will fail because it is an
error to inject an NMI into the guest while "NMI Blocking" is turned on. This
is described in "Checks on Guest Non-Register State" in the Intel SDM.
Submitted by: David Reed (david.reed@tidalscale.com)
The bug was introduced in r256956 "Improve ZFS N-way mirror read
performance".
The code in vdev_mirror_dva_select erroneously considers already
tried DVAs for the next attempt. Thus, it is possible that a failing DVA
would be retried forever.
As a secondary effect, if the attempts fail with checksum error, then
checksum error reports are accumulated until the original request
ultimately fails or succeeds. But because retrying is going on indefinitely
the cheksum reports accumulation will effectively be a memory leak.
Reviewed by: gibbs
MFC after: 13 days
Sponsored by: HybridCluster
If we prematurely free the name buffer and it gets quickly recycled,
then zfs_rename may see data from another lookup or even unmapped memory
via cn_nameptr.
MFC after: 6 days
Sponsored by: HybridCluster
The bug was introduced in r256956 "Improve ZFS N-way mirror read
performance".
The code in vdev_mirror_dva_select erroneously considers already
tried DVAs for the next attempt. Thus, it is possible that a failing DVA
would be retried forever.
As a secondary effect, if the attempts fail with checksum error, then
checksum error reports are accumulated until the original request
ultimately fails or succeeds. But because retrying is going on indefinitely
the cheksum reports accumulation will effectively be a memory leak.
Reviewed by: gibbs
MFC after: 13 days
Sponsored by: HybridCluster
Otherwise we could run into the following deadlock.
A thread has a transaction open and assigned to a transaction group.
That would prevent the transaction group from be quiesced and synced.
The thread is blocked in getnewvnode_reserve waiting for a vnode to
a be reclaimed. vnlru thread is blocked trying to enter ZFS VOP because
a filesystem is suspended by an ongoing rollback or receive operation.
In its turn the operation is waiting for the current transaction group
to be synced.
zfs_zget is always used outside of active transactions, but zfs_mknode
is always used in a transaction context. Thus, we hoist
getnewvnode_reserve from zfs_mknode to its callers.
While there, assert that ZFS always calls getnewvnode while having
a vnode reserved.
Reported by: adrian
Tested by: adrian
MFC after: 17 days
Sponsored by: HybridCluster
Problem case:
Original lookup returns route with GW set, so gw points to
rte->rt_gateway.
After that we're changing dst and performing lookup another time.
Since fwd host is most probably directly reachable, resulting
rte does not contain rt_gateway, so gw is not set. Finally, we
end with packet transmitted to proper interface but wrong
link-layer address.
Found by: lstewart
Discussed with: ae,lstewart
MFC after: 2 weeks
Sponsored by: Yandex LLC
add separate rx/tx ring indexes
add ring specifier in nm_open device name
netmap.c, netmap_vale.c
more consistent errno numbers
netmap_generic.c
correctly handle failure in registering interfaces.
tools/tools/netmap/
massive cleanup of the example programs
(a lot of common code is now in netmap_user.h.)
nm_util.[ch] are going away soon.
pcap.c will also go when i commit the native netmap support for libpcap.
sure to clear the lower 12 bits. We're adding the translation
attributes to the physical address and non-zero bits in the first
12 bits would give us something unexpected, including invalid bit
values. Those trigger nested general protection faults.
We do not have to clear the region bits, because they are ignored
anyway, so we can replace an existing dep instruction with the one
we need.
This fixes GP faults for the swapper thread, as it's the only thread
that has a direct-mapped stack. Since the bug is in the nested TLB
fault handler, the frequency of hitting the GP is in the order of
hours/days under load.
non-modifier key press. This prevents so-called "ghost
keyboards" keeping modifier keys pressed while not
actually seen as a real keyboard.
MFC after: 2 weeks
can be initiated in the context of a vcpu thread or from the bhyve(8) control
process.
The first use of this functionality is to update the vlapic trigger-mode
register when the IOAPIC pin configuration is changed.
Prior to this change we would update the TMR in the virtual-APIC page at
the time of interrupt delivery. But this doesn't work with Posted Interrupts
because there is no way to program the EOI_exit_bitmap[] in the VMCS of
the target at the time of interrupt delivery.
Discussed with: grehan@
found in High Speed USB HUBs which translate from High Speed USB into
FULL or LOW speed USB. In some rare cases SPLIT transactions might get
lost, which might leave the TT in an unknown state. Whenever we detect
such an error try to issue either a clear TT buffer request, or if
that is not possible reset the whole TT.
MFC after: 1 week
vm_pageout_scan(). There were missing increments in two less common cases.
Don't conflate the count of stuck pages and the pageout deficit provided by
vm_page_alloc{,_contig}(). (A proposed fix to the OOM code depends on this.)
Handle held pages consistently in the inactive queue scan. In the more
common case, we did not move the page to the tail of the queue. Whereas, in
the less common case, we did. There's no particular reason to move the page
in the less common case, so remove it.
Perform the calculation of the page shortage for the active queue scan a
little earlier, before the active queue lock is acquired. The correctness
of this calculation doesn't depend on the active queue lock being held.
Eliminate a redundant variable, "pcount". Use the more descriptive
variable, "maxscan", in its place.
Apply a few nearby style fixes, e.g., eliminate stray whitespace and excess
parentheses.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
originally.
I am not sure why exactly have I moved it during one of many refactorings
during camlock project, but obviously it opens race window that may cause
use after free panics during SIM (in reported cases umass(4)) detach.
MFC after: 2 weeks
LibAliasSetAddress() uses its own mutex to serialize changes.
While here, convert ifp->if_xname access to if_name() function.
MFC after: 2 weeks
Sponsored by: Yandex LLC
After r252890 we are naively attempting to pass through the
inode flags. This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.
Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.
Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.
PR: kern/185448
MFC after: 3 days
controller found in the MBP2013 has been observed to not work properly
unless this operation is performed.
MFC after: 1 week
Tested by: Huang Wen Hui <huanghwh@gmail.com>
inject interrupts into the guest without causing a VM-exit.
This feature can be disabled by setting the tunable "hw.vmm.vmx.use_apic_pir"
to "0".
The following sysctls provide information about this feature:
- hw.vmm.vmx.posted_interrupts (0 if disabled, 1 if enabled)
- hw.vmm.vmx.posted_interrupt_vector (vector number used for vcpu notification)
Tested on a Intel Xeon E5-2620v2 courtesy of Allan Jude at ScaleEngine.