- Need to set the pre-fetch memory address when reading the host memory.
- We currently assume that no endianness conversion is needed.
Sponsored by: DARPA, AFRL
Assert that the hold count has not fallen below the use count, a situation
that would only happen when a vref() (or similar) is erroneously paired
with a vdrop(). This situation has not been observed in the wild, but
could be helpful for someone implementing a new filesystem.
Reviewed by: kib
Approved by: hrs (mentor)
boundary. This was addressed several years ago by creating a parent
tag hierarchy for the root buses that set the boundary restriction
for appropriate buses and allowed child deviced to inherit it.
Somewhere along the way, this restriction was turned into a case for
marking the tag as a candidate for needing bounce buffers, instead
of just splitting the segment along the boundary line. This flag
also causes all maps associated with this tag to be non-NULL, which
in turn causes bus_dmamap_sync() to take the slow path of function
pointer indirection to discover that there's no bouncing work to
do. The end result is a lot of pages set aside in bounce pools
that will never be used, and a slow path for data buffers in nearly
every DMA-capable PCIe device. For example, our workload at Netflix
was spending nearly 1% of all CPU time going through this slow path.
Fix this problem by being more selective about when to set the
COULD_BOUNCE flag. Only set it when the boundary restriction
exists and the consumer cannot do more than a single DMA segment
at once. This fixes the case of dynamic buffers (mbufs, bio's)
but doesn't address static buffers allocated from bus_dmamem_alloc().
That case will be addressed in the future.
For those interested, this was discovered thanks to Dtrace Flame
Graphs.
Discussed with: jhb, kib
Obtained from: Netflix, Inc.
MFC after: 3 days
Set the accessed and dirty bits in the page table entry. If it fails then
restart the page table walk from the beginning. This might happen if another
vcpu modifies the page tables simultaneously.
Reviewed by: alc, kib
ismt(4) supports the SMBus Message Transport controller found on Intel
C2000 series (Avoton) and S1200 series (Briarwood) Atom SoCs.
Sponsored by: Intel
on execve(2), it calls vmspace_exec(), which frees the current
vmspace. The thread executing an exec syscall gets new vmspace
assigned, and old vmspace is freed if only referenced by the current
process. The free operation includes pmap_release(), which
de-constructs the paging structures used by hardware.
If the calling process is multithreaded, other threads are suspended
in the thread_suspend_check(), and need to be unsuspended and run to
be able to exit on successfull exec. Now, since the old vmspace is
destroyed, paging structures are invalid, threads are resumed on the
non-existent pmaps (page tables), which leads to triple fault on x86.
To fix, postpone the free of old vmspace until the threads are resumed
and exited. To avoid modifications to all image activators all of
which use exec_new_vmspace(), memoize the current (old) vmspace in
kern_execve(), and notify it about the need to call vmspace_free()
with a thread-private flag TDP_EXECVMSPC.
http://bugs.debian.org/743141
Reported by: Ivo De Decker <ivo.dedecker@ugent.be> through secteam
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
birthday-paradox style address collisions when
bhyve VMs are connected to the same broadcoast
domain and are using pseudo-random allocations.
Reviewed by: gnn
MFC after: 1 week
g_orphan_register and g_resize_provider_event. Both are called from the
event queue. Also we have GEOM_DEV class, which does deferred destroy
for its consumers via g_dev_destroy (also called from the event queue).
So it is possible, that for some consumers an orphan method will be
called twice. This triggers panic in g_dev_orphan.
Check that consumer isn't already orphaned before call orphan method.
MFC after: 2 weeks
Replace with the existing loop termination test with a similar
condition from the nested "if" that may terminate the loop a bit
sooner, but still not too early. This condition can then be removed
from the nested "if". Relocate an operator to be style(9) compliant.
MFC after: 3 days
to a guest physical address.
PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now
checked only in non-terminal paging entries.
Ignore the upper 32-bits of the CR3 for PAE paging.
Intel 40G Ethernet Controller XL710 Family. This is
the core driver, a VF driver called i40evf, will be
following soon. Questions or comments to myself or
my co-developer Eric Joyner. Cheers!
lookup for the inp flowid/flowtype to destination CPU.
This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled. Otherwise the behaviour should be the same
as before.
This is intended to be used by various places that wish to hash some
information about a TCP/UDP/IP flow but don't necessarily have a
live mbuf to do it with.
Refactor rss_m2cpuid() to use the refactored function.
- Implement support for interrupt filters in the DWC OTG driver, to
reduce the amount of CPU task switching when only feeding the FIFOs.
- Add common spinlock to the USB bus structure.
MFC after: 2 weeks
- inserting frame enter/leave sequences
- restructuring the vmx_enter_guest routine so that it subsumes
the vm_exit_guest block, which was the #vmexit RIP and not a
callable routine.
Reviewed by: neel
MFC after: 3 weeks
src/sys and the rest of the tree for builds.
o eliminate including bsd.mkopts.mk for the moment in kern.opts.mk
o No need to include src.opts.mk at all anymore. The reasons for it
are now coverted in sys.mk and src.sys.mk.
Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().
Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.
PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks
!(PFRULE_FRAGCROP|PFRULE_FRAGDROP) case.
o In the (PFRULE_FRAGCROP|PFRULE_FRAGDROP) case we should allocate mtag
if we don't find any.
Tested by: Ian FREISLICH <ianf cloudseed.co.za>
platform code, it is expected these will be merged in the future when the
ARM code is more complete.
Until more boards can be tested only use this with the Raspberry Pi and
rrename the functions on the other SoCs.
Reviewed by: ian@
it doesn't leak through when the command structure is reused for a user
command without a data buffer.
PR: amd64/189668
Tested by: Pete Long <pete@nrth.org>
MFC after: 1 week
near-term future use.
These are intended to fetch the current flow id, flow hash type
(M_HASHTYPE_* from the sys/mbuf.h) and if RSS is enabled, the
RSS destined CPU ID for the receive path.
XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).
Obtained from: Intel's Instruction Set Extensions Programming Reference
(March 2014)
possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally
set after a DAD started, TENTATIVE could be cleared because no NA was
received due to IFDISABLED, and as a result it could prevent DAD when
manually clearing IFDISABLED after that.
eight years. The original concept was to improve the
corner case where you run out of ephemeral ports, but it
was causing performance problems and the mechanism
of limiting the number of time_wait sockets serves
the same purpose in the end.
Reviewed by: bz
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.
Reviewed by: tycho
This includes decodes of recent Intel instructions, in particular
VT-x and related instructions. This allows the FBT provider to
locate the exit points of routines that include these new
instructions.
Illumos issues:
3414 Need a new word of AT_SUN_HWCAP bits
3415 Add isainfo support for f16c and rdrand
3416 Need disassembler support for rdrand and f16c
3413 isainfo -v overflows 80 columns
3417 mdb disassembler confuses rdtscp for invlpg
1518 dis should support AMD SVM/AMD-V/Pacifica instructions
1096 i386 disassembler should understand complex nops
1362 add kvmstat for monitoring of KVM statistics
1363 add vmregs[] variable to DTrace
1364 need disassembler support for VMX instructions
1365 mdb needs 16-bit disassembler support
This corresponds to Illumos-gate (github) version
eb23829ff08a873c612ac45d191d559394b4b408
Reviewed by: markj
MFC after: 1 week
with all bits set to 1 beyond the I/O permission bitmap.
Prior to this change accessing I/O ports [0xFFF8-0xFFFF] would trigger a
#GP fault even though the I/O bitmap allowed access to those ports.
For more details see section "I/O Permission Bit Map" in the Intel SDM, Vol 1.
Reviewed by: kib
Here, "suitably endowed" means that the System Control Coprocessor
(#15) has Performance Monitoring Registers, including a CCNT (Cycle
Count) register.
The CCNT register is used in a way similar to the TSC register in
x86 processors by the get_cyclecount(9) function. The entropy-harvesting
thread is a heavy user of this function, and will benefit from not
having to call binuptime(9) instead.
One problem with the CCNT register is that it is 32-bit only, so
the upper 32-bits of the returned number are always 0. The entropy
harvester does not care, but in case any one else does, follow-up
work may include an interrup trap to increment an upper-32-bit
counter on CCNT overflow.
Another problem is that the CCNT register is not readable in user-mode
code; in can be made readable by userland, but then it is also
writable, and so is a good chunk of the PMU system. For that reason,
the CCNT is not enabled for user-mode access in this commit.
Like the x86, there is one CCNT per core, so they don't all run in
perfect sync.
Reviewed by: ian@ (an earlier version)
Tested by: ian@ (same earlier version)
Committed from: WANDBOARD-QUAD
audio device driver is detached first and not its children. This fixes
a panic in some cases when unloading "snd_uaudio" while a USB device
is plugged. The linking order affects the order in which the module
dependencies are registered.
MFC after: 1 week
parent and child processes. Previously, we copied these pages even though
they are read only. However, the reason for copying them is historical and
no longer exists. In recent times, vm_map_protect() has developed the
ability to copy pages when write access is added to wired copy-on-write
pages. So, in this case, copy-on-write sharing of wired pages is not to be
feared. It is not going to lead to copy-on-write faults on wired memory.
Reviewed by: kib
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
Previously only TX IP checksum offloading was disabled but it's
reported that TX checksum offloading for UDP datagrams with IP
options also generates corrupted frames. Reporter's controller is
RTL8168CP but I guess RTL8168C also have the same issue since it
shall share the same core.
Reported and tested by: tuexen
that and the need to be in a critical section when switching to idleclock
mode for event timers, use spinlock_enter()/exit() to achieve both needs.
The ARM WFI (wait for interrupt) instruction blocks until an interrupt is
asserted, and it will unblock even if interrupts are masked, and it will
unblock immediately if an interrupt is already pending. It is necessary
to execute it with interrupts disabled, otherwise the interrupt that
should unblock it may occur and be serviced just prior to executing the
instruction. At that point the system is inappropriately asleep until
the next timer tick or some other random interrupt happens.
In general, interrupts need to be disabled continuously from the time the
decision is made that there is no work to be done and sleeping is needed
until actually going to sleep, to avoid a race where handling a new
interrupt changes the basis for deciding there is no work to be done.
Submitted by: hps@ (in slightly different form)
1. Make sure IPI mask is set before sending the IPI
2. Operate atomically on PS3 PIC outstanding interrupt list
3. Make sure IPIs are EOI'ed before, not after, processing. Without this,
a second IPI could be sent partway through processing the first one,
get erroneously acknowledge by the EOI to the first, and be lost. In
particular in the case of smp_rendezvous(), this can be fatal.
In combination, this makes the PS3 boot SMP again. It probably also fixes
some latent bugs elsewhere.
MFC after: 2 weeks
does an out-of-tree build without setting MAKESYSPATH) and recently
added requirements (JIRA's building the modules in a non-standard
layout). So, when MAKESYSPATH is defined, trust that it will do the
right thing (to catch the JIRA use case). When it isn't defined,
assume a standard FreeBSD tree and reach over to grab bsd.mkopt.mk (to
fix the /usr/ports use case). Both camps cannot be appeased otherwise,
so we have this kludge until it can be sorted out.
If the underlying protocol reported an error (e.g. because a connection was
closed while waiting in the queue), this error was also indicated by
returning a zero-length address. For all other kinds of errors (e.g.
[EAGAIN], [ENFILE], [EMFILE]), *addrlen is unmodified and there are
successful cases where a zero-length address is returned (e.g. a connection
from an unbound Unix-domain socket), so this error indication is not
reliable.
As reported in Austin Group bug #836, modifying *addrlen on error may cause
subtle bugs if applications retry the call without resetting *addrlen.