Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from
malloc use 2048 bytes.
There is no easy way to shrink the structure <= 1024 an it is likely to grow in
the future.
random_adaptors_lock is held.
- Use sx_sleep instead of tsleep in read and write path to allow
another thread that registers a new random adapter when waiting.
Assert that random_adaptor is not NULL after reacquiring the lock.
- Capture EINTR/ERESTART from sx_sleep to allow the blocking cycle be
stopped when user requests so, while there also make short
read/write's return 0.
- Move M_WAITOK allocations out of lock scope.
In collobration with: kib, markm, ian, jilles
Reviewed by: kib, markm
Approved by: so
support for AVX on i386.
- Similar to amd64, move the FPU save area out of the PCB and instead
store saved FPU state in a variable-sized buffer after the PCB on the
stack.
- To support the variable PCB location, alter the locore code to only use
the bottom-most page of proc0stack for init386(). init386() returns
the correct stack pointer to locore which adjusts the stack for thread0
before calling mi_startup().
- Don't bother setting cr3 in thread0's pcb in locore before calling
init386(). It wasn't used (init386() overwrote it at the end) and
it doesn't work with the variable-sized FPU save area.
- Remove the new-bus attachment from npx. This was only ever useful for
external co-processors using IRQ13, but those have not been supported
for several years. npxinit() is now called much earlier during boot
(init386()) similar to amd64.
- Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE.
- npxsave() is now only called from context switch contexts so it can
use XSAVEOPT.
Differential Revision: https://reviews.freebsd.org/D1058
Reviewed by: kib
Tested on: FreeBSD/i386 VM under bhyve on Intel i5-2520
The problem was that only the kbdmux keyboard index was saved in
vd->vd_keyboard. This index is -1 when kbdmux isn't used. In this
case, the keyboard was correctly allocated, but the returned index was
discarded.
PR: 194718
MFC after: 1 week
are not suspended. In particular, on the SU-enabled vulumes, there is
no reason why, between the call to softdep_flushfiles() and
softdep_waitidle(), SU work items cannot be queued.
Correct the condition to trigger the panic by only checking when
forced operation is done. Convert direct panic() call into KASSERT(),
there is no invalid on-disk data structures directly involved, so
follow the usual debugging vs. non-debugging approach.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
whether the shared request for already shared-locked lock could be
granted. Both problems result in the exclusive locker starvation.
The concurrent exclusive request is indicated by either
LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse
condition, i.e. no exclusive waiters, must check that both flags are
cleared.
Add a flag LK_NODDLKTREAT for shared lock request to indicate that
current thread guarantees that it does not own the lock in shared
mode. This turns back the exclusive lock starvation avoidance code;
see man page update for detailed description.
Use LK_NODDLKTREAT when doing lookup(9).
Reported and tested by: pho
No objections from: attilio
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
updating the GTT and flushing the AGP TLB by storing the GTT in
write-combining memory.
On x86 flushing the AGP TLB is done by an I/O operation or a store to a
MMIO register in uncacheable memory. Both cases imply that WC buffers are
flushed so no memory barriers are needed.
On powerpc there is no WC memory type. It maps to uncacheable memory and
two stores to uncacheable memory, such as to the GTT and then to an MMIO
register, are strongly ordered, so no memory barriers are needed either.
MFC after: 1 month
moving U-Boot specific code from libfdt.a to a new libuboot_fdt.a. This
needs to be a new library for linking to work correctly.
Differential Revision: https://reviews.freebsd.org/D1054
Reviewed by: ian, rpaulo (earlier version)
MFC after: 1 week
A new terminal_set_cursor() is added: it wraps the existing
teken_set_cursor() function.
In vtbuf_grow(), the cursor position is adjusted at the end of the
function. In vt_change_font(), we call terminal_set_cursor() just after
terminal_set_winsize_blank(), while the terminal is mute.
This fixes a bug where, after loading a kernel video driver which
increases the terminal window size, the cursor remains at its old
position, in other words, in the middle of the display content.
PR: 194421
MFC after: 1 week
hold the gpiobus lock between the gpio calls.
gpiobus_acquire_lock() now accepts a third parameter which tells gpiobus
what to do when the bus is already busy.
When GPIOBUS_WAIT wait is used, the calling thread will be put to sleep
until the bus became free.
With GPIOBUS_DONTWAIT the calling thread will receive EWOULDBLOCK right
away and then it can act upon.
This fixes the gpioiic(4) locking issues that arises when doing multiple
concurrent access on the bus.
of fuword(9) and suword(9). This makes the functions type-compatible
with volatile objects and does not require devolatile force, e.g. in
kern_umtx.c.
Requested by: bde
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
The check was recommened in the draft-ietf-ngtrans-mech-05.txt. But it isn't
clear, should it compare the source with all direct broadcast addresses in the
system or not.
RFC 4213 says it is enough to verify that the source address is the address
of the encapsulator, as configured on the decapsulator. And this verification
can be extended by administrator with any other forms of IPv4 ingress filtering.
Discussed with: glebius, melifaro
Sponsored by: Yandex LLC
The problem is that the __LINE__ macro is constant inside a macro and
results in identical assert statements when the compiler does not
support the static builtin assert function.
MFC: 3 days
Sponsored by: Mellanox Technologies
appropriately
Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable.
We don't have to call it with the lock held if we are just creating new
filedesc.
As a side note, strictly speaking processes can have fdtables with
fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file
descriptor they get will be 0 and the only syscall allowing to choose fd number
requires an active file descriptor. Should this ever change, we can add an 'init'
(or similar) parameter to fdgrowtable.
While here add 'fdused_init' which does not perform unnecessary work.
Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold
it when appropriate. This function is only used with INVARIANTS.
No functional changes intended.
Test for file availability by fde_file != NULL instead of fdisused, this is
consistent with similar checks later.
Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never
reached in practice.
fdiused is now only used in some KASSERTS, so ifdef it under INVARIANTS.
No functional changes.
This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.
The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.
The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.
Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.
My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.
My Nomex pants are on. Let the feedback commence!
Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by: so(des)
- convert boot1.efi to corrrectly calculate the lba for what the
media reports and convert the size based on what FreeBSD uses.
The existing code would use the 512 byte lba and convert the
size using 4K byte size.
- make fsck_msdosfs read the boot block as 4K so the read doesn't
fail on a 4Kn drive since FreeBSD will error out parition reads
of a block. Make the bpbBytesPerSec check a multiple of 512 since
it can be 512 or 4K depending on the disk. This allows fsck to
pass checking the EFI partition on a 4Kn disk.
To create the EFI file system I used:
newfs_msdos -F 32 -S 4096 -c 1 -m 0xf8 <partition>
This works for booting 512 and 4Kn disks.
Caveat is that loader.efi cannot read the 4Kn EFI partition. This isn't
critical right now since boot1.efi will read loader.efi from the ufs
partition. It looks like loader.efi can be fixed via making some of the
512 bytes reads more flexible. loader.efi doesn't have trouble reading
the ufs partition. This is probably a simple fix.
I now have FreeBSD installed on a system with 4Kn drives and tested the
same code works on 512.
MFC after: 1 week
in the radeonkms driver.
Note: In PCI mode virtual addresses on the graphics card that map to system
RAM are translated to physical addresses by the graphics card itself. In
AGP mode address translation is done by the AGP chipset so fictitious
addresses appear on the system bus. For the CPU cache management to work
correctly when the CPU accesses this memory it needs to use the same
fictitious addresses (and let the chipset translate them) instead of using
the physical addresses directly.
Reviewed by: kib
MFC after: 1 month
When multicast capable interface goes away, it leaves multicast groups,
this leads to generate MLD reports, but MLD code does deffered send and
MLD reports are queued in the in6_multi's in6m_scq ifq. The problem is
that in6_multi structures are freed when interface leaves multicast groups
and thread that does deffered send will not take these queued packets.
PR: 194577
MFC after: 1 week
Sponsored by: Yandex LLC
to mount_nfs(8). They are implemented on Linux, OS X, and Solaris,
and thus can be expected to appear in automounter maps.
Reviewed by: rmacklem@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
without restarting whole lookup
Restart is only needed when fp was closed by current process, which is a much
rarer event than ref/deref by some other thread.
A read barrier was necessary because fd table pointer and table size were
updated separately, opening a window where fget_unlocked could read new size
and old pointer.
This patch puts both these fields into one dedicated structure, pointer to which
is later atomically updated. As such, fget_unlocked only needs data a dependency
barrier which is a noop on all supported architectures.
Reviewed by: kib (previous version)
MFC after: 2 weeks
This makes VMWare VAAI Thin Provisioning Stun primitive activate, pausing
the virtual machine, when backing storage (ZFS pool) is getting overflowed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
- Move the existing code to x86/x86/identcpu.c since it is x86-specific.
- If the CPUID2_HV flag is set, assume a hypervisor is present and query
the 0x40000000 leaf to determine the hypervisor vendor ID. Export the
vendor ID and the highest supported hypervisor CPUID leaf via
hv_vendor[] and hv_high variables, respectively. The hv_vendor[]
array is also exported via the hw.hv_vendor sysctl.
- Merge the VMWare detection code from tsc.c into the new probe in
identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in
the TSC code to identify VMWare.
Differential Revision: https://reviews.freebsd.org/D1010
Reviewed by: delphij, jkim, neel
unit 0.
It seems that this 'simplification' was copied to all GPIO drivers in tree.
This fix a bug where a GPIO controller could fail to attach its children
(gpioc and gpiobus) if another GPIO driver attach first.
initial MPA exchange must be tracked this way so that t4_tom's state for
the tid is all clean at the time the tid transitions to RDMA mode. Once
it does, t4_tom is out of the way and iw_cxgbe uses the qp endpoints
directly.
Sponsored by: Chelsio Communications
Also fix some mishandling of suword(9) errors as errno, which resulted
in spurious ERESTART.
Sponsored by: The FreeBSD Foundation
Tested by: pho
MFC after: 3 weeks
and casuword(9), but do not mix value read and indication of fault.
I know (or remember) enough assembly to handle x86 and powerpc. For
arm, mips and sparc64, implement fueword() and casueword() as wrappers
around fuword() and casuword(), which means that the functions cannot
distinguish between -1 and fault.
On architectures where fueword() and casueword() are native, implement
fuword() and casuword() using fueword() and casuword(), to reduce
assembly code duplication.
Sponsored by: The FreeBSD Foundation
Tested by: pho
MFC after: 2 weeks (ia64 needs treating)
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.
Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.
MFC after: 3 days
Sponsored by: Mellanox Technologies
- Free rt in c4iw_connect only if it is allocated.
- Call soclose instead of so_shutdown if there is an abort from the peer.
- Close socket and return failure if TOE is not enabled.
Submitted by: Hariprasad at Chelsio dot com
Sponsored by: Chelsio Communications
If bootverbose is enabled, a detailed list is provided; otherwise, a
single-line summary is displayed.
Differential Revision: https://reviews.freebsd.org/D1008
Reviewed by: jhb, neel
MFC after: 1 week
transfers to be default. It simplifies porting code which assumes
such settings.
Discussed with: avg, llos, nwhitehorn
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The kernel tracks syscall users so that modules can safely unregister them.
But if the module is not unloadable or was compiled into the kernel, there is
no need to do this.
Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC
during kernel build and 0 otherwise.
Reviewed by: kib (previous version)
MFC after: 2 weeks
'struct vm *'. Previously it used to be a 'void *' but there is no reason
to hide the actual type from the handler.
Discussed with: tychon
MFC after: 1 week
This prevents BIO_DELETE requests getting stuck in the TRIM queue which
results in a panic on shutdown due to outstanding requests.
PR: 194606
Reported by: Guido Falsi
Reviewed by: mav
MFC after: 3 days
Sponsored by: Multiplay
Multipass device attachment was tested on many arm platforms by users and
only success was reported on the arm@ mailing list. This is just the
long-delayed followup of making it the default.
Multipass attachment is necessary when using vendor-supplied FDT data,
because our devices may need to be attached in a different order than they
are described in the FDT data.
- 'groups' initialization to NULL is always ovewrwriten before use, so plug it
- get rid of 'goto out'
- kern_setgroups's callers already validate ngrp, so only assert the condition
- ngrp is an u_int, so 'ngrp < 1' is more readable as 'ngrp == 0'
No functional changes.
This reduces variability during timer calibration by keeping the emulation
"close" to the guest. Additionally having all timer emulations in the kernel
will ease the transition to a per-VM clock source (as opposed to using the
host's uptime keep track of time).
Discussed with: grehan
Most I/O port handlers return -1 to signal an error. If this value is returned
without modification to vm_run() then it leads to incorrect behavior because
'-1' is interpreted as ERESTART at the system call level.
Fix this by always returning EIO to signal an error from an I/O port handler.
MFC after: 1 week
crash.sh script attached to FreeNAS bug 4109:
https://bugs.freenas.org/issues/4109
Three are in the snapshot layer:
a) AVG explains in his notes: https://wiki.freebsd.org/AvgVfsSolarisVsFreeBSD
"VOP_INACTIVE must not do any destructive actions to a vnode
and its filesystem node, nor invalidate them in any way."
gfs_vop_inactive and zfsctl_snapshot_inactive did just that. In
OpenSolaris VOP_INACTIVE is much closer to FreeBSD's VOP_RECLAIM.
Rename & move them to gfs_vop_reclaim and zfsctl_snapshot_reclaim
and merge in the requisite vnode_destroy from zfsctl_common_reclaim.
b) gfs_lookup_dot and various zfsctl functions do not honor the
FreeBSD VFS convention of only locking from the root downward. When
looking up ".." the convention is to drop the current leaf vnode lock before
acquiring the directory vnode and then subsequently re-acquiring the lock on the
leaf vnode. This fixes that in all the places that our exercised by crash.sh.
c) The snapshot may already be unmounted when the directory vnode is reclaimed.
Check for this case and return.
One in the common layer:
d) Callers of traverse expect the reference to the vnode passed in to be
maintained. Don't release it.
This last one may be an unclear contract. There may in fact be some callers that
do expect the reference to be dropped on success in addition to callers that
expect it to be released. In this case a further audit of the callers is needed
and a consensus on the correct behavior.
PR: 184677
Submitted by: kmacy
Reviewed by: delphij, will, avg
MFC after: 2 weeks
Sponsored by: iXsystems
and the following r273143 commit, supposed to workaround introduced issue by
quite innocent-looking change.
While there is no clear understanding why, but r273143 is accused in data
corruption in some environments with high I/O load. I personally don't see
any problem in that commit, and possibly it is just a trigger to some other
bug somewhere, but better safe then sorry for now.
Requested by: scottl@
MFC after: 3 days
that sctp_med_chunk_output() always initialized the reason_code
instead of relying on the caller.
The variable is only used for debugging purpose.
This issue was reported by Peter Bostroem from Google.
MFC after: 3 days
used for kernel devices it is used by i2c(8).
This fix the 'error: Device not configured' when i2c(8) tries to reset the
controller, as an example:
# i2c -r
Resetting I2C controller on /dev/iic0: error: Device not configured
For now use conservative settings for default i2c speeds.
MFC after: 1 week
For an unkown reason (at moment), sometimes if_cpsw cannot read from PHY
and fails to attach calling cpsw_detach() which end up in a panic.
Fix it by doing the proper check before detach the miibus and also fix the
leak of few variables.
And to actually make it work, ether_ifattach() has to be moved to the end
of cpsw_attach() to avoid a race where calling ether_ifdetach() before
domain_init() (which will only run later on) would make it crash at
INP_INFO_RLOCK() on in_pcbpurgeif0().
Tested on: BBB (am335x)
MFC after: 1 week
in a separate word from the _count. This does not permit both items to
be updated atomically in a portable manner. As a result, sem_post()
must always perform a system call to safely clear _has_waiters.
This change removes the _has_waiters field and instead uses the high bit
of _count as the _has_waiters flag. A new umtx object type (_usem2) and
two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement
these semantics. The older operations are still supported under the
COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has
been updated to use the new implementation. Note that the new
implementation is not compatible with the previous implementation.
However, this only affects static binaries (which cannot be helped by
symbol versioning). Binaries using a dynamic libc will continue to work
fine. SEM_MAGIC has been bumped so that mismatched binaries will error
rather than corrupting a shared semaphore. In addition, a padding field
has been added to sem_t so that it remains the same size.
Differential Revision: https://reviews.freebsd.org/D961
Reported by: adrian
Reviewed by: kib, jilles (earlier version)
Sponsored by: Norse
It had two bugs: one where mmap was still allowed and another where
D_TRACKCLOSE doesn't handle all cases.
Thanks to jhb and kib for pointing them out.
MFC after: 1 week
the map count and without being able to keep track of the current map
allocation, bus_dma_tag_destroy() will fail to proceed and will return
EBUSY even after all the maps have been correctly destroyed with
bus_dmamap_destroy().
Found while testing the detach method of a NIC.
Tested on: BBB (am335x)
Reviewed by: cognet, ian
MFC after: 1 week
In some cases, TSC is broken and special applications might benefit
from memory mapping HPET and reading the registers to count time.
Most often the main HPET counter is 32-bit only[1], so this only gives
the application a 300 second window based on the default HPET
interval.
Other applications, such as Intel's DPDK, expect /dev/hpet to be
present and use it to count time as well.
Although we have an almost userland version of gettimeofday() which
uses rdtsc in userland, it's not always possible to use it, depending
on how broken the multi-socket hardware is.
Install the acpi_hpet.h so that applications can use the HPET register
definitions.
[1] I haven't found a system where HPET's main counter uses more than
32 bit. There seems to be a discrepancy in the Intel documentation
(claiming it's a 64-bit counter) and the actual implementation (a
32-bit counter in a 64-bit memory area).
MFC after: 1 week
Relnotes: yes
Place the code introduced in r268660 into a separate function that can be
called from uiomove_fromphys. Instead of pre-allocating two KVA pages use
vmem_alloc to allocate them on demand when needed. This prevents blocking if
a page fault is taken while physical addresses from outside the DMAP are
used, since the lock is now removed.
Also introduce a safety catch in PHYS_TO_DMAP and DMAP_TO_PHYS.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D947
amd64/amd64/pmap.c:
- Factor out the code to deal with non DMAP addresses from pmap_copy_pages
and place it in pmap_map_io_transient.
- Change the code to use vmem_alloc instead of a set of pre-allocated
pages.
- Use pmap_qenter and don't pin the thread if there can be page faults.
amd64/amd64/uio_machdep.c:
- Use pmap_map_io_transient in order to correctly deal with physical
addresses not covered by the DMAP.
amd64/include/pmap.h:
- Add the prototypes for the new functions.
amd64/include/vmparam.h:
- Add safety catches to make sure PHYS_TO_DMAP and DMAP_TO_PHYS are only
used with addresses covered by the DMAP.
* Use a constant to define the number of stack frames in a probe exception.
* Only allow function symbols in powerpc64 ('.' prefixed)
* Set the fbtp_roffset for return probes, so the correct dtrace_probe call is
made.
MFC after: 1 week
indiscriminately to printf() and freeenv() is incorrect. Add a NULL
check before freeenv(); as for printf(), we could use req.newptr
instead, but we'd have to select the correct format string based on
the type, and that's too much work for an error message, so just
remove it.
workaround for an imx6 chip erratum. Linux works around the bug with
changes in fdt data that we can't currently handle, so to enable running
with standard vendor-supplied fdt data, this watches for an attempt to map
the gpio1_6 interrupt and remaps it back to the standard ethernet interrupt.
This can be undone when the intrng project is completed and our gpio drivers
can also be interrupt controllers.
search (i.e. without returning any result) and you would end up with a
random MAC address.
Change the search algorithm to a recursive one to ensure that all the nodes
on DTS will be verified.
The previous algorithm could not keep up if the DTS has too many sub-nodes.
While here, fix the punctuation on comments.
To restore the default font using vidcontrol(1), use the "-f" flag
without an argument:
vidcontrol -f < /dev/ttyv0
PR: 193910
Differential Revision: https://reviews.freebsd.org/D971
Submitted by: Marcin Cieslak <saper@saper.info>
Reviewed by: ray@, emaste@
Approved by: ray@
MFC after: 1 week
Support for the multiport feature is mostly implemented, but currently
disabled due to some potential races in the hot plug code paths.
Requested by: marcel
MFC after: 1 month
Relnotes: yes
for, or that are required to run the chip (such as busses). Turn off all
the devices we don't yet have drivers for.
Some day we will have a fully functional imx6 clock driver so that we can
manage clocks based on fdt data. This will have to do until then.
variable (if any) provided in the boot environment. Unset it from
the kernel environment after doing this, so that the passphrase is
no longer present in kernel memory once we enter userland.
This will make it possible to provide a GELI passphrase via the boot
loader; FreeBSD's loader does not yet do this, but GRUB (and PCBSD)
will have support for this soon.
Tested by: kmoore
initial static environment to a dynamic one, zero the static environment
buffer, and zero individual values when kern_unsetenv and freeenv are
called.
Tested by: kmoore (VM memory dump + grep)
Tested by: cperciva (kernel panic dump + grep)
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
to a power of 2. For non-power of 2 settings, intermittent
page faults have been reported. Although the bug that causes
these page faults/crashes has not been identified, it does
not appear to occur when rsize, wsize is a power of 2.
Reported by: tcberner@gmail.com
MFC after: 2 weeks
Current FreeBSD netback names the interface with xnb<device unit>, but
this is not suitable for usage with the Xen toolstack, which expects
something similar to <prefix><domid><handle>. In order to solve this,
change the netback naming convention to use xnb<domid>.<handle>.
Sponsored by: Citrix Systems R&D
dev/xen/netback/netback.c:
- Change netback to use the nomenclature stated above.
This device is only attached to priviledged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface is to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.
Sponsored by: Citrix Systems R&D
i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
- Introduce a function to make generic hypercalls into Xen.
xen/interface/xen.h:
xen/interface/memory.h:
- Import the new hypercall XENMEM_add_to_physmap_range used by
auto-translated guests to map memory from foreign domains.
dev/xen/privcmd/privcmd.c:
- This device has the following functions:
- Allow user-space applications to make hypercalls into Xen.
- Allow user-space applications to map memory from foreign domains,
this is accomplished using the newly introduced hypercall
(XENMEM_add_to_physmap_range).
xen/privcmd.h:
- Public ioctl interface for the privcmd device.
x86/xen/hvm.c:
- Remove declaration of hypercall_page, now it's declared in
hypercall.h.
conf/files:
- Add the privcmd device to the build process.
Since Xen and FreeBSD error codes are completely different add a
translation layer in order to convert Xen error codes into native
FreeBSD error codes. This will be used by the privcmd device, which
needs to return the hypercall errors into user-space.
Sponsored by: Citrix Systems R&D
xen/error.h:
- Import Xen error codes.
- Create a table to map Xen error codes into FreeBSD native error
codes.
- Create an inline function that performs the translation.
The user-space event channel device is used by applications to receive
and send event channel interrupts. This device is based on the Linux
evtchn device.
Sponsored by: Citrix Systems R&D
xen/evtchn/evtchn_dev.c:
- Remove the old event channel device, which was already disabled in
the build system.
dev/xen/evtchn/evtchn_dev.c:
- Import a new event channel device based on the one present in
Linux.
- This device allows the following operations:
- Bind VIRQ event channels (ioctl).
- Bind regular event channels (ioctl).
- Create and bind new event channels (ioctl).
- Unbind event channels (ioctl).
- Send notifications to event channels (ioctl).
- Reset the device shared memory ring (ioctl).
- Unmask event channels (write).
- Receive event channel upcalls (read).
- The new code is MP safe, and can be used concurrently.
conf/files:
- Add the new device to the build system.
This is needed by the event channel user-space device, that requires
registering event channels without unmasking them. intr_add_handler
will unconditionally unmask the event channel, so we avoid calling it
if no filter/handler is provided, and then the user will be in charge
of calling it when ready.
In order to do this, we need to change the opaque type
xen_intr_handle_t to contain the event channel port instead of the
opaque cookie returned by intr_add_handler, since now registration of
event channels without handlers are allowed. The cookie will now be
stored inside of the private xenisrc struct. Also, introduce a new
function called xen_intr_add_handler that allows adding a
filter/handler after the event channel has been registered.
Sponsored by: Citrix Systems R&D
x86/xen/xen_intr.c:
- Leave the event channel without a handler if no filter/handler is
provided to xen_intr_bind_isrc.
- Don't perform an evtchn_mask_port, intr_add_handler will already do
it.
- Change the opaque type xen_intr_handle_t to contain a pointer to
the event channel port number, and make the necessary changes to
related functions.
- Introduce a new function called xen_intr_add_handler that can be
used to add filter/handlers to an event channel after registration.
xen/xen_intr.h:
- Add prototype of xen_intr_add_handler.
the r241987 commit message, instead of having users locally overriding
the value using tunables in /boot/loader.conf .
Found by: Adam Parco
Discussed with: Nick Hibma
when that happens, we happily access our resource array out of
bounds. Make sure we stay within the MAX_ROMS limit.
While here, bump MAX_ROMS from 16 to 32 to minimize the chance
of leaving option ROMs unaccounted for.
Obtained from: Juniper Networks, Inc.
Rename it to fdsetugidsafety for consistency with other functions.
There is no need to take filedesc lock if not closing any files.
The loop has to verify each file and we are guaranteed fdtable has space
for at least 20 fds. As such there is no need to check fd_lastfile.
While here tidy up is_unsafe.
the Microsoft Azure service does not recognize the second
attached disk on the system.
Submitted by: kyliel@Microsoft
Patched by: weh@Microsoft
PR: 194376
MFC after: 3 days
X-MFC-10.1: yes, ASAP
Sponsored by: The FreeBSD Foundation
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days
Sponsored by: Mellanox Technologies
misconfiguration VM-exit.
An EPT misconfiguration is triggered when the processor encounters a PTE
that is writable but not readable (WR=10). On processors that require A/D
bit emulation PG_M and PG_A map to EPT_PG_WRITE and EPT_PG_READ respectively.
If the PTE is updated as in the following code snippet:
*pte |= PG_M;
*pte |= PG_A;
then it is possible for another processor to observe the PTE after the PG_M
(aka EPT_PG_WRITE) bit is set but before PG_A (aka EPT_PG_READ) bit is set.
This will trigger an EPT misconfiguration VM-exit on the other processor.
Reported by: rodrigc
Reviewed by: grehan
MFC after: 3 days
timecounter resolution is available, so ask for a 1 GHz frequency. It
won't actually get one that fast, but that'll get the fastest available
clock and use a divisor of 1 (probably 132 or 66mhz on current hardware).
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
type PT_EPT or PT_RVI.
Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.
Add defines for various AMD-specific MSRs.
Submitted by: Anish Gupta (akgupt3@gmail.com)
EWOULDBLOCK.
Do not print any message at errors. The errors are properly sent to upper
layers which should be able to deal with it, including printing the errors
when they need to.
The error message was quite annoying while scanning the i2c bus.
MFC after: 1 week
two.
nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.
This is a fixup for 273271.
No strong objections from: kib
Pointy hat to: mjg
MFC after: 2 weeks
This involves:
1. Have the loader pass the start and size of the .ctors section to the
kernel in 2 new metadata elements.
2. Have the linker backends look for and record the start and size of
the .ctors section in dynamically loaded modules.
3. Have the linker backends call the constructors as part of the final
work of initializing preloaded or dynamically loaded modules.
Note that LLVM appends the priority of the constructors to the name of
the .ctors section. Not so when compiling with GCC. The code currently
works for GCC and not for LLVM.
Submitted by: Dmitry Mikulin <dmitrym@juniper.net>
Obtained from: Juniper Networks, Inc.
vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in
a UDP packet. This implementation is based on RFC7348.
Currently, the IPv6 support is not fully compliant with the specification:
we should be able to receive UPDv6 packets with a zero checksum, but we
need to support RFC6935 first. Patches for this should come soon.
Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
network stack to support batching, GRO, and GSO. Each frame has to make
two trips through the network stack, and each frame will be at most MTU
sized. Performance suffers accordingly.
Some latest generation NICs have begun to support vxlan HW offloads that
we should also take advantage of. VIMAGE support should also be added soon.
Differential Revision: https://reviews.freebsd.org/D384
Reviewed by: gnn
Relnotes: yes
Before, the font was loaded and the window size recalculated, giving an
unusable terminal, even if the actual font didn't change.
Reported by: beeessdee@ruggedinbox.com
MFC after: 3 days
This fix a race where the threads waiting for the bus would wake up early
and still see bus as busy.
While here, give a better description to wmesg for the two use cases we
have (bus and io waiting).
MFC after: 1 week
bhyve doesn't emulate the MSRs needed to support this feature at this time.
Don't expose any model-specific RAS and performance monitoring features in
cpuid leaf 80000007H.
Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
BIOS signature and P-state related MSRs.
This eliminates all the unimplemented MSRs accessed by Linux/x86_64 kernels
2.6.32, 3.10.0 and 3.17.0.
will now find the virtual to physical mapping for libkvm to use at
runtime. This makes PHYSADDR redundant, however keep it around to give
everyone a chance to update their libkvm.
MFC after: 1 week
physaddr. This should allow for a kernel where PHYSADDR and KERNPHYSADDR
are both undefined.
For now libkvm will use the old method of reading physaddr and kernaddr
to allow it to work with old kernels. This could be removed in the future
when enough time has passed.
Differential Revision: https://reviews.freebsd.org/D939
MFC after: 1 week
bus_new_pass() handler so it doesn't happen until BUS_PASS_CPU. This allows
the anatop driver to outbid the generic simplebus driver (which the FDT
data describes as compatible).
Some day when we handle power regulators, this driver may actually
become a functional simplebus and attach the regulators as children, as
described in the FDT data.
Increasingly, FDT data has the "simple-bus" compatible string on nodes
that have children, but we wouldn't consider them to be busses. If the
node lacks a ranges property then we will fail to attach successfully,
so fail to probe as well.
rather than u_char.
To try and play nice with the ABI, the u_char CPU ID values are clamped
at 254. The new fields now contain the full CPU ID, or -1 for no cpu.
Differential Revision: D955
Reviewed by: jhb, kib
Sponsored by: Norse Corp, Inc.
lose the contents of consecutive writes (that happens within two SD card
clock cycles).
This fixes the causes of instability during the SD card detection and
identification on Raspberry Pi (which happens at 400 kHz and so was much
more vulnerable to this issue).
Remove the previous workaround which clearly can't provide the same effect.
MFC after: 1 week
Relnotes: yes
to be present. Thsi creates a new per-SoC driver that handles probe and
setting/getting the gpio flags.
Differential Revision: https://reviews.freebsd.org/D943
Reviewed by: loos, rpaulo
MFC after: 1 week