When the CPU Topology was added to bhyve in r332298 the SMBIOS table was
missed, this table passes topology information to the system and was still
using the old concept of each vCPU is a socket with 1 core and 1 thread.
This code did not even try to use the old sysctl information to adjust
this data.
Correct that by building a proper SMBios table, mapping the > 254 cases to
0 per the SMBios 2.6 specification that is claimed by the structure.
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde and/or phk (mentor), jhb (maintainer)
MFC: 3 days
Differential Revision: https://reviews.freebsd.org/D18998
The bhyve acpi MADT table was given a static space of 256 (0x100) bytes,
this is enough space to allow VM_MAXCPU to be 21, this patch changes that
so VM_MAXCPU can be of arbitrary value and not overflow the space by
actually calculating the space needed for the table.
PR: 212782
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde (mentor), jhb (maintainer)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18815
Replace most VM_MAXCPU constant useses with an accessor function to
vm->maxcpus which for now is initialized and kept at the value of
VM_MAXCPUS.
This is a rework of Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
work from D10070 to adjust it for the cpu topology changes that
occured in r332298
Submitted by: Fabian Freyer (fabian.freyer_physik.tu-berlin.de)
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde (mentor), jhb (maintainer)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18755
r346190 added support for printing of INET6 addresses for the "-o" option
(all opens) but missed adding support for INET6 addresses for the "-l" option.
This patch adds that support.
PR: 223036
MFC after: 1 week
By default, cores are now assigned to queues in a sequential
manner rather than all NICs starting at the first core. On a four-core
system with two NICs each using two queue pairs, the nic:queue -> core
mapping has changed from this:
0:0 -> 0, 0:1 -> 1
1:0 -> 0, 1:1 -> 1
To this:
0:0 -> 0, 0:1 -> 1
1:0 -> 2, 1:1 -> 3
Additionally, a device can now be configured to use separate cores for TX
and RX queues.
Two new tunables have been added, dev.X.Y.iflib.separate_txrx and
dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part
of the auto-assigned sequence.
Reviewed by: marius
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D20029
SHLIBDIR should still be optionally set, just before src.opts.mk is included
so that libcompat can properly override it. This fixes lib32 failures
reported by both Jenkins and Michael Butler.
Reported by: Michael Butler <imb@protected-networks.net>
MFC after: 3 days
X-MFC-With: r346546
snagging them from UEFI BIOS). Call the device type init routines
earlier as well, as they don't depend on how the console is
setup. This will allow us to read files earlier in boot, so any rare
error messages that this might move only to the EFI console will be an
acceptable price to pay. Also tweak the order of has_kbd so it resides
next to the rest of the console code. It needs to be after we initialize
the buffer cache.
When efi_autoload is called it will call fdt_setup_fdtp which setup the
dtb and overlays. If a user already loaded at dtb or overlays or just
printed the efi provided dtb, this will re-setup everything and also
re-applying the overlays.
Test that everything is setup before doing it again.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D20059
Rob's patch in D18564 cemented the SHLIBDIR because bsd.own.mk (included by
src.opts.mk) sets it to /usr/lib. r346546 did somehow not apply this part of
the patch, leaving it to get installed to the wrong place and subsequently
removed via ObsoleteFiles.
Reported by: jkim
MFC after: 3 days
X-MFC-With: r346546
Contrary to the comments, it was never used by core dumps or
debuggers. Instead, it used to hold the signal code of a pending
signal, but that was replaced by the 'ksi_code' member of ksiginfo_t
when signal information was reworked in 7.0.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D20047
Otherwise tap(4) can be loaded by loader despite being compiled into the
kernel, causing a panic as things try to double-initialize.
PR: 220867
MFC after: 3 days
in r346631. VGLEnd() clears some state variables as it restores state,
but not all of them, so it still needs to clear a single state variable
to indicate that it has completed. Put this clearing back where it was
(at the start instead of the end) to avoid moving bugs in the signal
handling.
Drivers can now pass up numa domain information via the
mbuf numa domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.
Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.
Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028
The disk_open() function searches for "the best partition" when slice and
partition information is not provided as part of the device name. As of
r345477 the slice and partition fields of a disk_devdesc are initialized to
D_SLICEWILD and D_PARTWILD; in the past they were initialized to -1, which
was sometimes interpreted as meaning 'wildcard' and sometimes as 'open the
raw partition' depending on the context. So as an unintended side effect of
r345477 it became basically impossible to ever open a disk or partition
without doing the 'best partition' search. One visible effect of that was
the inability to open the raw disk to read the partition table correctly in
zfs_probe_dev(), leading to failures to find the zfs pool unless it was on
the first partition.
Now instead of always initializing slice and partition to wildcards, the
disk_parsedev() function initializes them based on the presence of a
path/file name following the device. If there is any path or filename
following the ':' that ends the device name, then slice and partition are
initialized to D_SLICEWILD and D_PARTWILD. If there is nothing after the
':' then it is considered to be a request to open the raw device or
partition itself (not a file stored within it), and the fields are
initialized to D_SLICENONE and D_PARTNONE.
With this change in place, all the tests in src/tools/boot are succesful
again, including the recently-added cases of booting from a zfs pool on
a partition other than slice 1 of the device.
PR: 236981
Previously, a pid check was used to prevent open of the tun(4); this works,
but may not make the most sense as we don't prevent the owner process from
opening the tun device multiple times.
The potential race described near tun_pid should not be an issue: if a
tun(4) is to be handed off, its fd has to have been sent via control message
or some other mechanism that duplicates the fd to the receiving process so
that it may set the pid. Otherwise, the pid gets cleared when the original
process closes it and you have no effective handoff mechanism.
Close up another potential issue with handing a tun(4) off by not clobbering
state if the closer isn't the controller anymore. If we want some state to
be cleared, we should do that a little more surgically.
Additionally, nothing prevents a dying tun(4) from being "reopened" in the
middle of tun_destroy as soon as the mutex is unlocked, quickly leading to a
bad time. Return EBUSY if we're marked for destruction, as well, and the
consumer will need to deal with it. The associated character device will be
destroyed in short order.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20033
It seems that there should be a better way to handle this, but this seems to
be the more common approach and it should likely get replaced in all of the
places it happens... Basically, thread 1 is in the process of destroying the
tun/tap while thread 2 is executing one of the ioctls that requires the
tun/tap mutex and the mutex is destroyed before the ioctl handler can
acquire it.
This is only one of the races described/found in PR 233955.
PR: 233955
Reviewed by: ae
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20027
The <sys/pctrie.h> APIs expect a 64-bit DMA key.
This is fine as long as the DMA is less than or equal to 64 bits, which
is currently the case.
Sponsored by: Mellanox Technologies
From 7d8dc6544c
"The mcbin (and likely others) have a nonstandard uart clock. This means
that the earlycon programming will incorrectly set the baud rate if it is
specified. The way around this is to tell the kernel to continue using the
preprogrammed baud rate. This is done by setting the baud to 0."
Our drivers (uart_dev_ns8250) do respect zero, but SPCR would error. Let's
not error.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, imp, bcran
Differential Revision: https://reviews.freebsd.org/D19914
ufs partition as p2, and put the zfs partition at p3, to test the ability
of the zfs probe code to find a zfs pool on something other than the first
partition.
Parse the R_MIPS_32 and R_MIPS_64 relocations. Both Elf_Rel and
Elf_Rela relocations are handled since O32 MIPS uses Elf_Rel while N64
uses Elf_Rela. Note that R_MIPS_32 is only handled for 32-bit mips
and R_MIPS_64 for 64-bit. N32 is untested.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D19870
This is fairly similar to the AES-GCM support in ccr(4) in that it will
fall back to software for certain cases (requests with only AAD and
requests that are too large).
Tested by: cryptocheck, cryptotest.py
MFC after: 1 month
Sponsored by: Chelsio Communications
A request to encrypt an empty payload without any AAD is unusual, but
it is defined behavior. Removing this assertion removes a panic and
instead returns the correct tag for an empty buffer.
Reviewed by: cem, sef
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20043
To workaround limitations in the crypto engine, empty buffers are
handled by manually constructing the final length block as the payload
passed to the crypto engine and disabling the normal "final" handling.
For HMAC this length block should hold the length of a single block
since the hash is actually the hash of the IPAD digest, but for
"plain" SHA the length should be zero instead.
Reported by: NIST SHA1 test failure
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add support for newer Thinkpad models with id LEN0268. Was tested on
Thinkpad T480 and ThinkPad X1 Yoga 2nd gen.
PR: 229120
Submitted by: Ali Abdallah <aliovx@gmail.com>
MFC after: 1 week
tools/boot/install-boot.sh was assuming that if a device was passed in,
it should operate on the current system and run efibootmgr etc. to
update the boot manager. However, rootgen.sh passes a md(4) device and
not a fixed disk.
Add a -u option to install-boot.sh to tell it to update the system
in-place and run efibootmgr etc.
Also, source install-boot.sh in rootgen.sh to allow it to find and
call make_esp_file etc. And pass the loader file to make_esp_file instead
of a directory name.
Reported by: ian
Reviewed by: ian,imp,tsoome
Differential Revision: https://reviews.freebsd.org/D19992
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
We may need the BSP to reboot, but we don't need any AP CPU that isn't the
panic thread. Any CPU landing in this routine during panic isn't the panic
thread, so we can just detect !BSP && panic and shut down the logical core.
The savings can be demonstrated in a bhyve guest with multiple cores; before
this change, N guest threads would spin at 100% CPU. After this change,
only one or two threads spin (depending on if the panicing CPU was the BSP
or not).
Konstantin points out that this may break any future patches which allow
switching ddb(4) CPUs after panic and examining CPU-local state that cannot
be inspected remotely. In the event that such a mechanism is incorporated,
this behavior could be made configurable by tunable/sysctl.
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20019
VGLMouseFreeze() now only defers mouse signals and leaves it to higher
levels to hide and unhide the mouse cursor if necessary. (It is never
necessary, but is done to simplify the implementation. It is slow and
flashes the cursor. It is still done for copying bitmaps and clearing.)
VGLMouseUnFreeze() now only undoes 1 level of freezing. Its old
optimization to reduce mouse redrawing is too hard to do with unhiding
in higher levels, and its undoing of multiple levels was a historical
mistake.
VGLMouseOverlap() determines if a region overlaps the (full) mouse region.
VGLMouseFreezeXY() is the freezing and a precise overlap check combined
for the special case of writing a single pixel. This is the single-pixel
case of the old VGLMouseFreeze() with cleanups.
Fixes:
- check in more cases that the application didn't pass an invalid VIDBUF
- check for errors from copying a bitmap to the shadow buffer
- freeze the mouse before writing to the shadow buffer in all cases. This
was not done for the case of writing a single pixel (there was a race)
- don't spell the #defined values for VGLMouseShown as 0, 1 or boolean.