The seg_max value reported to the guest should be two less than the
host's maximum, in order to leave room for the request and the
response. This is analogous to r347033 for virtio_block.
We hit the "too many segments to enqueue" assertion on OneFS because
we increase MAXPHYS to 256 KB.
Reviewed by: bryanv
Discussed with: cem jhb rgrimes
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20529
bsnmpd(1) main does that early on init and the connection is available
to all loaded modules
Event: Vienna Hackathon 2019
PR: 233431 , 221487
MFC after: 2 weeks
Otherwise duplicate messages can trigger a reinitialization of the
compression stream while the update thread is running. Also ensure
that the stream is initialized before the update thread may attempt
to use it.
PR: 238333
Reviewed by: cem, rgrimes
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20673
The vsc_rx_ready and the RX virtqueue is protected by the rx_mtx lock.
However, pci_vtnet_ping_rxq() (currently called only once after each
device reset) accesses those without acquiring the lock.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20609
can be combined with configuring the period and duty cycle (the same ioctl
sets all 3 values at once, so there's no reason to require the user to run
the program twice to get all 3 things set).
The driver now names its cdev nodes pwmcX.Y where X is unit number and
Y is the channel within that unit. Change the default device name from
pwmc0 to pwmc0.0. The driver now puts cdev files and label aliases in
the /dev/pwm directory, so allow the user to provide unqualified names
with -f and automatically prepend the /dev/pwm part for them.
Update the examples in the manpage to show the new device name format
and location within /dev/pwm.
ioctl definitions and related datatypes that allow userland control of pwm
hardware via the pwmc device. The new name and location better reflects its
assocation with a single device driver.
Both virtio_net and e82545 network frontends have code to validate and
generate MAC addresses. These functionalities are replicated in the two
files, so we move them in a separate compilation unit.
Reviewed by: rgrimes, bryanv, imp, kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20626
The VirtIO standard supports two schemes for notification suppression:
a notification enable bit and a more sophisticated one (event_idx) that
also supports delayed notifications. Currently bhyve fully supports
only the first scheme. This patch hides the notification suppression
internals by means of two inline routines, vq_kick_enable() and
vq_kick_disable(), and makes the code more readable.
Moreover, further improve readability by replacing the call to mb()
with a call to atomic_thread_fence_seq_cst(), which is already used
in virtio.c
Reviewed by: pmooney_pfmooney.com, bryanv
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20581
On vtnet device reset it is necessary to wait for threads to stop TX and
RX processing. However, the rx_in_progress variable (used for to wait for
RX processing to stop) is actually useless, and can be removed. Acquiring
and releasing the RX lock is enough to synchronize correctly. Moreover,
it is possible to reset the device while holding both TX and RX locks, so
that the "resetting" variable becomes unnecessary for the RX thread, and
can be protected by the TX lock (instead of being volatile).
Reviewed by: jhb, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20543
The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:
nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link
The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.
Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.
Reviewed by: imp, ken, araujo, jhb, rgrimes
Approved by: imp (mentor), ken (mentor), jhb (maintainer)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19904
This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531
Rather than uncoditionally setting the MEMEN and PORTEN bits in
PCIR_COMMAND for PCI devices, set the respective bit when the first
BAR of a given type is added to the device. This more closely matches
what firmware does on bare metal.
BUSMASTEREN is still set unconditionally. Eventually this bit should
move into the device models as not all device models need this set.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20530
Coverity warned about gdb_write_mem sign extending the result of
parse_byte shifted left by 24 bits when generating a 32-bit memory
write value for MMIO. Simplify the code by using parse_integer
instead of unrolled parse_byte calls.
CID: 1401600
Reviewed by: cem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20508
bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to. For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.
bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table. However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.
However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer. Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.
Reported by: gallatin
Tested by: gallatin
Reviewed by: markj, Patrick Mooney
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20523
I believe this was introduced in the original '-r' commit, r231911 (2012).
At the time, the scope was limited to a 1 second sleep. r332518 (2018)
added '-R', which increased the potential duration of the affected interval
(from 1 to N seconds) by permitting arbitrary restart intervals.
Instead, handle SIGTERM normally during restart-sleep, when the monitored
process is not running, and shut down promptly.
(I noticed this behavior when debugging a child process that exited quickly
under the 'daemon -r -R 30' environment. 'kill <daemonpid>' had no
immediate effect and the monitor process slept until the next restart
attempt. This was annoying.)
Reviewed by: allanjude, imp, markj
Differential Revision: https://reviews.freebsd.org/D20509
Without this patch, mountd would delete/load all exports from the exports
file(s) when it receives a SIGHUP. This works fine for small exports file(s),
but can take several seconds to do when there are large numbers (10000+) of
exported file systems. Most of this time is spent doing the system calls
that delete/export each of these file systems. When the "-S" option
has been specified (the default these days), the nfsd threads are suspended
for several seconds while the reload is done.
This patch changes mountd so that it only does system calls for file systems
where the exports have been changed/added/deleted as compared to the exports
done for the previous load/reload of the exports file(s).
Basically, when SIGHUP is posted to mountd, it saves the exportlist structures
from the previous load and creates a new set of structures from the current
exports file(s). Then it compares the current with the previous and only does
system calls for cases that have been changed/added/deleted.
The nfsd threads do not need to be suspended until the comparison step is
being done. This results in a suspension period of milliseconds for a server
with 10000+ exported file systems.
There is some code using a LOGDEBUG() macro that allow runtime debugging
output via syslog(LOG_DEBUG,...) that can be enabled by creating a file
called /var/log/mountd.debug. This code is expected to be replaced with
code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace
in stable/12 have been resolved.
The patch should not change the usage of the exports file(s), but improves
the performance of reloading large exports file(s) where there are only a
small number of changes done to the file(s).
Tested by: pen@lysator.liu.se
PR: 237860
Reviewed by: kib
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20487
install -> ${INSTALL}
mtree -> ${MTREE_CMD}
services_mkdb -> ${SERVICES_MKDB_CMD}
cap_mkdb -> ${CAP_MKDB_CMD}
pwd_mkdb -> ${PWD_MKDB_CMD}
kldxref -> ${KLDXREF_CMD}
If you do custom FreeBSD builds you may want to override those
in some cases.
Sponsored by: Sippy Software, Inc.
mountd.c uses a single linked list of "struct exportlist" structures,
where there is one of these for each exported file system on the NFS server.
This list gets long if there are a large number of file systems exported and
the list must be searched for each line in the exports file(s) when
SIGHUP causes the exports file(s) to be reloaded.
A simple benchmark that traverses SLIST() elements and compares two 32bit
fields in the structure for equal (which is what the search is)
appears to take a couple of nsec. So, for a server with 72000 exported file
systems, this can take about 5sec during reload of the exports file(s).
By replacing the single linked list with a hash table with a target of
10 elements per list, the time should be reduced to less than 1msec.
Peter Errikson (who has a server with 72000+ exported file systems) ran
a test program using 5 hashes to see how they worked.
fnv_32_buf(fsid,..., 0)
fnv_32_buf(fsid,..., FNV1_32_INIT)
hash32_buf(fsid,..., 0)
hash32_buf(fsid,..., HASHINIT)
- plus simply using the low order bits of fsid.val[0].
The first three behaved about equally well, with the first one being
slightly better than the others.
It has an average variation of about 4.5% about the target list length
and that is what this patch uses.
Peter Errikson also tested this hash table version and found that the
performance wasn't measurably improved by a larger hash table, so a
load factor of 10 appears adequate.
Tested by: pen@lysator.liu.se (with other patches)
PR: 237860
MFC after: 1 month
Probably due to historical reasons the driver uses In/Out arguments in
odd way. While this tool still never uses Out arguments to see that,
make the code to not trigger EINVAL in possible future uses.
MFC after: 2 weeks
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.
PR: 215202
Reviewed by: tijl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20415
MDT_MODULE info is required to be ordered before any other MDT metadata for
a given kld because it serves as an implicit record boundary between
distinct klds for linker.hints consumers. kldxref(8) has previously relied
on the assumption that MDT_MODULE was ordered relative to other module
metadata in kld objects by source code ordering.
However, C does not require implementations to emit file scope objects in
any particular order, and it seems that GCC 6.4.0 and/or binutils 2.32 ld
may reorder emitted objects with respect to source code ordering.
So: just take two passes over a given .ko's module metadata, scanning for
the MDT_MODULE on the first pass and the other metadata on subsequent
passes. It's not super expensive and not exactly a performance-critical
piece of code. This ensures MDT_MODULE is always ordered before
MDT_PNP_INFO and other MDTs, regardless of compiler/linker movement. As a
fringe benefit, it removes the requirement that care be taken to always
order MODULE_PNP_INFO after DRIVER_MODULE in source code.
Reviewed by: emaste, imp
Differential Revision: https://reviews.freebsd.org/D20405
This doesn't recognize any features yet, but does parse the features
string. It advertises an arbitrary packet size of 4k.
Reviewed by: markj, Scott Phillips <d.scott.phillips@intel.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20308
- Add a write_mem counterpart to read_mem to handle writes to MMIO.
- Add support for the GDB 'M' packet to write bytes to the guest's
memory. For MMIO writes, attempt to batch writes up into words.
This is imprecise, but if you write a single 2 or 4-byte aligned
word, it should be treated as a single MMIO write operation.
- While here, tidy up the parsing of the 'm' command used for reading
memory to match 'M'.
Reviewed by: markj, Scott Phillips <d.scott.phillips@intel.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20307
Use the .PATH mechanism instead so keep installing them from lib/libc/gen
While here revert 347961 and 347893 which are no longer needed
Discussed with: manu
Tested by: manu
ok manu@
Rather than the tedious and error-prone grep of sys/conf/newvers.sh,
use the new -v arg to dig out the data that's desired.
Differential Revision: https://reviews.freebsd.org/D19849
Some i2c controller hardware does not provide a way to do individual START,
REPEAT-START and STOP actions on the i2c bus. Instead, they can only do
a complete transfer as a single operation. Typically they can do either
START-data-STOP or START-data-REPEATSTART-data-STOP. In the i2c driver
framework, this corresponds to the iicbus_transfer method. In the userland
interface they are initiated with the I2CRDWR ioctl command.
These changes add a new 'tr' mode which can be specified with the '-m'
command line option. This mode should work on all hardware; when an i2c
controller driver doesn't directly support the iicbus_transfer method,
code in the i2c driver framework uses the lower-level START/REPEAT/STOP
methods to implement the transfer. After this new mode has gotten some
testing on various hardware, the 'tr' mode should probably become the
new default mode.
PR: 189914
It's invalid to reference a C++ string's c_str() buffer after the object
goes out of scope. Adjust the scope of the string to match the use in
write(2) to fix the misuse.
CID: 1393383
Reported by: Coverity
ed(4) and ep(4) have been removed. fxp(4) remains popular in older
systems, but isn't as future proof as em(4).
Reviewed by: bz, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20311
Under certain tight race conditions, we found that the lack of a memory
barrier in bhyve's virtio handling causes it to miss a NO_NOTIFY state
transition on block devices, resulting in guest stall. The investigation
is recorded in OS-7613. As part of the examination into bhyve's use of
barriers, one other section was found to be problematic, but only on
non-x86 ISAs with less strict memory ordering. That was addressed in
this patch as well, although it was not at all a problem on x86.
PR: 231117
Submitted by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: jhb, kib, rgrimes
Approved by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19501
Check the legacy directory and use it instead if present.
Install these first if using beinstall.
UPDATING entry to follow.
Approved by: allanjude (mentor, in person)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20279