The VirtIO standard supports two schemes for notification suppression:
a notification enable bit and a more sophisticated one (event_idx) that
also supports delayed notifications. Currently bhyve fully supports
only the first scheme. This patch hides the notification suppression
internals by means of two inline routines, vq_kick_enable() and
vq_kick_disable(), and makes the code more readable.
Moreover, further improve readability by replacing the call to mb()
with a call to atomic_thread_fence_seq_cst(), which is already used
in virtio.c
Reviewed by: pmooney_pfmooney.com, bryanv
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20581
On vtnet device reset it is necessary to wait for threads to stop TX and
RX processing. However, the rx_in_progress variable (used for to wait for
RX processing to stop) is actually useless, and can be removed. Acquiring
and releasing the RX lock is enough to synchronize correctly. Moreover,
it is possible to reset the device while holding both TX and RX locks, so
that the "resetting" variable becomes unnecessary for the RX thread, and
can be protected by the TX lock (instead of being volatile).
Reviewed by: jhb, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20543
The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:
nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link
The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.
Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.
Reviewed by: imp, ken, araujo, jhb, rgrimes
Approved by: imp (mentor), ken (mentor), jhb (maintainer)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19904
This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531
Rather than uncoditionally setting the MEMEN and PORTEN bits in
PCIR_COMMAND for PCI devices, set the respective bit when the first
BAR of a given type is added to the device. This more closely matches
what firmware does on bare metal.
BUSMASTEREN is still set unconditionally. Eventually this bit should
move into the device models as not all device models need this set.
Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20530
Coverity warned about gdb_write_mem sign extending the result of
parse_byte shifted left by 24 bits when generating a 32-bit memory
write value for MMIO. Simplify the code by using parse_integer
instead of unrolled parse_byte calls.
CID: 1401600
Reviewed by: cem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20508
bhyve has to virtualize the MSI-X table to trap reads and writes to
that table and map those to virtual interrupts that it maps real host
interrupts on to. For the pending-bit-array (PBA), bhyve passes
accesses from the guest directly to the hardware.
bhyve's virtualization of the MSI-X table is done by intercepting all
reads and writes to the BAR holding the MSI-X table. However, if the
PBA is stored in the same BAR as the MSI-X table, accesses to the PBA
portion of this BAR have to be forwarded to the real BAR.
However, in the case that the PBA was stored in a separate BAR and
it's offset in that separate BAR overlapped with the portion of the
MSI-X table BAR that the table used, the handlers for the table BAR
would incorrectly think that some accesses were PBA reads and writes.
This caused a crash in bhyve when it indirected a NULL pointer. Fix
this case by never trying to handle PBA access if the PBA lives in a
separate BAR.
Reported by: gallatin
Tested by: gallatin
Reviewed by: markj, Patrick Mooney
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20523
I believe this was introduced in the original '-r' commit, r231911 (2012).
At the time, the scope was limited to a 1 second sleep. r332518 (2018)
added '-R', which increased the potential duration of the affected interval
(from 1 to N seconds) by permitting arbitrary restart intervals.
Instead, handle SIGTERM normally during restart-sleep, when the monitored
process is not running, and shut down promptly.
(I noticed this behavior when debugging a child process that exited quickly
under the 'daemon -r -R 30' environment. 'kill <daemonpid>' had no
immediate effect and the monitor process slept until the next restart
attempt. This was annoying.)
Reviewed by: allanjude, imp, markj
Differential Revision: https://reviews.freebsd.org/D20509
Without this patch, mountd would delete/load all exports from the exports
file(s) when it receives a SIGHUP. This works fine for small exports file(s),
but can take several seconds to do when there are large numbers (10000+) of
exported file systems. Most of this time is spent doing the system calls
that delete/export each of these file systems. When the "-S" option
has been specified (the default these days), the nfsd threads are suspended
for several seconds while the reload is done.
This patch changes mountd so that it only does system calls for file systems
where the exports have been changed/added/deleted as compared to the exports
done for the previous load/reload of the exports file(s).
Basically, when SIGHUP is posted to mountd, it saves the exportlist structures
from the previous load and creates a new set of structures from the current
exports file(s). Then it compares the current with the previous and only does
system calls for cases that have been changed/added/deleted.
The nfsd threads do not need to be suspended until the comparison step is
being done. This results in a suspension period of milliseconds for a server
with 10000+ exported file systems.
There is some code using a LOGDEBUG() macro that allow runtime debugging
output via syslog(LOG_DEBUG,...) that can be enabled by creating a file
called /var/log/mountd.debug. This code is expected to be replaced with
code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace
in stable/12 have been resolved.
The patch should not change the usage of the exports file(s), but improves
the performance of reloading large exports file(s) where there are only a
small number of changes done to the file(s).
Tested by: pen@lysator.liu.se
PR: 237860
Reviewed by: kib
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20487
install -> ${INSTALL}
mtree -> ${MTREE_CMD}
services_mkdb -> ${SERVICES_MKDB_CMD}
cap_mkdb -> ${CAP_MKDB_CMD}
pwd_mkdb -> ${PWD_MKDB_CMD}
kldxref -> ${KLDXREF_CMD}
If you do custom FreeBSD builds you may want to override those
in some cases.
Sponsored by: Sippy Software, Inc.
mountd.c uses a single linked list of "struct exportlist" structures,
where there is one of these for each exported file system on the NFS server.
This list gets long if there are a large number of file systems exported and
the list must be searched for each line in the exports file(s) when
SIGHUP causes the exports file(s) to be reloaded.
A simple benchmark that traverses SLIST() elements and compares two 32bit
fields in the structure for equal (which is what the search is)
appears to take a couple of nsec. So, for a server with 72000 exported file
systems, this can take about 5sec during reload of the exports file(s).
By replacing the single linked list with a hash table with a target of
10 elements per list, the time should be reduced to less than 1msec.
Peter Errikson (who has a server with 72000+ exported file systems) ran
a test program using 5 hashes to see how they worked.
fnv_32_buf(fsid,..., 0)
fnv_32_buf(fsid,..., FNV1_32_INIT)
hash32_buf(fsid,..., 0)
hash32_buf(fsid,..., HASHINIT)
- plus simply using the low order bits of fsid.val[0].
The first three behaved about equally well, with the first one being
slightly better than the others.
It has an average variation of about 4.5% about the target list length
and that is what this patch uses.
Peter Errikson also tested this hash table version and found that the
performance wasn't measurably improved by a larger hash table, so a
load factor of 10 appears adequate.
Tested by: pen@lysator.liu.se (with other patches)
PR: 237860
MFC after: 1 month
Probably due to historical reasons the driver uses In/Out arguments in
odd way. While this tool still never uses Out arguments to see that,
make the code to not trigger EINVAL in possible future uses.
MFC after: 2 weeks
struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed.
PR: 215202
Reviewed by: tijl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20415
MDT_MODULE info is required to be ordered before any other MDT metadata for
a given kld because it serves as an implicit record boundary between
distinct klds for linker.hints consumers. kldxref(8) has previously relied
on the assumption that MDT_MODULE was ordered relative to other module
metadata in kld objects by source code ordering.
However, C does not require implementations to emit file scope objects in
any particular order, and it seems that GCC 6.4.0 and/or binutils 2.32 ld
may reorder emitted objects with respect to source code ordering.
So: just take two passes over a given .ko's module metadata, scanning for
the MDT_MODULE on the first pass and the other metadata on subsequent
passes. It's not super expensive and not exactly a performance-critical
piece of code. This ensures MDT_MODULE is always ordered before
MDT_PNP_INFO and other MDTs, regardless of compiler/linker movement. As a
fringe benefit, it removes the requirement that care be taken to always
order MODULE_PNP_INFO after DRIVER_MODULE in source code.
Reviewed by: emaste, imp
Differential Revision: https://reviews.freebsd.org/D20405
This doesn't recognize any features yet, but does parse the features
string. It advertises an arbitrary packet size of 4k.
Reviewed by: markj, Scott Phillips <d.scott.phillips@intel.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20308
- Add a write_mem counterpart to read_mem to handle writes to MMIO.
- Add support for the GDB 'M' packet to write bytes to the guest's
memory. For MMIO writes, attempt to batch writes up into words.
This is imprecise, but if you write a single 2 or 4-byte aligned
word, it should be treated as a single MMIO write operation.
- While here, tidy up the parsing of the 'm' command used for reading
memory to match 'M'.
Reviewed by: markj, Scott Phillips <d.scott.phillips@intel.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20307
Use the .PATH mechanism instead so keep installing them from lib/libc/gen
While here revert 347961 and 347893 which are no longer needed
Discussed with: manu
Tested by: manu
ok manu@
Rather than the tedious and error-prone grep of sys/conf/newvers.sh,
use the new -v arg to dig out the data that's desired.
Differential Revision: https://reviews.freebsd.org/D19849
Some i2c controller hardware does not provide a way to do individual START,
REPEAT-START and STOP actions on the i2c bus. Instead, they can only do
a complete transfer as a single operation. Typically they can do either
START-data-STOP or START-data-REPEATSTART-data-STOP. In the i2c driver
framework, this corresponds to the iicbus_transfer method. In the userland
interface they are initiated with the I2CRDWR ioctl command.
These changes add a new 'tr' mode which can be specified with the '-m'
command line option. This mode should work on all hardware; when an i2c
controller driver doesn't directly support the iicbus_transfer method,
code in the i2c driver framework uses the lower-level START/REPEAT/STOP
methods to implement the transfer. After this new mode has gotten some
testing on various hardware, the 'tr' mode should probably become the
new default mode.
PR: 189914
It's invalid to reference a C++ string's c_str() buffer after the object
goes out of scope. Adjust the scope of the string to match the use in
write(2) to fix the misuse.
CID: 1393383
Reported by: Coverity
ed(4) and ep(4) have been removed. fxp(4) remains popular in older
systems, but isn't as future proof as em(4).
Reviewed by: bz, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20311
Under certain tight race conditions, we found that the lack of a memory
barrier in bhyve's virtio handling causes it to miss a NO_NOTIFY state
transition on block devices, resulting in guest stall. The investigation
is recorded in OS-7613. As part of the examination into bhyve's use of
barriers, one other section was found to be problematic, but only on
non-x86 ISAs with less strict memory ordering. That was addressed in
this patch as well, although it was not at all a problem on x86.
PR: 231117
Submitted by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: jhb, kib, rgrimes
Approved by: jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19501
Check the legacy directory and use it instead if present.
Install these first if using beinstall.
UPDATING entry to follow.
Approved by: allanjude (mentor, in person)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20279
leap-seconds file from NIST at ftp://ftp.nist.gov/pub/time.
Future updates should use the NIST version of file, available
at ftp://ftp.nist.gov/pub/time/leap-seconds.list .
Requested by: ian@
Obtained from: ftp://ftp.nist.gov/pub/time/leap-seconds.3676924800
MFC after: 3 days
In mountd.c, the grouplist structures are linked into a single global
linked list headed by "grphead". The only use of this linked list is
to free all list elements when the exportlist elements are also all being
free'd at the time the exports are being reloaded.
This patch replaces this one global linked list head with a list head in
each exportlist structure, where the grouplist elements for that exported
file system are linked.
The only change is that now the grouplist elements are free'd with the
associated exportlist element as they are free'd instead of all grouplist
elements being free'd after the exportlist elements are free'd. This
change should have no effect in practice.
This is being done, since a future patch that will add a "-I" option for
incrementally updating the exports in the kernel needs to know which
grouplist elements are associated with each exported file system and
having them linked into a list headed by the exportlist element does that.
MFC after: 1 month
Factor code into two functions.
read_exportfile() a functon which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() a function which deletes the exports in the kernel for a file
system.
The contents of these functions is just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.
MFC after: 1 month
As per https://datacenter.iers.org/data/latestVersion/16_BULLETIN_C16.txt:
INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE (IERS)
SERVICE INTERNATIONAL DE LA ROTATION TERRESTRE ET DES SYSTEMES DE REFERENCE
SERVICE DE LA ROTATION TERRESTRE DE L'IERS
OBSERVATOIRE DE PARIS
61, Av. de l'Observatoire 75014 PARIS (France)
Tel. : +33 1 40 51 23 35
e-mail : services.iers@obspm.frhttp://hpiers.obspm.fr/eop-pc
Paris, 07 January 2019
Bulletin C 57
To authorities responsible
for the measurement and
distribution of time
INFORMATION ON UTC - TAI
NO leap second will be introduced at the end of June 2019.
The difference between Coordinated Universal Time UTC and the
International Atomic Time TAI is :
from 2017 January 1, 0h UTC, until further notice : UTC-TAI = -37 s
Leap seconds can be introduced in UTC at the end of the months of December
or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
six months, either to announce a time step in UTC, or to confirm that there
will be no time step at the next possible date.
Christian BIZOUARD
Director
Earth Orientation Center of IERS
Observatoire de Paris, France
Requested by: rgrimes
Obtained from: ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3757622400
MFC after: 3 days
This patch moves the code that removes and frees all exportlist elements
out into a separate function called free_exports().
It does the same for the insertion of a new exportlist entry into a list.
It also adds a second argument to ex_search() for the list to use.
None of these changes have any semantic effect. They are being done to
prepare the code for future patches that convert the single linked list
for the exportlist to a hash table of lists and a patch that will do
incremental changes of exports in the kernel.
And it fixes the argument for SLIST_HEAD_INITIALIZER() to be a pointer,
which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use
the argument.
MFC after: 1 month
- Remove Tn macros
- Refernce sysctl(8) instead of sysctl(1)
- Start new sentences on new lines
- Capitalize NFS where needed
- Use Fx for FreeBSD
- Remove a list block (Bl) that was added to the manual page
by accident in r335174
Reviewed by: bcr
Approved by: doc (bcr)
Differential Revision: https://reviews.freebsd.org/D20215
Remove unnecessary `char*` casting for arguments passed to `cget*(3)`, and
deconst `_PATH_PRINTCAP` before passing it to `cget*` via the `printcapdb`
variable.
This unblocks ^/projects/runtime-coverage-v2 from building cleanly on
universe13a.freebsd.org. I suspect the issue was introduced through some
changes to `bsd.*.mk` inclusion on the branch, which I will continue to
investigate/isolate.
MFC after: 1 week
Tested with: clang 8 (arm64)
The Windows virtio driver ignores the advertized seg_max field and
assumes the host can accept up to 67 segments in indirect descriptors,
triggering an assert in the bhyve process.
This brings back r282922 but with a couple of changes:
- It raises the block interface segment limit to 128 instead of 67.
- Linux's virtio driver assumes that the segment limit is no
larger than the ring size. To avoid breaking Linux guests,
raise the VirtIO ring size to 128, and cap the VirtIO segment
limit at ring size - 2 (effectively 126).
Reviewed by: rgrimes, Patrick Mooney <pmooney@pfmooney.com>
Obtained from: Joyent (Linux workaround)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18831