- Drop mention of _LP64. FreeBSD's source generally uses __LP64__
instead of _LP64, and the relevant macros are better covered in the
"Predefined Macros" section.
- Fix a noun/verb disagreement.
Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22975
- Add missing .Pp after the end of some lists so that there is a blank
line before the subsequent paragraph.
- Use a more typical '-tag' bullet list of the make variable descriptions
at the end. This adds separation between bullets and is the formatting
typically used in manpages for this sort of list.
r355588 Fix WITHOUT_CLANG build
r355646 Revert r354348
r355943 add LDNS build knob dependency on OPENSSL
r356111 Use LLVM as default toolchain for all PowerPC targets
To be used when like rmlocks, except when sleeping for readers needs to be
allowed. See the manpage for more information.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D22823
This enables LLVM as the default compiler for powerpc, powerpc64, and
powerpcspe, as well as LLD as the default linker for powerpc64.
LLD is not yet ready for prime time for powerpc and powerpcspe, but work is
continuing on it.
Submitted by: alfredo.junior_eldorado.org.br
Relnotes: YES
Differential Revision: https://reviews.freebsd.org/D20378
srandom(9) is meaningless on SMP systems or any system with, say,
interrupts. One could never rely on random(9) to produce a reproducible
sequence of outputs on the basis of a specific srandom() seed because the
global state was shared by all kernel contexts. As such, removing it is
literally indistinguishable to random(9) consumers (as compared with
retaining it).
Mark random(9) as deprecated and slated for quick removal. This is not to
say we intend to remove all fast, non-cryptographic PRNG(s) in the kernel.
It/they just won't be random(9), as it exists today, in either name or
implementation.
Before random(9) is removed, a replacement will be provided and in-tree
consumers will be converted.
Note that despite the name, the random(9) interface does not bear any
resemblance to random(3). Instead, it is the same crummy 1988 Park-Miller
LCG used in libc rand(3).
_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers. Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST(). Plumb const through all of the
underlying sleepqueue bits.
No functional change.
Reviewed by: rlibby
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D22914
off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created. See the man page for
details on how to initialize a template.
Templates do not support tag filters. Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine. Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D22906
- Don't allow an unprivileged user to set the stride. [1]
- Only set the stride under the softc lock.
- Rename the internal fields to accurately reflect their use. Keep
ro_bkt to avoid changing the user API.
- Simplify the implementation. The port index is just sc_seq / stride.
- Document rr_limit in ifconfig.8.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> [1]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22857
There are probably bits that are still wrong, but this fixes some
things at least:
- Add named arguments to the functions in crypto(9).
- Add missing algorithms.
- Don't mention arguments that don't exist in crypto_register.
- Add CIOGSESSION2.
- Remove CIOCNFSESSION.
- Clarify some stale language that assumed an fd had only one sesson.
- Note that you have to use CRIOGET and add a note in BUGS lamenting
that one has to use CRIOGET.
- Various other cleanups.
Reviewed by: cem (earlier version)
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D22784
than "/compat/linux". Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.
Reviewed by: bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22574
Document the common practices around copyrights with "all rights reserved" in
them as new copyright notices get added.
It's an open question qhether to point people at the fact that since the Berne
convention was ratified, All rights reserved is largely obsolete.
https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence has the
details. The committer's guide will be revised shortly, and it's likely that's a
better place for this discussion. If not, I'll add a blurb here.
Reviewed by: jhb@, brooks@
Differential Review: https://reviews.freebsd.org/D22800
Contractions cause problems for translators, so s/aren't/are not/ in the one
place this slipped through.
While here, noticed I commited with the date I did the work, not today's
date. Fix that too.
Noticed by: bjk@
Delay the attachment of children, when requested, until after interrutps are
running. This is often needed to allow children to run transactions on i2c or
spi busses. It's a common enough idiom that it will be useful to have its own
wrapper.
Reviewed by: ian
Differential Revision: https://reviews.freebsd.org/D21465
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.
Don't supply a NAND (not (A and B)) operation at this time.
Discussed with: jeff
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22791
r355436 moved mitigation sysctls to machdep.mitigations but did not
rationalize the sense of the invidual knobs. Clarify that the old
names remain the canonical way to set these mitigations.
Backwards compatibility will be maintained for the original names
(e.g. hw.ibrs_disable), but not from the interim names
(e.g. machdep.mitigations.ibrs.disable).
Sponsored by: The FreeBSD Foundation
This typedef is the same as timeout_t except that it is in the callout
namespace and header.
Use this typedef in various places of the callout implementation that
were either using the raw type or timeout_t.
While here, add <sys/callout.h> to the manpage.
Reviewed by: kib, imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22751
The datasheets for these chips claim the maximum is 921,600, but testing
shows these two higher rates also work (but no rates above 921,600 other
than these two work; these represent dividing the base buad clock by 3 and 2
respectively).
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.
v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.
Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
This makes it possible to retrieve per-connection statistical
information such as the receive window size, RTT, or goodput,
using a newly added TCP_STATS getsockopt(3) option, and extract
them using the stats_voistat_fetch(3) API.
See the net/tcprtt port for an example consumer of this API.
Compared to the existing TCP_INFO system, the main differences
are that this mechanism is easy to extend without breaking ABI,
and provides statistical information instead of raw "snapshots"
of values at a given point in time. stats(3) is more generic
and can be used in both userland and the kernel.
Reviewed by: thj
Tested by: thj
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20655
FDT bindings document for gpio-i2c devices.
Using the gpio_pin_* functions to acquire/release/manipulate gpio pins
removes the constraint that both gpio pins must belong to the same gpio
controller/bank, and that the gpioiic instance must be a child of gpiobus.
Removing those constraints allows the driver to be fully compatible with
the modern dts bindings for a gpio bitbanged i2c bus.
For hinted attachment, the two gpio pins still must be on the same gpiobus,
and the device instance must be a child of that bus. This preserves
compatibility for existing installations that have use gpioiic(4) with hints.
It was introduced by r290122, but no documentation was provided.
This is taken from https://reviews.freebsd.org/D21798, since it
is not related to the feature added there.
Submitted by: Richard Scheffenegger
MFC after: 1 week
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.
I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.
Reviewed by: bcr (manpages), kib
Differential Revision: https://reviews.freebsd.org/D22572
In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU. This was to mitigate some
issues reported with the system not able to keep up with memory pressure
in cases where it had been able to do so prior to the addition of the
direct free pool cache. This change re-enables those caches.
The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones. Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested. Remove its return value since it has no use.
Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages. The limit can be overridden by the
vm.pgcache_zone_max tunable as before.
Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.
Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393
Add bit_ffs_area_at and bit_ffc_area_at functions for searching a bit
string for a sequence of contiguous set or unset bits of at least the
specified size.
The bit_ffc_area function will be used by the Intel ice driver for
implementing resource assignment logic using a bitstring to represent
whether or not a given index has been assigned or is currently free.
The bit_ffs_area, bit_ffc_area_at and bit_ffs_area_at functions are
implemented for completeness.
I'd like to add further test cases for the new functions, but I'm not
really sure how to add them easily. The new functions depend on specific
sequences of bits being set, while the bitstring tests appear to run for
varying bit sizes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: asomers@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22400
Each boot, regenerate /var/run/os-release based on the currently running
system. Create a /etc/os-release symlink pointing to this file (so that this
doesn't create a new reason /etc can not be mounted read-only).
This is compatible with what other systems do and is what the sysutil/os-release
port attempted to do, but in an incomplete way. Linux, Solaris and DragonFly all
implement this natively as well. The complete standard can be found at
https://www.freedesktop.org/software/systemd/man/os-release.html
Moving this to the base solves both the non-standard location problem with the
port, as well as the lack of update of this file on system update.
Bump __FreeBSD_version to 1300060
PR: 238953
Differential Revision: https://reviews.freebsd.org/D22271
Mount the UEFI ESP on /boot/efi. No current system uses this by default, but
there are many ad-hoc schemes that do this in /efi or /esp or /uefi and adding a
new directory at the top-level would have a much higher likelihood of
collision. Document this in /etc/mtree/BSD.root.mtree and create EFIDIR and
related variables in bsd.own.mk.
Differential Revision: https://reviews.freebsd.org/D21344
r354289 armv6: Switch to LLD by default
r354290 Take arm.arm (armv5) out of universe
r354348 armv6, armv7: Switch to llvm-libunwind by default
r354660 Enable the RISC-V LLVM backend by default.
as well as lib32 changes
Sponsored by: The FreeBSD Foundation
This driver was largely rewritten in 2015 (svn r235911) but the man page was
never updated to match.
Reviewed by: trasz
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D22339
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
Submitted by: vbhakta@vmware.com
Relnotes: yes
Sponsored by: VMware, Panzura
Differential Revision: D18613
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs. For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs. The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.
Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.
Reviewed by: alc, emaste, markj, scottl
Security: CVE-2018-12207
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21884
Previously ntb_transport(4) required at least 6 scratchpad registers,
plus 2 more for each additional memory window. That is too much for some
configurations, where several drivers have to share resources of the same
NTB hardware. This patch introduces new compact version of the protocol,
requiring only 3 scratchpad registers, plus one more for each additional
memory window. The optimization is based on fact that neither of version,
number of windows or number of queue pairs really need more then one byte
each, and window sizes of 4GB are not very useful now. The new protocol
is activated automatically when the configuration is low on scratchpad
registers, or it can be activated explicitly with loader tunable.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Address Lookup Table (A-LUT) being enabled allows to specify separate
translation for each 1/128th or 1/256th of the BAR2. Previously it was
used only to limit effective window size by blocking access through some
of A-LUT elements. This change allows A-LUT elements to also point
different memory locations, providing to upper layers several (up to 128)
independent memory windows. A-LUT hardware allows even more flexible
configurations than this, but NTB KPI have no way to manage that now.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
The man page incorrectly described the use of the"len" argument, which
is updated to the number of bytes copied and not reduced by the number
of bytes copied.
This is a content change.
Thanks to bapt, bz, cem, woodsb02, Neel Chauhan and Salvador Martínez
Mármol for helping test the initial 9000-series support.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
GCC 4.2.1 is being removed before FreeBSD 13, as are some other
components required by FreeBSD/sparc64. Contemporary GCC does not build
and there is currently no indication that anyone is going to address
these issues.
PR: 228919, 233405, 236839, 239851
Postings were sent to -arch@ on 2019/09/13 and 2019/10/01, proposing and
confirming a removal of these scripts on 2019/10/31, due to significant work
needed to bring this into the modern world and nobody having done this work
in the past couple of years. No objections or proposed work was raised in
response to these postings. The tinyware may see a resurrection into a
separate repo for archival purposes if any users of it show interest in
doing so.
MFC after: never
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Reviewed by: hrs, bcr, lwhsu, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22146
The source part of the review will be addressed in a different way.
Reviewed by: emaste, brooks
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21492
Summary:
Historically, we have built toolchain components such as cc, ld, etc as
statically linked executables. One of the reasons being that you could
sometimes save yourself from botched upgrades, by e.g. recompiling a
"known good" libc and reinstalling it.
In this day and age, we have boot environments, virtual machine
snapshots, cloud backups, and other much more reliable methods to
restore systems to working order. So I think the time is ripe to flip
this default, and link the toolchain components dynamically, just like
almost all other executables on FreeBSD.
Maybe at some point they can even become PIE executables by default! :)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22061
Notices appear both in picobsd(8) (near the top for easy notice) and are
also printed to stderr on every invocation of picobsd for visibility.
The tentative date for removal is October 31st, as no volunteers have
stepped forward at all from postings to -arch@ at least.
No objection from: -arch@
MFC after: 3 days
The change is for the example in textdump.4 and the default ddb.conf.
First of all, doadump now requires an argument and it won't do a
textdump if the argument is not 'true'.
And 'textdump dump' is more idiomatic anyway.
For what it's worth, ddb 'dump' command seems to always request a vmcore
dump even if a textdump was requested earlier, e.g., by 'textdump set'.
Finally, ddb 'call' command is not documented.
MFC after: 2 weeks
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
Loosen requirements for connecting to debugnet-type servers. Only require a
destination address; the rest can theoretically be inferred from the routing
table.
Relax corresponding constraints in netdump(4) and move ifp validation to
debugnet connection time.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21482
Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.
'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.
Loosely derived from work by: John Reimer <john.reimer AT emc.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21460
This allows to remove a bunch of low level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.
Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0
The watchdog output is incorrectly wired on that system and the watchdog
does not really do it its job, but the pulse can be seen with a signal
analyzer.
Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979
This adds basic documentation on what the superio driver is and how
other drivers can interact with it. I decided to also document
superio's ivar accessors.
Reviewed by: bcr, brueffer (both manual contents only)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21958
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
MFC after: 4 weeks
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
This change allows to specify a watchdog(9) timeout for a system
shutdown. The timeout is activated when the watchdogd daemon is
stopped. The idea is to a prevent any indefinite hang during late
stages of the shutdown. The feature is implemented in rc.d/watchdogd,
it builds upon watchdogd -x option.
Note that the shutdown timeout is not actiavted when the watchdogd
service is individually stopped by an operator. It is also not
activated for the 'shutdown' to the single-user mode. In those cases it
is assumed that the operator knows what they are doing and they have
means to recover the system should it hang.
Significant subchanges and implementation details:
- the argument to rc.shutdown, completely unused before, is assigned to
rc_shutdown variable that can be inspected by rc scripts
- init(8) passes "single" or "reboot" as the argument, this is not
changed
- the argument is not mandatory and if it is not set then rc_shutdown is
set to "unspecified"
- however, the default jail management scripts and jail configuration
examples have been updated to pass "jail" to rc.shutdown, just in case
- the new timeout can be set via watchdogd_shutdown_timeout rc option
- for consistency, the regular timeout can now be set via
watchdogd_timeout rc option
- watchdogd_shutdown_timeout and watchdogd_timeout override timeout
specifications in watchdogd_flags
- existing configurations, where the new rc options are not set, should
keep working as before
I am not particularly wed to any of the implementation specifics.
I am open to changing or removing any of them as long as the provided
functionality is the same (or very close) to the proposed one.
For example, I think it can be implemented without using watchdogd -x,
by means of watchdog(1) alone. In that case there would be a small
window between stopping watchdogd and running watchdog, but I think that
that is acceptable.
Reviewed by: bcr (man page changes)
MFC after: 5 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21221
can handle. Instead using an array on node private data, use per-hook
private data.
- Use NG_NODE_FOREACH_HOOK() to traverse through hooks instead of array.
PR: 240787
Submitted by: Lutz Donnerhacke <lutz donnerhacke.de>
Differential Revision: https://reviews.freebsd.org/D21803
in mlx5core. The EEPROM information is not only a property of the
mlx5en(4) driver.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
and reinserting it back with an updated key.
This is one of dependencies for the upcoming stats(3) code.
Reviewed by: cem
Obtained from: Netflix
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D21786
Summary: When powerpc64 switches to LLVM, use this patch to enable
OpenMP as well. OpenMP on PPC is only for 64-bits, so don't make a
32-bit libomp. A change to openmp files is necesssary (under review on
https://reviews.llvm.org/D67190), because it determines ELF format
version based on endianness, which is incorrect.
Reviewed by: alfredo.junior_eldorado.org.br, #manpages
Differential Revision: https://reviews.freebsd.org/D21532
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
to the traditional tree(3) RB trees, but using an array (preallocated,
linear chunk of memory) to store the tree.
This avoids allocation overhead, improves memory locality,
and makes it trivially easy to share/transfer/copy the entire tree
without the need for marshalling. The downside is that the size
is fixed at initialization time; there is no mechanism to resize
it.
This is one of the dependencies for the new stats(3) framework
(https://reviews.freebsd.org/D20477).
Reviewed by: bcr (man pages), markj
Discussed with: cem
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20324
These commands show the route resolved for a specified destination, or
print out the entire routing table for a given address family (or all
families, if none is explicitly provided).
Discussed with: emaste
Differential Revision: https://reviews.freebsd.org/D21510
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
If ipv4_prefer is specified, Section 10.3 is relevant.
If ipv6_prefer is specified, Section 2.1 is relevant.
This change makes the corresponding options/sections 'respective'
PR: docs/234249
Submitted by: David Fiander <david@fiander.info>
The EXAMPLES section does not contain any examples of output formats for
the old-style scripts. Remove the misleading bits stating otherwise.
Reviewed by: bcr, imp
Approved by: src (imp)
Differential Revision: https://reviews.freebsd.org/D21552
A user may set ${name}_env variable in rc.conf(5) in order to set additional
environment variables for a service command. Unfortunately, at the moment
this variable is only honored when the command is specified via the command
variable. Those additional environment variables coming from ${name}_env
are never set if the service is started via the ${rc_arg}_cmd variable (for
example start_cmd).
PR: 239692
Reviewed by: bcr, jilles
Approved by: src (jilles)
Differential Revision: https://reviews.freebsd.org/D21228
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure. This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.
With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets. Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only
items in excess of the estimate are reclaimed. Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.
Now, when under memory pressure, the page daemon will trim zones
rather than draining them. As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.
Reviewed by: jeff
Tested by: pho (an earlier version)
MFC after: 2 months
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16667
- Fix warnings from igor and mandoc.
- Provide a brief description of the separation between zones and their
backend slab allocators.
- Document cache zones and secondary zones.
- Document the kernel config options added in r350659.
- Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers.
- Document uma_zone_reserve(), uma_zone_reserve_kva() and
uma_zone_prealloc().
- Document uma_zone_alloc() and uma_zone_freef().
- Add some missing MLINKs and Xrefs.
MFC after: 2 weeks
This makes it possible to perform mathematical operations on
fractional values without using floating point. It operates on Q
numbers, which are integer-sized, opaque structures initialized
to hold a chosen number of integer and fractional bits.
For a general description of the Q number system, see the "Fixed Point
Representation & Fractional Math" whitepaper[1]; for the actual
API see the qmath(3) man page.
This is one of dependencies for the upcoming stats(3) framework[2]
that will be applied to the TCP stack in a later commit.
1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
2. https://reviews.freebsd.org/D20477
Reviewed by: bcr (man pages, earlier version), sef (earlier version)
Discussed with: cem, dteske, imp, lstewart
Sponsored By: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20116
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework
should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)
KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
They follow the conventions set by rw and sx lock probes. There is
an additional lockstat:::lockmgr-disown probe.
Update lockstat(1) to report on contention and hold events for
lockmgr locks. Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21355
This adds safety net for the case of misconfigured NTB with too big
memory window, for which we may be unable to allocate a memory buffer,
which does not make much sense for the network interface. While there,
fix the code to really work with asymmetric window sizes setup.
This makes driver just print warning message on boot instead of hanging
if too large memory window is configured.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
expression howmany(BBSIZE, PAGE_SIZE), where BBSIZE is the size of the
boot block area. That can be less than 2 if PAGE_SIZE is big.
swapon(8) has an option to trim (delete) all the blocks of a device at
startup. However, if the first of those blocks is a bsd label, then
trimming those blocks is destructive. Change swapon to leave the
first BBSIZE bytes untrimmed.
Update manual pages to reflect changes in how swapon and how it may be
used, espeically in association with savecore.
Reviewed by: alc
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21191
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.
mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format. Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned. This happened in about 1 out of every 512
cases. The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.
Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).
mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.
* Input block size validation was pulled out of individual compression
init routines into main().
* Init routines now validate a user-provided compression level or select
an algorithm-specific default, if none was provided.
* A new interface for calculating the maximal compressed size of an
incompressible input block was added for each driver. The generic code
uses it to validate against MAXPHYS as well as to allocate compression
result buffers in the generic code.
* Algorithm selection is now driven by a table lookup, to increase ease of
adding other formats in the future.
mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.
Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'. '-L' is considered deprecated, but will probably never be removed.
All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.
Relnotes: yes
Follow-up on r322318 and r322319 and remove the deprecated modules.
Shift some now-unused kernel files into userspace utilities that incorporate
them. Remove references to removed GEOM classes in userspace utilities.
Reviewed by: imp (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21249
rcmds were removed in r32435 and these three man pages can trivially
drop the references.
There's still a reference in pts.4 because it describes a mode
(TIOCPKT_NOSTOP), and only lists rlogin/rlogind as examples of programs
that use that mode. To update later.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
More extensive changes to this page are certainly needed, but at least
remove references to binaries that no longer exist.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The driver was originally written with the name ads1115, but at the last
minute it got renamed to ads111x to reflect its support for many related
chips, but I forgot to update the manpage to match the renaming before
committing it all.
As suggested in:
https://wiki.freebsd.org/WhatsGoing/FreeBSD13
We will be dropping the snd_ds1 driver. The driver is known to be buggy
and no one has been working on it for years now.
Users of old Yamaha cards may have luck with the OSS drivers instead.
MFC after: 3 days
Diferential Revision: https://reviews.freebsd.org/D21138
Similar to what was done for device_printfs in r347229.
Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.
Reviewed by: markj
Discussed with: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21165
The API is used to gracefully terminate text line(s) with a single \n. If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it. The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
Reviewed by: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21030
The goal is to avoid some kinds of low-memory deadlock when formatting
heap-allocated buffers.
Reviewed by: vangyzen
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21015
with Communication Device Class Ethernet Emulation Model (CDC EEM).
The driver supports both the device, and host side operation; there
is a new USB template (#11) for the former.
This enables communication with virtual USB NIC provided by iLO 5,
as found in new HPE Proliant servers.
Reviewed by: hselasky
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Hewlett Packard Enterprise
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
Instances of the device can be configured using hints or FDT data.
Interfaces to reconfigure the chip and extract voltage measurements from
it are available via sysctl(8).
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page
Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20549
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.
This is a new man page (content changed).
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Reported by: devgs@ukr.net
Reviewed by: rrs@
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20980
Describe missed functions.
Give some hint about refcount_release(9) memory ordering guarantees.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21020
It was changed from int to register_t in r22521 and from register_t to long
in r328099, but the man page wasn't updated either time.
MFC after: 2 weeks
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21014
Update login(1), its manual pages, similar utilities, and motd.5 to refer to
the new location.
Suggested by: delphij@ (re: r349256)
Reviewed by: bcr (manpages), delphij
Differential Revision: https://reviews.freebsd.org/D20721
We don't split the other man pages in their own package so do the same for runtime.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D20962
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.
Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
with various laptops using hdaa(4) sound devices. We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.
PR: 200526
Differential Revision: https://reviews.freebsd.org/D17772