Thanks to bapt, bz, cem, woodsb02, Neel Chauhan and Salvador Martínez
Mármol for helping test the initial 9000-series support.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Alter bsd.compat.mk to set MACHINE and MACHINE_ARCH when included
directly so MD paths in Makefiles work. In the process centralize
setting them in LIBCOMPATWMAKEENV.
Alter .PATH and CFLAGS settings in work when the Makefile is included.
While here only support LIB32 on supported platforms rather than always
enabling it and requiring users of MK_LIB32 to filter based
TARGET/MACHINE_ARCH.
The net effect of this change is to make Makefile.libcompat only build
compatability libraries.
Changes relative to r354449:
Correct detection of the compiler type when bsd.compat.mk is used
outside Makefile.libcompat. Previously it always matched the clang
case.
Set LDFLAGS including the linker emulation for mips where -m32 seems to
be insufficent.
Reviewed by: imp, kib (origional version in r354449)
Obtained from: CheriBSD (conceptually)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22251
GCC 4.2.1 is being removed before FreeBSD 13, as are some other
components required by FreeBSD/sparc64. Contemporary GCC does not build
and there is currently no indication that anyone is going to address
these issues.
PR: 228919, 233405, 236839, 239851
Alter bsd.compat.mk to set MACHINE and MACHINE_ARCH when included
directly so MD paths in Makefiles work. In the process centralize
setting them in LIBCOMPATWMAKEENV.
Alter .PATH and CFLAGS settings in work when the Makefile is included.
While here only support LIB32 on supported platforms rather than always
enabling it and requiring users of MK_LIB32 to filter based
TARGET/MACHINE_ARCH.
The net effect of this change is to make Makefile.libcompat only build
compatability libraries.
Reviewed by: imp, kib
Obtained from: CheriBSD (conceptually)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22251
Postings were sent to -arch@ on 2019/09/13 and 2019/10/01, proposing and
confirming a removal of these scripts on 2019/10/31, due to significant work
needed to bring this into the modern world and nobody having done this work
in the past couple of years. No objections or proposed work was raised in
response to these postings. The tinyware may see a resurrection into a
separate repo for archival purposes if any users of it show interest in
doing so.
MFC after: never
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Reviewed by: hrs, bcr, lwhsu, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22146
Replace explicit TARGET_* variables with COMPAT_* versions defined based
on where the file is being included.
Also, require that bsd.compat.mk be included directly. It's not going to
be widely used so always loading it in bsd.prog.mk doesn't make sense.
Instead users can include it directly.
Reviewed by: imp, bdrewery (prior revision)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22059
The source part of the review will be addressed in a different way.
Reviewed by: emaste, brooks
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21492
Summary:
Historically, we have built toolchain components such as cc, ld, etc as
statically linked executables. One of the reasons being that you could
sometimes save yourself from botched upgrades, by e.g. recompiling a
"known good" libc and reinstalling it.
In this day and age, we have boot environments, virtual machine
snapshots, cloud backups, and other much more reliable methods to
restore systems to working order. So I think the time is ripe to flip
this default, and link the toolchain components dynamically, just like
almost all other executables on FreeBSD.
Maybe at some point they can even become PIE executables by default! :)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22061
Notices appear both in picobsd(8) (near the top for easy notice) and are
also printed to stderr on every invocation of picobsd for visibility.
The tentative date for removal is October 31st, as no volunteers have
stepped forward at all from postings to -arch@ at least.
No objection from: -arch@
MFC after: 3 days
The change is for the example in textdump.4 and the default ddb.conf.
First of all, doadump now requires an argument and it won't do a
textdump if the argument is not 'true'.
And 'textdump dump' is more idiomatic anyway.
For what it's worth, ddb 'dump' command seems to always request a vmcore
dump even if a textdump was requested earlier, e.g., by 'textdump set'.
Finally, ddb 'call' command is not documented.
MFC after: 2 weeks
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
Loosen requirements for connecting to debugnet-type servers. Only require a
destination address; the rest can theoretically be inferred from the routing
table.
Relax corresponding constraints in netdump(4) and move ifp validation to
debugnet connection time.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21482
Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.
'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.
Loosely derived from work by: John Reimer <john.reimer AT emc.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21460
This allows to remove a bunch of low level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.
Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0
The watchdog output is incorrectly wired on that system and the watchdog
does not really do it its job, but the pulse can be seen with a signal
analyzer.
Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979
Linkage is controlled by two make knobs:
WANT_COMPAT - Prefer to link against the compat ABI.
NEED_COMPAT - Link against the compat ABI or fail to build.
Supported values are "32", "soft", and "any". The latter meaning pick
the first[0] supported compat ABI.
This can be used to provide test binaries for compat ABIs or to link
ABI-specific programs.
[0] We currently support only one compat ABI at a time, but this may
change in the future and some code in this commit is structured to ease
that change.
Reviewed by: bdrewery, jhb
Obtained from: CheriBSD (in concept)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22023
This will allow us to link against internal libraries when building
programs for the system's LIBCOMPAT ABI.
Reviewed by: bdrewery
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
This is the first step if refactoring the definitions to allow programs
to be selectively linked against libcompat libraries.
Reviewed by: bdrewery
Sponsored by: DARPA, AFRL
This adds basic documentation on what the superio driver is and how
other drivers can interact with it. I decided to also document
superio's ivar accessors.
Reviewed by: bcr, brueffer (both manual contents only)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21958
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
MFC after: 4 weeks
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
Diff partially stolen from CheriBSD; these bits need -Wl,-z,notext in order
to build in an LLVM world. They are needed for all flavors/sizes of MIPS.
This will eventually get fixed in LLVM, but it's unclear when.
Reported by: arichardson, emaste
Differential Revision: https://reviews.freebsd.org/D21696
At least some of the characters in E000-F8FF range are used by Powerline
fonts, and having no attributes for these ranges in UnicodeData.txt
other than "Other, Private Use" it should be safe to mark all of them as
printable. Some actually were before r340491, so this fixes the
regression introduced there as well.
PR: 240911
Reviewed by: bapt
Tested by: Daniel Ponte <amigan@gmail.com>
Differential Revision: https://reviews.freebsd.org/D21850
This change allows to specify a watchdog(9) timeout for a system
shutdown. The timeout is activated when the watchdogd daemon is
stopped. The idea is to a prevent any indefinite hang during late
stages of the shutdown. The feature is implemented in rc.d/watchdogd,
it builds upon watchdogd -x option.
Note that the shutdown timeout is not actiavted when the watchdogd
service is individually stopped by an operator. It is also not
activated for the 'shutdown' to the single-user mode. In those cases it
is assumed that the operator knows what they are doing and they have
means to recover the system should it hang.
Significant subchanges and implementation details:
- the argument to rc.shutdown, completely unused before, is assigned to
rc_shutdown variable that can be inspected by rc scripts
- init(8) passes "single" or "reboot" as the argument, this is not
changed
- the argument is not mandatory and if it is not set then rc_shutdown is
set to "unspecified"
- however, the default jail management scripts and jail configuration
examples have been updated to pass "jail" to rc.shutdown, just in case
- the new timeout can be set via watchdogd_shutdown_timeout rc option
- for consistency, the regular timeout can now be set via
watchdogd_timeout rc option
- watchdogd_shutdown_timeout and watchdogd_timeout override timeout
specifications in watchdogd_flags
- existing configurations, where the new rc options are not set, should
keep working as before
I am not particularly wed to any of the implementation specifics.
I am open to changing or removing any of them as long as the provided
functionality is the same (or very close) to the proposed one.
For example, I think it can be implemented without using watchdogd -x,
by means of watchdog(1) alone. In that case there would be a small
window between stopping watchdogd and running watchdog, but I think that
that is acceptable.
Reviewed by: bcr (man page changes)
MFC after: 5 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21221
can handle. Instead using an array on node private data, use per-hook
private data.
- Use NG_NODE_FOREACH_HOOK() to traverse through hooks instead of array.
PR: 240787
Submitted by: Lutz Donnerhacke <lutz donnerhacke.de>
Differential Revision: https://reviews.freebsd.org/D21803
GCC uses "dynamic" TLS models when -fpic or -fPIC is explicitly
specified on the command line (which is only true for shared libraries).
It uses "static" (or "exec") TLS models otherwise. In particular, GCC
does _not_ use dynamic TLS models when PIC is implicitly enabled (which
it is on MIPS), only if a PIC flag is explicitly provided.
llvm uses "dynamic" TLS models if PIC is enabled either via a PIC flag
or if it is implicily enabled (as on MIPS64). This means that llvm on
MIPS64 always uses "dynamic" TLS models. However, dynamic TLS models
do not work for static binaries and libraries as the __tls_get_addr
function they invoke is only defined in rtld.
Written by: jhb
Reviewed by: arichardson
Differential Revision: https://reviews.freebsd.org/D21699
in mlx5core. The EEPROM information is not only a property of the
mlx5en(4) driver.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
This setup will add the trusted certificates from the Mozilla NSS bundle
to base.
This commit includes:
- CAROOT option to opt out of installation of certs
- mtree amendments for final destinations
- infrastructure to fetch/update certs, along with instructions
A follow-up commit will add a certctl(8) utility to give the user control
over trust specifics. Another follow-up commit will actually commit the
initial result of updatecerts.
This work was done primarily by allanjude@, with minor contributions by
myself.
No objection from: secteam
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16856
STAGE_DIR.${${_${group}DIR_${file}}:C,[/*],_,g} was getting
${STAGE_OBJTOP}BINDIR rather than
${STAGE_OBJTOP}${BINDIR} when FILESDIR=BINDIR
Reviewed by: stevek
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21858
picobsd/tinyware has had this compact HTTPD server for a long time, and some
people do use it. Move it out into usr.sbin well in advance of any action
being taken on picobsd.
This has been gated behind an HTTPD option defaulted to *off*, primarily for
two reasons:
1.) This code likely needs a good audit, as it's been living off in picobsd
land for a long time, and
2.) We don't currently ship an httpd and this may not be a welcome surprise.
Reviewed by: eugen
Differential Revision: https://reviews.freebsd.org/D21724
and reinserting it back with an updated key.
This is one of dependencies for the upcoming stats(3) code.
Reviewed by: cem
Obtained from: Netflix
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D21786
Summary: When powerpc64 switches to LLVM, use this patch to enable
OpenMP as well. OpenMP on PPC is only for 64-bits, so don't make a
32-bit libomp. A change to openmp files is necesssary (under review on
https://reviews.llvm.org/D67190), because it determines ELF format
version based on endianness, which is incorrect.
Reviewed by: alfredo.junior_eldorado.org.br, #manpages
Differential Revision: https://reviews.freebsd.org/D21532
chflags -R on it, otherwise the command will error out. (Note that
adding -f to the chflags invocation does not help, unlike with rm.)
MFC after: 3 days
Parts of the fusefs tests trigger a bug in current versions of llvm: IR
representation of some routine for the MIPS targets is a function with a
large number of arguments. This then leads the compiler on an hour+ long
goose chase, which is OK if you build the current tree but less-so if you're
trying external toolchain or doing a universe build involving mips when it
eventually gets switched over to LLVM.
Better, accurate details can be found in LLVM PR43263.
has become very trigger-happy with libc++ 9.0.0.
It does not help that gcc's implementation of this warning is even more
trigger-happy, in the sense that it already warns on the declaration
itself, not when you are using it. This is very annoying with our use
of -Wsystem-headers. That should really be disabled for gcc.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
to the traditional tree(3) RB trees, but using an array (preallocated,
linear chunk of memory) to store the tree.
This avoids allocation overhead, improves memory locality,
and makes it trivially easy to share/transfer/copy the entire tree
without the need for marshalling. The downside is that the size
is fixed at initialization time; there is no mechanism to resize
it.
This is one of the dependencies for the new stats(3) framework
(https://reviews.freebsd.org/D20477).
Reviewed by: bcr (man pages), markj
Discussed with: cem
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20324
std::auto_ptr in a whole bunch of individual Makefiles, make the warning
globally non-fatal instead. This is similar to what was done to many
more non-fatal warnings from newer gcc versions.
These commands show the route resolved for a specified destination, or
print out the entire routing table for a given address family (or all
families, if none is explicitly provided).
Discussed with: emaste
Differential Revision: https://reviews.freebsd.org/D21510
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
If ipv4_prefer is specified, Section 10.3 is relevant.
If ipv6_prefer is specified, Section 2.1 is relevant.
This change makes the corresponding options/sections 'respective'
PR: docs/234249
Submitted by: David Fiander <david@fiander.info>
Both clang and gcc development branches have reached version 10. Since we
only parse for a single digit in the major version number, this causes
COMPILER_VERSION to be set to its default of 0.0.0, meaning version checks
fail with these newer compilers.
Reviewed by: emaste
Approved by: markj (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21413
We cannot use file (without :T) to name targets
but we can use the destination directory (with / replaced by _)
This has the benefit of minimizing the targets created.
Reviewed by: bdrewery
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org//D21283
The EXAMPLES section does not contain any examples of output formats for
the old-style scripts. Remove the misleading bits stating otherwise.
Reviewed by: bcr, imp
Approved by: src (imp)
Differential Revision: https://reviews.freebsd.org/D21552
A user may set ${name}_env variable in rc.conf(5) in order to set additional
environment variables for a service command. Unfortunately, at the moment
this variable is only honored when the command is specified via the command
variable. Those additional environment variables coming from ${name}_env
are never set if the service is started via the ${rc_arg}_cmd variable (for
example start_cmd).
PR: 239692
Reviewed by: bcr, jilles
Approved by: src (jilles)
Differential Revision: https://reviews.freebsd.org/D21228
The default package use to be FreeBSD-runtime but it should only contain
binaries and libs enough to boot to single user and repair the system, it
is also very handy to have a package that can be tranform to a small mfsroot.
So create a new package named FreeBSD-utilities and make it the default one.
Also move a few binaries and lib into this package when it make sense.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21506
All of them are needed to be able to boot to single user and be able
to repair a existing FreeBSD installation so put them directly into
FreeBSD-runtime.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21503
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure. This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.
With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets. Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only
items in excess of the estimate are reclaimed. Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.
Now, when under memory pressure, the page daemon will trim zones
rather than draining them. As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.
Reviewed by: jeff
Tested by: pho (an earlier version)
MFC after: 2 months
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16667
- Fix warnings from igor and mandoc.
- Provide a brief description of the separation between zones and their
backend slab allocators.
- Document cache zones and secondary zones.
- Document the kernel config options added in r350659.
- Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers.
- Document uma_zone_reserve(), uma_zone_reserve_kva() and
uma_zone_prealloc().
- Document uma_zone_alloc() and uma_zone_freef().
- Add some missing MLINKs and Xrefs.
MFC after: 2 weeks
This makes it possible to perform mathematical operations on
fractional values without using floating point. It operates on Q
numbers, which are integer-sized, opaque structures initialized
to hold a chosen number of integer and fractional bits.
For a general description of the Q number system, see the "Fixed Point
Representation & Fractional Math" whitepaper[1]; for the actual
API see the qmath(3) man page.
This is one of dependencies for the upcoming stats(3) framework[2]
that will be applied to the TCP stack in a later commit.
1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
2. https://reviews.freebsd.org/D20477
Reviewed by: bcr (man pages, earlier version), sef (earlier version)
Discussed with: cem, dteske, imp, lstewart
Sponsored By: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20116
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework
should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)
KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
As discussed on arch@, gcc 4.2.1 is on its way out. Turn off Werror on gcc
versions < 5.0 permantly. This will allow older platforms to continue to compile
w/o new errors once we take them out of universe by default. This will also free
developers from chasing down obsolete warnings that produce no beneficial
changes to the source.
Discussed on: arch@
Reviewed by: jhb@, emaste@, pfg@
Differential Revision: https://reviews.freebsd.org/D21378
They follow the conventions set by rw and sx lock probes. There is
an additional lockstat:::lockmgr-disown probe.
Update lockstat(1) to report on contention and hold events for
lockmgr locks. Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21355
This adds safety net for the case of misconfigured NTB with too big
memory window, for which we may be unable to allocate a memory buffer,
which does not make much sense for the network interface. While there,
fix the code to really work with asymmetric window sizes setup.
This makes driver just print warning message on boot instead of hanging
if too large memory window is configured.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
expression howmany(BBSIZE, PAGE_SIZE), where BBSIZE is the size of the
boot block area. That can be less than 2 if PAGE_SIZE is big.
swapon(8) has an option to trim (delete) all the blocks of a device at
startup. However, if the first of those blocks is a bsd label, then
trimming those blocks is destructive. Change swapon to leave the
first BBSIZE bytes untrimmed.
Update manual pages to reflect changes in how swapon and how it may be
used, espeically in association with savecore.
Reviewed by: alc
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21191
In r342974 jhibbits added support to build crtsavres.o. This was the
blocker for BSD_CRTBEGIN to be enabled there. As such enable this
option again.
Reviewed by: jhibbits
Sponsored by: DARPA, AFRL
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.
mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format. Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned. This happened in about 1 out of every 512
cases. The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.
Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).
mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.
* Input block size validation was pulled out of individual compression
init routines into main().
* Init routines now validate a user-provided compression level or select
an algorithm-specific default, if none was provided.
* A new interface for calculating the maximal compressed size of an
incompressible input block was added for each driver. The generic code
uses it to validate against MAXPHYS as well as to allocate compression
result buffers in the generic code.
* Algorithm selection is now driven by a table lookup, to increase ease of
adding other formats in the future.
mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.
Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'. '-L' is considered deprecated, but will probably never be removed.
All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.
Relnotes: yes
Follow-up on r322318 and r322319 and remove the deprecated modules.
Shift some now-unused kernel files into userspace utilities that incorporate
them. Remove references to removed GEOM classes in userspace utilities.
Reviewed by: imp (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21249
rcmds were removed in r32435 and these three man pages can trivially
drop the references.
There's still a reference in pts.4 because it describes a mode
(TIOCPKT_NOSTOP), and only lists rlogin/rlogind as examples of programs
that use that mode. To update later.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
More extensive changes to this page are certainly needed, but at least
remove references to binaries that no longer exist.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The driver was originally written with the name ads1115, but at the last
minute it got renamed to ads111x to reflect its support for many related
chips, but I forgot to update the manpage to match the renaming before
committing it all.
As suggested in:
https://wiki.freebsd.org/WhatsGoing/FreeBSD13
We will be dropping the snd_ds1 driver. The driver is known to be buggy
and no one has been working on it for years now.
Users of old Yamaha cards may have luck with the OSS drivers instead.
MFC after: 3 days
Diferential Revision: https://reviews.freebsd.org/D21138
It is part of -Wformat, which is enabled by -Wall. Empty format strings are
well defined and it is perfectly reasonable to expect them in a formatting
interface.
Similar to what was done for device_printfs in r347229.
Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.
Reviewed by: markj
Discussed with: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21165
The API is used to gracefully terminate text line(s) with a single \n. If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it. The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
Reviewed by: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21030
The goal is to avoid some kinds of low-memory deadlock when formatting
heap-allocated buffers.
Reviewed by: vangyzen
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21015
with Communication Device Class Ethernet Emulation Model (CDC EEM).
The driver supports both the device, and host side operation; there
is a new USB template (#11) for the former.
This enables communication with virtual USB NIC provided by iLO 5,
as found in new HPE Proliant servers.
Reviewed by: hselasky
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Hewlett Packard Enterprise
Don't mark nvme as broken on aarch64. It compiles, at least, and people are
testing it out. This only enables the userland parts of the nvme stack.
Submitted by: greg at unrelenting technologies
Differential Revision: https://reviews.freebsd.org/D21168
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
Instances of the device can be configured using hints or FDT data.
Interfaces to reconfigure the chip and extract voltage measurements from
it are available via sysctl(8).
Having the full uname output can be useful on head even with
unmodified trees or trees that newvers.sh fails to recognize as
modified.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D20895
Some warning flags are valid for C++ but not C. GCC 8 complains if you pass
such flags when building a C file. Using a separate variable for these
flags allows building both C and C++ files in the same directory (such as
the fusefs tests) under GCC.
Reviewed by: cem, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21116
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page
Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20549
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.
This is a new man page (content changed).
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete. Remove it.
Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.
Reviewed by: delphij, emaste, oshogbo
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21033
This will be used to gate the fusefs tests. It's a partial merge of r348281
from projects/fuse2.
Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21044
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Reported by: devgs@ukr.net
Reviewed by: rrs@
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20980
Describe missed functions.
Give some hint about refcount_release(9) memory ordering guarantees.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21020
It was changed from int to register_t in r22521 and from register_t to long
in r328099, but the man page wasn't updated either time.
MFC after: 2 weeks
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21014
Update login(1), its manual pages, similar utilities, and motd.5 to refer to
the new location.
Suggested by: delphij@ (re: r349256)
Reviewed by: bcr (manpages), delphij
Differential Revision: https://reviews.freebsd.org/D20721
Allow ABI to be over ridden to allow (with other changes) programs to be
built targeting ABIs other than the default. This is used in CheriBSD.
Reviewed by: imp
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D21001