Postings were sent to -arch@ on 2019/09/13 and 2019/10/01, proposing and
confirming a removal of these scripts on 2019/10/31, due to significant work
needed to bring this into the modern world and nobody having done this work
in the past couple of years. No objections or proposed work was raised in
response to these postings. The tinyware may see a resurrection into a
separate repo for archival purposes if any users of it show interest in
doing so.
MFC after: never
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap
Reviewed by: hrs, bcr, lwhsu, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22146
Replace explicit TARGET_* variables with COMPAT_* versions defined based
on where the file is being included.
Also, require that bsd.compat.mk be included directly. It's not going to
be widely used so always loading it in bsd.prog.mk doesn't make sense.
Instead users can include it directly.
Reviewed by: imp, bdrewery (prior revision)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22059
The source part of the review will be addressed in a different way.
Reviewed by: emaste, brooks
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21492
Summary:
Historically, we have built toolchain components such as cc, ld, etc as
statically linked executables. One of the reasons being that you could
sometimes save yourself from botched upgrades, by e.g. recompiling a
"known good" libc and reinstalling it.
In this day and age, we have boot environments, virtual machine
snapshots, cloud backups, and other much more reliable methods to
restore systems to working order. So I think the time is ripe to flip
this default, and link the toolchain components dynamically, just like
almost all other executables on FreeBSD.
Maybe at some point they can even become PIE executables by default! :)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22061
Notices appear both in picobsd(8) (near the top for easy notice) and are
also printed to stderr on every invocation of picobsd for visibility.
The tentative date for removal is October 31st, as no volunteers have
stepped forward at all from postings to -arch@ at least.
No objection from: -arch@
MFC after: 3 days
The change is for the example in textdump.4 and the default ddb.conf.
First of all, doadump now requires an argument and it won't do a
textdump if the argument is not 'true'.
And 'textdump dump' is more idiomatic anyway.
For what it's worth, ddb 'dump' command seems to always request a vmcore
dump even if a textdump was requested earlier, e.g., by 'textdump set'.
Finally, ddb 'call' command is not documented.
MFC after: 2 weeks
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
Loosen requirements for connecting to debugnet-type servers. Only require a
destination address; the rest can theoretically be inferred from the routing
table.
Relax corresponding constraints in netdump(4) and move ifp validation to
debugnet connection time.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21482
Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.
'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.
Loosely derived from work by: John Reimer <john.reimer AT emc.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21460
This allows to remove a bunch of low level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.
Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0
The watchdog output is incorrectly wired on that system and the watchdog
does not really do it its job, but the pulse can be seen with a signal
analyzer.
Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979
Linkage is controlled by two make knobs:
WANT_COMPAT - Prefer to link against the compat ABI.
NEED_COMPAT - Link against the compat ABI or fail to build.
Supported values are "32", "soft", and "any". The latter meaning pick
the first[0] supported compat ABI.
This can be used to provide test binaries for compat ABIs or to link
ABI-specific programs.
[0] We currently support only one compat ABI at a time, but this may
change in the future and some code in this commit is structured to ease
that change.
Reviewed by: bdrewery, jhb
Obtained from: CheriBSD (in concept)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22023
This will allow us to link against internal libraries when building
programs for the system's LIBCOMPAT ABI.
Reviewed by: bdrewery
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
This is the first step if refactoring the definitions to allow programs
to be selectively linked against libcompat libraries.
Reviewed by: bdrewery
Sponsored by: DARPA, AFRL
This adds basic documentation on what the superio driver is and how
other drivers can interact with it. I decided to also document
superio's ivar accessors.
Reviewed by: bcr, brueffer (both manual contents only)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21958
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
MFC after: 4 weeks
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
Diff partially stolen from CheriBSD; these bits need -Wl,-z,notext in order
to build in an LLVM world. They are needed for all flavors/sizes of MIPS.
This will eventually get fixed in LLVM, but it's unclear when.
Reported by: arichardson, emaste
Differential Revision: https://reviews.freebsd.org/D21696
At least some of the characters in E000-F8FF range are used by Powerline
fonts, and having no attributes for these ranges in UnicodeData.txt
other than "Other, Private Use" it should be safe to mark all of them as
printable. Some actually were before r340491, so this fixes the
regression introduced there as well.
PR: 240911
Reviewed by: bapt
Tested by: Daniel Ponte <amigan@gmail.com>
Differential Revision: https://reviews.freebsd.org/D21850
This change allows to specify a watchdog(9) timeout for a system
shutdown. The timeout is activated when the watchdogd daemon is
stopped. The idea is to a prevent any indefinite hang during late
stages of the shutdown. The feature is implemented in rc.d/watchdogd,
it builds upon watchdogd -x option.
Note that the shutdown timeout is not actiavted when the watchdogd
service is individually stopped by an operator. It is also not
activated for the 'shutdown' to the single-user mode. In those cases it
is assumed that the operator knows what they are doing and they have
means to recover the system should it hang.
Significant subchanges and implementation details:
- the argument to rc.shutdown, completely unused before, is assigned to
rc_shutdown variable that can be inspected by rc scripts
- init(8) passes "single" or "reboot" as the argument, this is not
changed
- the argument is not mandatory and if it is not set then rc_shutdown is
set to "unspecified"
- however, the default jail management scripts and jail configuration
examples have been updated to pass "jail" to rc.shutdown, just in case
- the new timeout can be set via watchdogd_shutdown_timeout rc option
- for consistency, the regular timeout can now be set via
watchdogd_timeout rc option
- watchdogd_shutdown_timeout and watchdogd_timeout override timeout
specifications in watchdogd_flags
- existing configurations, where the new rc options are not set, should
keep working as before
I am not particularly wed to any of the implementation specifics.
I am open to changing or removing any of them as long as the provided
functionality is the same (or very close) to the proposed one.
For example, I think it can be implemented without using watchdogd -x,
by means of watchdog(1) alone. In that case there would be a small
window between stopping watchdogd and running watchdog, but I think that
that is acceptable.
Reviewed by: bcr (man page changes)
MFC after: 5 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21221
can handle. Instead using an array on node private data, use per-hook
private data.
- Use NG_NODE_FOREACH_HOOK() to traverse through hooks instead of array.
PR: 240787
Submitted by: Lutz Donnerhacke <lutz donnerhacke.de>
Differential Revision: https://reviews.freebsd.org/D21803
GCC uses "dynamic" TLS models when -fpic or -fPIC is explicitly
specified on the command line (which is only true for shared libraries).
It uses "static" (or "exec") TLS models otherwise. In particular, GCC
does _not_ use dynamic TLS models when PIC is implicitly enabled (which
it is on MIPS), only if a PIC flag is explicitly provided.
llvm uses "dynamic" TLS models if PIC is enabled either via a PIC flag
or if it is implicily enabled (as on MIPS64). This means that llvm on
MIPS64 always uses "dynamic" TLS models. However, dynamic TLS models
do not work for static binaries and libraries as the __tls_get_addr
function they invoke is only defined in rtld.
Written by: jhb
Reviewed by: arichardson
Differential Revision: https://reviews.freebsd.org/D21699
in mlx5core. The EEPROM information is not only a property of the
mlx5en(4) driver.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
This setup will add the trusted certificates from the Mozilla NSS bundle
to base.
This commit includes:
- CAROOT option to opt out of installation of certs
- mtree amendments for final destinations
- infrastructure to fetch/update certs, along with instructions
A follow-up commit will add a certctl(8) utility to give the user control
over trust specifics. Another follow-up commit will actually commit the
initial result of updatecerts.
This work was done primarily by allanjude@, with minor contributions by
myself.
No objection from: secteam
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16856
STAGE_DIR.${${_${group}DIR_${file}}:C,[/*],_,g} was getting
${STAGE_OBJTOP}BINDIR rather than
${STAGE_OBJTOP}${BINDIR} when FILESDIR=BINDIR
Reviewed by: stevek
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21858
picobsd/tinyware has had this compact HTTPD server for a long time, and some
people do use it. Move it out into usr.sbin well in advance of any action
being taken on picobsd.
This has been gated behind an HTTPD option defaulted to *off*, primarily for
two reasons:
1.) This code likely needs a good audit, as it's been living off in picobsd
land for a long time, and
2.) We don't currently ship an httpd and this may not be a welcome surprise.
Reviewed by: eugen
Differential Revision: https://reviews.freebsd.org/D21724
and reinserting it back with an updated key.
This is one of dependencies for the upcoming stats(3) code.
Reviewed by: cem
Obtained from: Netflix
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D21786
Summary: When powerpc64 switches to LLVM, use this patch to enable
OpenMP as well. OpenMP on PPC is only for 64-bits, so don't make a
32-bit libomp. A change to openmp files is necesssary (under review on
https://reviews.llvm.org/D67190), because it determines ELF format
version based on endianness, which is incorrect.
Reviewed by: alfredo.junior_eldorado.org.br, #manpages
Differential Revision: https://reviews.freebsd.org/D21532
chflags -R on it, otherwise the command will error out. (Note that
adding -f to the chflags invocation does not help, unlike with rm.)
MFC after: 3 days
Parts of the fusefs tests trigger a bug in current versions of llvm: IR
representation of some routine for the MIPS targets is a function with a
large number of arguments. This then leads the compiler on an hour+ long
goose chase, which is OK if you build the current tree but less-so if you're
trying external toolchain or doing a universe build involving mips when it
eventually gets switched over to LLVM.
Better, accurate details can be found in LLVM PR43263.
has become very trigger-happy with libc++ 9.0.0.
It does not help that gcc's implementation of this warning is even more
trigger-happy, in the sense that it already warns on the declaration
itself, not when you are using it. This is very annoying with our use
of -Wsystem-headers. That should really be disabled for gcc.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
to the traditional tree(3) RB trees, but using an array (preallocated,
linear chunk of memory) to store the tree.
This avoids allocation overhead, improves memory locality,
and makes it trivially easy to share/transfer/copy the entire tree
without the need for marshalling. The downside is that the size
is fixed at initialization time; there is no mechanism to resize
it.
This is one of the dependencies for the new stats(3) framework
(https://reviews.freebsd.org/D20477).
Reviewed by: bcr (man pages), markj
Discussed with: cem
MFC after: 2 weeks
Sponsored by: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20324
std::auto_ptr in a whole bunch of individual Makefiles, make the warning
globally non-fatal instead. This is similar to what was done to many
more non-fatal warnings from newer gcc versions.
These commands show the route resolved for a specified destination, or
print out the entire routing table for a given address family (or all
families, if none is explicitly provided).
Discussed with: emaste
Differential Revision: https://reviews.freebsd.org/D21510
There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficent as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.
The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to
accomodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
If ipv4_prefer is specified, Section 10.3 is relevant.
If ipv6_prefer is specified, Section 2.1 is relevant.
This change makes the corresponding options/sections 'respective'
PR: docs/234249
Submitted by: David Fiander <david@fiander.info>
Both clang and gcc development branches have reached version 10. Since we
only parse for a single digit in the major version number, this causes
COMPILER_VERSION to be set to its default of 0.0.0, meaning version checks
fail with these newer compilers.
Reviewed by: emaste
Approved by: markj (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21413
We cannot use file (without :T) to name targets
but we can use the destination directory (with / replaced by _)
This has the benefit of minimizing the targets created.
Reviewed by: bdrewery
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org//D21283
The EXAMPLES section does not contain any examples of output formats for
the old-style scripts. Remove the misleading bits stating otherwise.
Reviewed by: bcr, imp
Approved by: src (imp)
Differential Revision: https://reviews.freebsd.org/D21552
A user may set ${name}_env variable in rc.conf(5) in order to set additional
environment variables for a service command. Unfortunately, at the moment
this variable is only honored when the command is specified via the command
variable. Those additional environment variables coming from ${name}_env
are never set if the service is started via the ${rc_arg}_cmd variable (for
example start_cmd).
PR: 239692
Reviewed by: bcr, jilles
Approved by: src (jilles)
Differential Revision: https://reviews.freebsd.org/D21228
The default package use to be FreeBSD-runtime but it should only contain
binaries and libs enough to boot to single user and repair the system, it
is also very handy to have a package that can be tranform to a small mfsroot.
So create a new package named FreeBSD-utilities and make it the default one.
Also move a few binaries and lib into this package when it make sense.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21506
All of them are needed to be able to boot to single user and be able
to repair a existing FreeBSD installation so put them directly into
FreeBSD-runtime.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D21503
The page daemon periodically invokes uma_reclaim() to reclaim cached
items from each zone when the system is under memory pressure. This
is important since the size of these caches is unbounded by default.
However it also results in bursts of high latency when allocating from
heavily used zones as threads miss in the per-CPU caches and must
access the keg in order to allocate new items.
With r340405 we maintain an estimate of each zone's usage of its
(per-NUMA domain) cache of full buckets. Start making use of this
estimate to avoid reclaiming the entire cache when under memory
pressure. In particular, introduce TRIM, DRAIN and DRAIN_CPU
verbs for uma_reclaim() and uma_zone_reclaim(). When trimming, only
items in excess of the estimate are reclaimed. Draining a zone
reclaims all of the cached full buckets (the previous behaviour of
uma_reclaim()), and may further drain the per-CPU caches in extreme
cases.
Now, when under memory pressure, the page daemon will trim zones
rather than draining them. As a result, heavily used zones do not incur
bursts of bucket cache misses following reclamation, but large, unused
caches will be reclaimed as before.
Reviewed by: jeff
Tested by: pho (an earlier version)
MFC after: 2 months
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16667
- Fix warnings from igor and mandoc.
- Provide a brief description of the separation between zones and their
backend slab allocators.
- Document cache zones and secondary zones.
- Document the kernel config options added in r350659.
- Document the uma_zalloc_pcpu() and uma_zfree_pcpu() wrappers.
- Document uma_zone_reserve(), uma_zone_reserve_kva() and
uma_zone_prealloc().
- Document uma_zone_alloc() and uma_zone_freef().
- Add some missing MLINKs and Xrefs.
MFC after: 2 weeks
This makes it possible to perform mathematical operations on
fractional values without using floating point. It operates on Q
numbers, which are integer-sized, opaque structures initialized
to hold a chosen number of integer and fractional bits.
For a general description of the Q number system, see the "Fixed Point
Representation & Fractional Math" whitepaper[1]; for the actual
API see the qmath(3) man page.
This is one of dependencies for the upcoming stats(3) framework[2]
that will be applied to the TCP stack in a later commit.
1. https://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
2. https://reviews.freebsd.org/D20477
Reviewed by: bcr (man pages, earlier version), sef (earlier version)
Discussed with: cem, dteske, imp, lstewart
Sponsored By: Klara Inc, Netflix
Obtained from: Netflix
Differential Revision: https://reviews.freebsd.org/D20116
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework
should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)
KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
As discussed on arch@, gcc 4.2.1 is on its way out. Turn off Werror on gcc
versions < 5.0 permantly. This will allow older platforms to continue to compile
w/o new errors once we take them out of universe by default. This will also free
developers from chasing down obsolete warnings that produce no beneficial
changes to the source.
Discussed on: arch@
Reviewed by: jhb@, emaste@, pfg@
Differential Revision: https://reviews.freebsd.org/D21378
They follow the conventions set by rw and sx lock probes. There is
an additional lockstat:::lockmgr-disown probe.
Update lockstat(1) to report on contention and hold events for
lockmgr locks. Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21355
This adds safety net for the case of misconfigured NTB with too big
memory window, for which we may be unable to allocate a memory buffer,
which does not make much sense for the network interface. While there,
fix the code to really work with asymmetric window sizes setup.
This makes driver just print warning message on boot instead of hanging
if too large memory window is configured.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
expression howmany(BBSIZE, PAGE_SIZE), where BBSIZE is the size of the
boot block area. That can be less than 2 if PAGE_SIZE is big.
swapon(8) has an option to trim (delete) all the blocks of a device at
startup. However, if the first of those blocks is a bsd label, then
trimming those blocks is destructive. Change swapon to leave the
first BBSIZE bytes untrimmed.
Update manual pages to reflect changes in how swapon and how it may be
used, espeically in association with savecore.
Reviewed by: alc
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21191
In r342974 jhibbits added support to build crtsavres.o. This was the
blocker for BSD_CRTBEGIN to be enabled there. As such enable this
option again.
Reviewed by: jhibbits
Sponsored by: DARPA, AFRL
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.
mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format. Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned. This happened in about 1 out of every 512
cases. The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.
Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).
mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.
* Input block size validation was pulled out of individual compression
init routines into main().
* Init routines now validate a user-provided compression level or select
an algorithm-specific default, if none was provided.
* A new interface for calculating the maximal compressed size of an
incompressible input block was added for each driver. The generic code
uses it to validate against MAXPHYS as well as to allocate compression
result buffers in the generic code.
* Algorithm selection is now driven by a table lookup, to increase ease of
adding other formats in the future.
mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.
Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'. '-L' is considered deprecated, but will probably never be removed.
All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.
Relnotes: yes
Follow-up on r322318 and r322319 and remove the deprecated modules.
Shift some now-unused kernel files into userspace utilities that incorporate
them. Remove references to removed GEOM classes in userspace utilities.
Reviewed by: imp (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21249
rcmds were removed in r32435 and these three man pages can trivially
drop the references.
There's still a reference in pts.4 because it describes a mode
(TIOCPKT_NOSTOP), and only lists rlogin/rlogind as examples of programs
that use that mode. To update later.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
More extensive changes to this page are certainly needed, but at least
remove references to binaries that no longer exist.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The driver was originally written with the name ads1115, but at the last
minute it got renamed to ads111x to reflect its support for many related
chips, but I forgot to update the manpage to match the renaming before
committing it all.
As suggested in:
https://wiki.freebsd.org/WhatsGoing/FreeBSD13
We will be dropping the snd_ds1 driver. The driver is known to be buggy
and no one has been working on it for years now.
Users of old Yamaha cards may have luck with the OSS drivers instead.
MFC after: 3 days
Diferential Revision: https://reviews.freebsd.org/D21138
It is part of -Wformat, which is enabled by -Wall. Empty format strings are
well defined and it is perfectly reasonable to expect them in a formatting
interface.
Similar to what was done for device_printfs in r347229.
Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.
Reviewed by: markj
Discussed with: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21165
The API is used to gracefully terminate text line(s) with a single \n. If
the formatted buffer was empty or already ended in \n, it is unmodified.
Otherwise, a newline character is appended to it. The API, like other
sbuf-modifying routines, is only valid while the sbuf is not FINISHED.
Reviewed by: rlibby
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21030
The goal is to avoid some kinds of low-memory deadlock when formatting
heap-allocated buffers.
Reviewed by: vangyzen
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21015
with Communication Device Class Ethernet Emulation Model (CDC EEM).
The driver supports both the device, and host side operation; there
is a new USB template (#11) for the former.
This enables communication with virtual USB NIC provided by iLO 5,
as found in new HPE Proliant servers.
Reviewed by: hselasky
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Hewlett Packard Enterprise
Don't mark nvme as broken on aarch64. It compiles, at least, and people are
testing it out. This only enables the userland parts of the nvme stack.
Submitted by: greg at unrelenting technologies
Differential Revision: https://reviews.freebsd.org/D21168
This commit imports the new fusefs driver. It raises the protocol level
from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and
adds many new features. New features include:
* Optional kernel-side permissions checks (-o default_permissions)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with "mount -u"
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4
Performance enhancements include:
* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting
PR: 199934 216391 233783 234581 235773 235774 235775
PR: 236226 236231 236236 236291 236329 236381 236405
PR: 236327 236466 236472 236473 236474 236530 236557
PR: 236560 236844 237052 237181 237588 238565
Reviewed by: bcr (man pages)
Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit
review on project branch)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Pull Request: https://reviews.freebsd.org/D21110
Instances of the device can be configured using hints or FDT data.
Interfaces to reconfigure the chip and extract voltage measurements from
it are available via sysctl(8).
Having the full uname output can be useful on head even with
unmodified trees or trees that newvers.sh fails to recognize as
modified.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D20895
Some warning flags are valid for C++ but not C. GCC 8 complains if you pass
such flags when building a C file. Using a separate variable for these
flags allows building both C and C++ files in the same directory (such as
the fusefs tests) under GCC.
Reviewed by: cem, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21116
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page
Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20549
r350315 created a Linux compatible copy_file_range(2) syscall.
It uses a VOP method called VOP_COPY_FILE_RANGE so that file systems,
such as the NFSv4.2 client can do file system specific copying.
For NFSv4.2, this allows the copying to be done locally on the NFS server,
avoiding transferring the data across the wire twice.
This is a new man page (content changed).
Reviewed by: kib, asomers
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584
Now that we have a way to obtain entropy in capability mode
(getrandom(2)), libcap_random is obsolete. Remove it.
Bump __FreeBSD_version in case anything happens to use it, though I've
found no consumers.
Reviewed by: delphij, emaste, oshogbo
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21033
This will be used to gate the fusefs tests. It's a partial merge of r348281
from projects/fuse2.
Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21044
of the TCP TS offset from taking the IP addresses and the TCP port
numbers into account to a version just taking only the IP addresses
into account. This works around broken middleboxes or endpoints.
The default is to keep the behaviour, which is also the behaviour
recommended in RFC 7323.
Reported by: devgs@ukr.net
Reviewed by: rrs@
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20980
Describe missed functions.
Give some hint about refcount_release(9) memory ordering guarantees.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21020
It was changed from int to register_t in r22521 and from register_t to long
in r328099, but the man page wasn't updated either time.
MFC after: 2 weeks
There are some explicit comparisions of refcount_release(9) result
with 0/1, which are fine.
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21014
Update login(1), its manual pages, similar utilities, and motd.5 to refer to
the new location.
Suggested by: delphij@ (re: r349256)
Reviewed by: bcr (manpages), delphij
Differential Revision: https://reviews.freebsd.org/D20721
Allow ABI to be over ridden to allow (with other changes) programs to be
built targeting ABIs other than the default. This is used in CheriBSD.
Reviewed by: imp
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D21001
We don't split the other man pages in their own package so do the same for runtime.
Reviewed by: bapt, gjb
Differential Revision: https://reviews.freebsd.org/D20962
This reworks my last commit in r301285 to more closely match what was in
r241298 (but reverted in r294878).
This is addressing "missing .meta file" rebuilds but also ensuring that
files are always generated when needed in each case.
Note that this is not a complete rework of the problem areas identified
in r301285 as most are "good enough" right now as the new pattern
is too verbose. It's only worth making this current change where headers
may be generated in the INCS list; where missing .meta file rebuilds are
spotted.
--- Technical details follow ---
Several attempts to deal with this problem of multi-output targets, with and
without META MODE, were explained in r241298, r294878, and r301285.
The general problem is with multi-output targets such as:
foo.c foo.h:
touch foo.c foo.h
foo.c foo.h:
touch foo.c
touch foo.h
foo.c foo.h: foo.in
./generator ${.ALLSRC}
This pattern is problematic in jobs mode as both files end up being
built concurrently and leads to races. With META MODE it is worse
as both targets end up rebuilding if they lack a .meta file. So the
generator is force built twice even though it is only needed once.
There are also problems in that 'make foo.h' may be ran before 'make foo.c';
The order of make generating the targets is not guaranteed.
An older attempted workaround to this (discussed in r294878) was:
foo.h: foo.c
foo.c: foo.in
./generator ${.ALLSRC}
This appears fine except that if foo.h is missing and foo.c exists then
foo.h will never be regenerated. This pattern is close to the solution
in this commit though:
foo.h: foo.c .NOMETA
.if !exists(foo.h)
foo.c: .PHONY .META
.endif
foo.c: foo.in
./generator ${.ALLSRC}
There's 2 differences here:
1. foo.h will never expect to have a .meta file since the foo.c target
will generate both and own the .meta file.
2. If foo.h does not exist then it needs to force foo.c to be rebuilt
with .PHONY. That normally disables META MODE though so .META is
given to tell bmake we do really expect a .meta file.
This pattern cannot work with implicit suffix rules since the .c and .h files
may be generated at different times (buildincludes vs depend/all).
Sponsored by: Dell EMC
MFC after: 2 weeks
The nvme drive dumps only the most relevant details about a command when it
fails. However, there are times this is not sufficient (such as debugging weird
issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1
in loader.conf will enable more complete debugging information about each
command that fails.
Reviewed by: rpokala
Sponsored by: Netflix
Differential Version: https://reviews.freebsd.org/D20988
zone.tab is deprecated. Install zone1970.tab alongside it, and use it
for tzsetup(8). This is also useful for other applications that need
the modern better maintained file.
Reviewed by: philip
Approved by: allanjude (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20646
with various laptops using hdaa(4) sound devices. We don't seem to know
the "correct" configurations for these devices and the defaults are far
superiour, e.g. they work if you don't nuke the default configs.
PR: 200526
Differential Revision: https://reviews.freebsd.org/D17772
Add format capability to core file names to include signal
that generated the core. This can help various validation workflows
where all cores should not be considered equally (SIGQUIT is often
intentional and not an error unlike SIGSEGV or SIGBUS)
Submitted by: David Leimbach (leimy2k@gmail.com)
Reviewed by: markj
MFC after: 1 week
Relnotes: sysctl kern.corefile can now include the signal number
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20970
Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the same cache line as containing the CAS word,
and not loop infinitely. Otherwise, rogue userspace livelocks the
kernel.
To fix the issue, change casueword(9) interface to return new value 1
indicating that either comparision or store failed, instead of relying
on the oldval == *oldvalp comparison. The primitive no longer retries
the operation if it failed spuriously. Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.
On x86, despite cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.
Reviewed by: andrew (arm64, previous version), markj
Tested by: pho
Reported by: https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20772
The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
Change to use registers instead of register, as it is customary to use
plural when talking about PCI registers.
This was missed in r349150.
MFC after: 3 days
Submitted by: Ka Ho Ng <khng300 at gmail.com>
Reviewed by: mckusick
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20695
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: ae, kp (firewalls)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
This change was originally in D20378. Making it in a new diff since it's a
bugfix.
Submitted by: alfredo.junior_eldorado.org.br
Reviewed by: emaste, luporl
Differential Revision: https://reviews.freebsd.org/D20756
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.
Differential Revision: https://reviews.freebsd.org/D20109
MFC after: 1 week
Sponsored by: Mellanox Technologies
Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them. This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.
Sponsored by: The FreeBSD Foundation
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.
Sponsored by: The FreeBSD Foundation
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
Reviewed by: mizhka
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20459
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache. If not, then
they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.
The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later. However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.
This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
bypassed.
Sponsored by: The FreeBSD Foundation
"at" keyword is documented in device.hints(5) for all buses, but it does
hurt to add another reference to it.
"pins" keyword is specific to gpiobus.
At least these two hints should be configured for any gpiobus device on
a hints based system.
MFC after: 10 days
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.
Numerous posts to arch@ and other locations have found no actual users
for this software.
Relnotes: Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
Summary:
Toolchain follow-up to r349350. LLVM patches will be submitted upstream for
9.0 as well.
The bsd.cpu.mk change is required because GNU ld assumes BSS-PLT if it
cannot determine for certain that it needs Secure-PLT, and some binaries do
not compile in such a way to make it know to use Secure-PLT.
Reviewed By: nwhitehorn, bdragon, pfg
Differential Revision: https://reviews.freebsd.org/D20598
Summary:
Adds a list of valid CPUTYPE flags for arm and arm64 architectures
List taken from share/mk/bsd.cpu.mk
Submitted by: Daniel Engberg
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D20315
'-E' appears on the swapon command line, or if "trimonce" appears as
an fstab option.
Discussed at: BSDCAN
Tested by: markj
Reviewed by: markj
Approved by: markj (mentor)
Differential Revision:https://reviews.freebsd.org/D20599
Sort methods alphabetically. Wrap long lines. Start sentences on a new
line. Remove contractions (not because it's a good idea, just to silence
igor). Add some explanation of the units for the period and duty arguments
and the convention for channel numbers.
interfaces were unified into pwmbus(9), and the PWMBUS_CHANNEL_MAX method
was renamed PWMBUS_CHANNEL_COUNT. The pwmbus_attach_bus() function just
went away completely. Also, fix a few typos such as s/is/if/.
As reported in review D20709 by brooks calling vm_map_protect to set a
new max_protection value downgrades existing mappings if necessary (as
opposed to returning an error).
Reported by: brooks
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
It's implied by the man page's RETURN VALUES section, but be explicit in
the description that vm_map_protect can not set new protection bits that
are already in each entry's max_protection.
Reviewed by: brooks
MFC After: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20709
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless. It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.
This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock). On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.
As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.
Reviewed by: markj, mmacy
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D20669
This is still targeting bin/sh cyclic dependency issues. Only apply
guessed dependencies that are explicitly set for an object (which
gnu/lib/cc/cc_tools needs) and if no custom target exists with its
own dependencies.
This was manifesting as a missing yacc.h in usr.bin/mkesdb_static when
built without -j (or -B). No actual yacc.h dependency ordering was
defined but with -j it got lucky and built fine.
Before r349061 the behavior was different for META_MODE but that logic
difference isn't needed.
X-MFC-With: r349061
Sponsored by: DellEMC
NetBSD 7.0 was a separate branch, subsequent 8.x releases did not emerge from
this branch.
Clean up minor visual nits, centre OpenBSD listing on the B, DragonFly
listings on the y.
Add missing words after PCI in the description of the PCIOCWRITE and
PCIOCATTACHED ioctls.
Use singular in PCIOCREAD, we only read one register at the time.
Reviewed by: bcr, bjk, rgrimes, cem
MFC after: 2 weeks
X-MFC-with: r349133
Differential Revision: https://reviews.freebsd.org/D20671
Document the PCIOCATTACHED ioctl(2) in the pci(4) manual.
PCIOCATTACHED is used to query if a driver has attached to a PCI.
Reviewed by: bcr, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20652