For now, just count batched page queue state operations.
vm.stats.page.queue_ops counts the number of batch entries that
successfully completed, while queue_nops counts entries that had no
effect, which occurs when the queue operation had been completed before
the batch entry was processed.
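The accounting described above can be sketched in userland C. This is only an illustration under stated assumptions: the struct and function names are hypothetical and not the kernel's, and only the two counter names come from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch entry: a request to move a page to a target queue. */
struct batch_entry {
	int *page_queue;	/* current queue index of the page */
	int  target_queue;	/* queue the entry wants the page on */
};

/* Counters named after the sysctls in the commit message. */
struct queue_stats {
	uint64_t queue_ops;	/* entries that changed queue state */
	uint64_t queue_nops;	/* entries that had no effect */
};

/* Process a batch, counting effective and no-op entries separately. */
static void
process_batch(struct batch_entry *b, size_t n, struct queue_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		if (*b[i].page_queue == b[i].target_queue) {
			/* The operation was already completed earlier. */
			st->queue_nops++;
		} else {
			*b[i].page_queue = b[i].target_queue;
			st->queue_ops++;
		}
	}
}
```

A high ratio of queue_nops to queue_ops in such a scheme would suggest that many batched entries are redundant by the time they are processed.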
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21782
Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.
Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768
leaf 0x15 is not functional.
This should improve automatic TSC frequency determination on
Skylake/Kabylake/... families, where 0x15 exists but does not provide
all necessary information. The SDM contains relatively strong wording
against such uses of 0x16, but Intel does not give us any other way to
obtain the frequency. Linux did the same in commit
604dc9170f2435d27da5039a3efd757dceadc684.
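A minimal sketch of the fallback logic, assuming the register layouts documented in the SDM (leaf 0x15: EAX is the ratio denominator, EBX the numerator, ECX the crystal frequency in Hz; leaf 0x16: EAX[15:0] is the base frequency in MHz). The function name and the array-based interface are hypothetical and chosen so the logic can be exercised without executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Register arrays are in EAX, EBX, ECX, EDX order, as returned by CPUID.
 * Function and parameter names are hypothetical.
 */
static bool
tsc_freq_from_cpuid(const uint32_t leaf15[4], const uint32_t leaf16[4],
    uint64_t *hz)
{
	/* Leaf 0x15: EBX/EAX is the TSC-to-crystal ratio, ECX the crystal Hz. */
	if (leaf15[0] != 0 && leaf15[1] != 0 && leaf15[2] != 0) {
		*hz = (uint64_t)leaf15[2] * leaf15[1] / leaf15[0];
		return (true);
	}
	/* Fallback, leaf 0x16: EAX[15:0] is the base frequency in MHz. */
	if ((leaf16[0] & 0xffff) != 0) {
		*hz = (uint64_t)(leaf16[0] & 0xffff) * 1000000;
		return (true);
	}
	return (false);
}
```

The fallback yields a value with MHz granularity only, which is one reason to prefer leaf 0x15 whenever it reports a crystal frequency.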
Based on submission by: Neel Chauhan <neel@neelc.org>
PR: 240475
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21777
Check for teken.fg_color and teken.bg_color and prepare the color
attributes accordingly.
When white background is used, make it light to improve visibility.
When black background is used, make kernel messages light.
Both IBM and Freescale programming examples presume the cmpset operands will
favor equal, and pessimize the non-equal case instead. Do the same for
atomic_cmpset_* and atomic_fcmpset_*. This slightly pessimizes the failure
case, in favor of the success case.
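The actual change is to branch hints in PowerPC inline assembly. As a rough portable analogue, and only as a sketch, the same "expect success" bias can be expressed in C with the GCC/Clang `__builtin_expect` builtin and C11 atomics. The function name and the `likely` macro are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Tell the compiler the condition is expected to be true. */
#define likely(x)	__builtin_expect(!!(x), 1)

/*
 * Hypothetical cmpset: store newval if *p equals cmpval.
 * Returns 1 on success and 0 on failure; the success path is hinted
 * as the hot path.
 */
static int
cmpset_32(_Atomic uint32_t *p, uint32_t cmpval, uint32_t newval)
{
	uint32_t expected = cmpval;

	if (likely(atomic_compare_exchange_strong(p, &expected, newval)))
		return (1);
	return (0);
}
```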
MFC after: 3 weeks
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called and the iod lock is held when the NFS node lock
is acquired, the iod mutex will need to be changed to an sx lock as well.
To simplify the future commit that changes both the NFS node lock and iod lock
to sx locks, this commit replaces all mtx_lock()/mtx_unlock() calls on the
iod lock with macros.
There is no semantic change as a result of this commit.
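The idea of hiding the lock primitive behind macros can be sketched in userland with a pthread mutex. All names here are hypothetical stand-ins, not the macros the NFS code defines; the point is that switching to a different lock type later only requires changing the two macro definitions.

```c
#include <pthread.h>

/* Hypothetical stand-in for the iod lock. */
static pthread_mutex_t iod_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Callers use only these macros, never the lock primitive directly. */
#define IOD_LOCK()	pthread_mutex_lock(&iod_mutex)
#define IOD_UNLOCK()	pthread_mutex_unlock(&iod_mutex)

/* Hypothetical state protected by the lock. */
static int iod_pending;

/* Bump the pending count under the lock and return the new value. */
static int
iod_enqueue(void)
{
	int n;

	IOD_LOCK();
	n = ++iod_pending;
	IOD_UNLOCK();
	return (n);
}
```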
I don't know when the future commit will happen and be MFC'd, so I have
set the MFC on this commit to one week so that it can be MFC'd at the same
time.
Suggested by: kib
MFC after: 1 week
that instead of functions only being inside the _KERNEL and
the absence of RATELIMIT causing us to have NULL/error returning
interfaces we ended up with non-kernel getting the error path.
Oops.
While testing with very high interrupt rates, I noticed that one of the
conditions I added in r232207 to make interrupt threads run on the local
CPU in most cases never worked as expected (it worked only if the thread
had previously executed on some other CPU, which is quite the opposite).
It caused additional CPU usage to run a full CPU search and could schedule
interrupt threads to some other CPU.
This patch removes that code and instead reuses the existing non-interrupt
code path with some tweaks for the interrupt case:
- On SMT systems, if the current thread is idle, don't look at other threads.
Even if they are busy, it may take more time to do a full search and bounce
the interrupt thread to another core than to execute it locally, even sharing
CPU resources. It is the other threads that should migrate, not bound
interrupts.
- Try hard to keep interrupt threads within the LLC of their original CPU.
This improves scheduling cost and supposedly cache and memory locality.
On a test system with 72 threads doing 2.2M IOPS to NVMe, this saves a few
percent of CPU time while adding a few percent to IOPS.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
is a completely separate TCP stack (tcp_bbr.ko) that will be built only if
you add the make option WITH_EXTRA_TCP_STACKS=1 and also include the option
TCPHPTS. You can also include the RATELIMIT option if you have a NIC that
supports hardware pacing; BBR understands how to use such a feature.
Note that this commit also adds a general-purpose time-filter which
allows you to have a min-filter or max-filter. A filter allows you to
have a low (or high) value for some period of time and degrade slowly
to another value as time passes. You can find out the details of
BBR by looking at the original paper at:
https://queue.acm.org/detail.cfm?id=3022184
or consult the many other web resources that can be found by searching
for "BBR congestion control". It should be noted that
BBRv1 (which this is) does tend toward unfairness on paths with small
buffers, and it will usually get less bandwidth on large BDP paths
(when competing with new-reno or cubic flows). BBR
is still an active research area and we do plan on implementing V2
of BBR to see if it is an improvement over V1.
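The min-filter mentioned above can be illustrated in its simplest form: remember the smallest sample seen and discard it once it is older than a window. This is a simplified single-sample sketch with hypothetical names, not the filter added by the commit; in particular it jumps to the newest sample when the old minimum expires, whereas the commit describes a filter that degrades slowly.

```c
#include <stdint.h>

/* Hypothetical windowed min-filter state. */
struct min_filter {
	uint32_t window;	/* how long a minimum stays valid, in ticks */
	uint32_t min_val;	/* current minimum */
	uint32_t min_time;	/* tick at which min_val was recorded */
	int	 valid;		/* nonzero once a sample has been seen */
};

/*
 * Feed one sample taken at time 'now' and return the filtered minimum.
 * The stored minimum is replaced when the sample is at least as small,
 * or when the stored minimum is older than the window.
 */
static uint32_t
min_filter_update(struct min_filter *f, uint32_t now, uint32_t sample)
{
	if (!f->valid || sample <= f->min_val ||
	    now - f->min_time > f->window) {
		f->min_val = sample;
		f->min_time = now;
		f->valid = 1;
	}
	return (f->min_val);
}
```

A max-filter would be the same structure with the comparison reversed.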
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D21582
Instead of predicting the MSI-X bar index based on the device's MAC
type, read it from the device's PCI configuration instead.
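For illustration, the MSI-X capability in PCI configuration space holds a Table Offset/BIR register whose low three bits select the BAR containing the MSI-X table. The sketch below decodes such a register value; the helper names are hypothetical, and a real driver would obtain the value through the bus code rather than hard-code it.

```c
#include <stdint.h>

#define MSIX_BIR_MASK	0x7u	/* low 3 bits: BAR indicator (BIR) */
#define PCI_BAR0_OFF	0x10u	/* config-space offset of BAR 0 */

/* BAR index that holds the MSI-X table. */
static unsigned
msix_table_bir(uint32_t table_reg)
{
	return (table_reg & MSIX_BIR_MASK);
}

/* Byte offset of the MSI-X table within that BAR. */
static uint32_t
msix_table_offset(uint32_t table_reg)
{
	return (table_reg & ~MSIX_BIR_MASK);
}

/* Config-space offset of the BAR register for a given BAR index. */
static unsigned
bar_config_offset(unsigned bir)
{
	return (PCI_BAR0_OFF + 4 * bir);
}
```

Reading the indicator from the device removes the need for a per-MAC-type table in the driver.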
PR: 239704
Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: erj@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21547
From Piotr:
r351152 introduced the iflib_deregister() function, which calls
EVENTHANDLER_DEREGISTER() to unregister VLAN events. This patch removes
the duplicate EVENTHANDLER_DEREGISTER() calls in
iflib_device_deregister(), as this function now calls
iflib_deregister(). This avoids deregistering the same event twice.
This patch also adds a check in iflib_vlan_register() to prevent
registering a VLAN during detach.
Patch co-authored by Krzysztof Galazka <krzysztof.galazka@intel.com>,
erj <erj@FreeBSD.org> and Jacob Keller <jacob.e.keller@intel.com>.
Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: gallatin@, erj@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21711
For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simplify making the change to an sx lock in a future commit.
There is no semantic change as a result of this commit.
I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.
Suggested by: kib
MFC after: 1 week
- track the total count of hot entries
- pre-read the lock when shrinking since it is typically already taken
- place the lock in its own cacheline
- shorten the hold time of hot lock list when zapping
Sponsored by: The FreeBSD Foundation
- For each queue pair precalculate CPU and domain it is bound to.
If queue pairs are not per-CPU, then use the domain of the device.
- Allocate most of queue pair memory from the domain it is bound to.
- Bind callouts to the same CPUs as queue pair to avoid migrations.
- Do not assign queue pairs to each SMT thread. It just wasted
resources and increased lock congestion.
- Remove fixed multiplier of CPUs per queue pair, spread them evenly.
This allows using more queue pairs in some hardware configurations.
- If queue pair serves multiple CPUs, bind different NVMe devices to
different CPUs.
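One way to spread CPUs evenly over queue pairs without a fixed multiplier is integer scaling of the CPU index. This is a sketch under that assumption; the function name is hypothetical and the driver's actual mapping may differ.

```c
/*
 * Map a CPU index to a queue pair so that CPUs are spread as evenly
 * as integer division allows. Assumes ncpus > 0 and nqpairs <= ncpus.
 */
static unsigned
cpu_to_qpair(unsigned cpu, unsigned ncpus, unsigned nqpairs)
{
	return (cpu * nqpairs / ncpus);
}
```

With 8 CPUs and 3 queue pairs, this assigns CPUs 0-2 to pair 0, CPUs 3-5 to pair 1, and CPUs 6-7 to pair 2, so no queue pair is left unused.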
MFC after: 1 month
Sponsored by: iXsystems, Inc.
There is no reason for these routines to be written in assembly. In
the ports of DTrace to other platforms, they are already written in C.
No functional change intended.
MFC after: 1 week
Sponsored by: Netflix
The direct map is never used for execution of code, so we might as well
set NX in the direct map's PML4Es. Also clarify the intent of the code
in create_pagetables() that restricts access protections on the region
of the direct map mapping the kernel text.
Reviewed by: alc, kib (previous version)
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21759
This is required for DPCPU and VNET data variable definitions to work when
KLDs are linked as DSOs. R_X86_64_RELATIVE relocations should not appear
in object files, so assert this in elf_relocaddr().
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21755
There does not appear to be any existing need for such mappings to be
executable.
Reviewed by: alc, kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21754
SYSINIT usage was added, but the <sys/kernel.h> dependency was not added.
This worked by coincidence, as most of the mips configs have DDB enabled and
pmap.c gets <sys/kernel.h> via ddb.h pollution.
Reported by: dim
The two options are
* nocover/cover: Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir: Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.
Neither of these options is intended to be a default, for historical and
compatibility reasons.
Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D21458
by defining pg_nx as zero for non-PAE and correspondingly simplifying
some expressions.
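The simplification can be sketched as follows: when the no-execute bit is defined as zero for the non-PAE case, callers can OR it in unconditionally instead of testing the paging mode first. The macro names, the PAE_SKETCH switch, and the helper are hypothetical and stand in for the pmap code.

```c
#include <stdint.h>

/* Hypothetical build switch: define PAE_SKETCH to model the PAE case. */
#ifdef PAE_SKETCH
#define SKETCH_PG_NX	(1ULL << 63)	/* no-execute bit in a 64-bit PTE */
#else
#define SKETCH_PG_NX	0ULL		/* no NX bit without PAE */
#endif

/*
 * Build a PTE value. Because SKETCH_PG_NX is zero when NX does not
 * exist, no separate "is PAE enabled" test is needed here.
 */
static uint64_t
make_pte(uint64_t pa, uint64_t flags, int executable)
{
	return (pa | flags | (executable ? 0 : SKETCH_PG_NX));
}
```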
Suggested and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21757
Decode PAT_UNCACHED.
When an unknown PAT mode is encountered, print the PTE bit combination
instead of the index, which is always 8.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21738
Clang sees this construct and warns that adding an int to a string like this
does not concatenate the two. Fortunately, this is not what octeon-sdk
actually intended to do, so we take the path towards remediation that clang
offers: use array indexing instead.
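A minimal sketch of the construct and the fix clang suggests. The string literal and function name are hypothetical; the octeon-sdk code itself is not reproduced here.

```c
#include <stddef.h>

/*
 * Return the literal with its first n characters skipped.
 * Writing this as ("cvmx_name" + n) triggers clang's
 * -Wstring-plus-int warning; the array-indexing form below has the
 * same meaning and makes the pointer arithmetic explicit.
 * The caller must keep n within the length of the literal.
 */
static const char *
skip_prefix(size_t n)
{
	return (&"cvmx_name"[n]);
}
```

Both forms compute the same pointer, so the change silences the warning without altering behavior.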
These appear in octeon-sdk. There are new releases, but they don't seem to
address the ongoing issues in octeon-sdk. GCC 4.2 is more than happy, but
clang is much less so. Most of the warnings are fairly innocuous and perhaps
a by-product of their style guide, which may make some of the changes harder
to upstream (if this is even possible anymore).
This avoids a double lock bug in the NAT colliding state processing
of SCTP. Thanks to Felix Weinrank for finding and reporting this issue in
https://github.com/sctplab/usrsctp/issues/374
He found this bug using fuzz testing.
MFC after: 3 days
before computing the RTO.
This should fix an overflow issue reported by Felix Weinrank in
https://github.com/sctplab/usrsctp/issues/375
for the userland stack and found by running a fuzz tester.
MFC after: 3 days
Add a small wrapper around libzfs_core's lzc_send_space() to libzfs so
that every legacy ZFS_IOC_SEND consumer, along with their userland
counterpart estimate_ioctl(), can leverage ZFS_IOC_SEND_SPACE to
request send space estimation.
The legacy functionality in zfs_ioc_send() is left untouched for
compatibility purposes.
Obtained from: ZoL
Obtained from: zfsonlinux/zfs@cf7684bc8d
Author: loli10K <ezomori.nozomu@gmail.com>
MFC after: 2 weeks
within a critical section, we must perform a lock-free check on the
faulting address.
Reported by: andrew
Reviewed by: andrew, markj
X-MFC with: r350579
Differential Revision: https://reviews.freebsd.org/D21685