r336020 introduced pcpu_page_alloc(), replacing page_alloc() as the
backend allocator for PCPU UMA zones. Unlike page_alloc(), it does
not honour malloc(9) flags such as M_ZERO or M_NODUMP, so fix that.
r336020 also changed counter(9) to initialize each counter using a
CPU_FOREACH() loop instead of an SMP rendezvous. Before SI_SUB_CPU,
smp_rendezvous() will only execute the callback on the current CPU
(i.e., CPU 0), so only one counter gets zeroed. The rest are zeroed
by virtue of the fact that UMA gratuitously zeroes slabs when importing
them into a zone.
Prior to SI_SUB_CPU, all_cpus is clear, so with r336020 we weren't
zeroing vm_cnt counters during boot: the CPU_FOREACH() loop had no
effect, and pcpu_page_alloc() didn't honour M_ZERO. Fix this by
iterating over the full range of CPU IDs when zeroing counters,
ignoring whether the corresponding bits in all_cpus are set.
Reported and tested by: pho (previous version)
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D16190
- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
(defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)
- Allocate page from the correct domain for a given cpu.
- Don't initialize pc_domain to non-zero value if NUMA is not defined
There are some misconceptions surrounding this field. It is the
_VM_ NUMA domain and should only ever correspond to valid domain
values as understood by the VM.
The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.
Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15933
counter_rate will be greater than ticks, resulting in counter_ratecheck()
failure. To fix this take an absolute value of the difference between
ticks and cr_ticks.
Reported by: jtl
Sponsored by: Netflix
Turns out there is code which ends up passing M_ZERO to counters.
Since counters zero unconditionally on their own, just ignore drop the
flag in that place.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Issues were noted by Bruce Evans and are present on all architectures.
On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.
On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.
PowerPC64 is treated the same as amd64.
For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.
Noted by: bde
Sponsored by: The FreeBSD Foundation
prevents us from creating UMA_ZONE_PCPU zones earlier.
As bandaid shift initialization of counter(9) zone later.
Reviewed by: kib
Reported & tested by: Lytochkin Boris <lytboris gmail.com>
Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.
See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.
In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.