pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.
While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.
core2_intr(cpu, tf) ->
pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
pmc_add_sample(cpu, ring, pm, tf, inuserspace)
In the process of optimizing the tail call the tf pointer was getting
clobbered:
(kgdb) up
at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709 pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
resulting in a crash in pmc_save_kernel_callchain.
Nothing in the tree uses it and pcpu zones have a fundamentally different use
case than the regular zones - they are not supposed to be allocated and freed
all the time.
This reduces pollution in the allocation fast path.
(or peel off the band-aid, whatever floats your boat)
This addresses two separate issues:
1.) Nothing within bsdgrep actually knew whether it cared about line numbers
or not.
2.) The file layer knew nothing about the context in which it was being
called.
#1 is only important when we're *not* processing line-by-line. #2 is
debatably a good idea; the parsing context is only handy because that's
where we store current offset information and, as of this commit, whether or
not it needs to be line-aware.
memset fills the target buffer from a byte-sized value passed in as the
second argument.
The fully-sized (8 bytes) register containing it is named %rsi. Lower 4 bytes
can be referred to as %esi and finally the lowest byte is %sil.
Vast majority of all the callers just zero the target buffer and set it up by
doing xor %esi,%esi which has a side-effect of zeroing the upper parts of
the register as well. Some others do a word-sized move to %esi which has the
same result.
However, there are callers which only fill %sil. This does *not* clear up
the rest of the register.
The value of %rsi is multiplied by $0x0101010101010101 to create a 8-byte sized
pattern for 8-byte stores.
Prior to the patch, the func just blindly took %rsi assuming the unwanted bytes
are zeroed out. Since this is not the case for the callers which only play with
%sil (the rest of the register can have absolutely anything), the resulting
pattern can be garbage.
This has potential for funny bugs. One side effect (which was not amusing)
after enabling it instead of bzero was that the kernel was hanging on boot
as a xen domU.
Reported by: Trond Endrestøl <Trond.Endrestol fagskolen.gjovik.no>
Pointy hat: me
trashing freed memory and checking that allocated memory is properly
trashed, and also of keeping a bitset of freed items. Trashing/checking
creates a lot of CPU cache poisoning, while keeping debugging bitsets
consistent creates a lot of contention on UMA zone lock(s). The performance
difference between INVARIANTS kernel and normal one is mostly attributed
to UMA debugging, rather than to all KASSERT checks in the kernel.
Add loader tunable vm.debug.divisor that allows either to turn off UMA
debugging completely, or turn it on only for a fraction of allocations,
while still running all KASSERTs in kernel. That allows to run INVARIANTS
kernels in production environments without reducing load by orders of
magnitude, but still doing useful extra checks.
Default value is 1, meaning debug every allocation. Value of 0 would
disable UMA debugging completely. Values above 1 enable debugging only
for every N-th item. It isn't possible to strictly follow the number,
but still amount of debugging is reduced roughly by (N-1)/N percent.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15199
given interval, which is counted in seconds since exit of the previous
invocation of the job. Example user crontab entry:
@25 sleep 10
The example will launch 'sleep 10' every 35 seconds. This is a rather
useless example above, but clearly explains the functionality.
The practical goal here is to avoid overlap of previous job invocation
to a new one, or to avoid too short interval(s) for jobs that last long
and doesn't have any point of immediate launch soon after previous run.
Another useful effect of interval jobs can be noticed when a cluster of
machines periodically communicates with a single node. Running the task
time based creates too much load on the node. Running interval based
spreads invocations across machines in cluster. Note that -j/-J won't
help in this case.
Sponsored by: Netflix
Changed excise_initrd_region to support both 32- and 64-bit
values for linux,initrd-start and linux,initrd-end.
This fixes the boot problem on some machines after rS334485.
Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: jhibbits, leitao
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15667
myriad ways that the various compliers treat this. The
only safe prefetch appears to be for AMD. The other
compilers either are not volatile or are not const :(
Reported by: Michael Tuexen
When we're at our vnode limit, getnewvnode will call into the vnode LRU
cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we
end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling
represents something with extended attributes, zfs_rmnode will call
zfs_zget which will attempt to allocate another vnode. If the next vnode we
try to recycle is also a ZFS vnode representing something with extended
attributes we can recurse further. This ends up being unbounded and can end
up overflowing the stack.
In order to avoid this, restructure zfs_rmnode to simply add the extended
attribute directory's object ID to the unlinked set, thus not requiring the
allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain
which will do the work of properly marking the vnodes for unlinking.
zfs_unlinked_drain is also called on mount so these will be cleaned up
there.
Reviewed by: avg, mav
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D15342
Admittedly, this is a clang-scan complaint... but it wasn't wrong. fts_flags
is initialized by all cases in the switch(), which should be fairly obvious.
Annotate this anyways.
Neither procfile nor grep_tree return anything meaningful to their callers.
None of the callers actually care about how many lines were matched in all
of the files they processed; it's all about "did anything match?"
This is generally just a light refactoring to remind me of what actually
matters as I'm rewriting these bits to care less about 'stuff'.
than $4. Introduce $BASEBITSDIR for clarity and to avoid repeating this
mistake in the future. Fixing this ensures that we pick up newly built
boot bits native to the target rather for/from the host.
- Apply some of the argument quoting fixes done in r287635 but missing in
later revisions.
Rack includes the following features:
- A different SACK processing scheme (the old sack structures are not used).
- RACK (Recent acknowledgment) where counting dup-acks is no longer done
instead time is used to knwo when to retransmit. (see the I-D)
- TLP (Tail Loss Probe) where we will probe for tail-losses to attempt
to try not to take a retransmit time-out. (see the I-D)
- Burst mitigation using TCPHTPS
- PRR (partial rate reduction) see the RFC.
Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.
Note that any connection that does not support SACK will be kicked
back to the "default" base FreeBSD stack (currently known as "default").
To build this into your kernel you will need to enable in your
kernel:
makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15525
pagetables.
physmap[] can be inconsistent with the physical memory limit due to
buggy bios, or to the hw.physmem tunable. Since bootstrap pagetables
are initialized by accesses through the DMAP, we must ensure that DMAP
really cover the selected pages. This is only relevant when machine
has less than 4G RAM and buggy BIOS, which is the combination on Acer
Chromebook 720.
The call to mp_bootaddress() is moved later to have Maxmem initialized.
An alternative could be to always cover 4G for DMAP, but this change
seems to be simpler.
Reported and tested by: grembo
Reviewed by: royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15675
installworld should not be executing this anyhow but there is some
obscure case doing it still. The head(1) binary is not part of
ITOOLS and there's no need to add it.
MFC after: 1 week
Sponsored by: Dell EMC
Fix the behavior of ofw_fdt_getprop() and ofw_fdt_getprop() functions to match
the documentation as the non-fdt code.
Submitted by: Luis Pires <lffpires@ruabrasil.org>
Reviewed by: manu, jhibbits
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D15680
Expected NMI-s are those than are either generated by the software (such
as a CPU sending NMI to other CPU) or generated by the hardware after
the software configured it to do so (such as NMI-s on PMC events).
Some unexpected NMI-s can be caused by hardware failures and it is
possible to inquire the hardware about them (somewhat like MCA but much
more primitive) using an EISA mechanism. In some cases the origin of
the NMI can remain truly unknown.
This commit should not change any functionality. It just reorganizes
the code, so that it is easier to extend with new checks for the origin
of the NMI. Also, it frees the code that has nothing to do with ISA
from DEV_ISA.
MFC after: 3 weeks
On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs
into memory and set two fdt entries to describe where the file is located in
the memory;
I need to pass this memory region to the md device as a mfs_root, but, current
md driver does not support two things:
* Just getting a pointer from an external (bootloader) memory. If I need to
workaround it, I would need to declare a static array and memcopy from this
external memory to this static variable.
* The size of the image. The usage of mfs_root_end, which is not a pointer,
seems to be not possible for this prestaged scenario.
This patch simply adds a new way to load mfs_root from memory.
Differential Revision: https://reviews.freebsd.org/D15625
Approved by: kib, jhibbits (mentor)
If the ipfw module is not loaded the net.inet.ip.fw.enable OID does not exist,
which leads the script to report errors and incorrectly report that ipfw is
enabled.
ixl(4) (when it switches over to using iflib) devices need the TCP header
length in order to do TCP checksum offload.
Reviewed by: gallatin@, shurd@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D15558
PowerPC has PAGE_SIZE as a long, not an int. This causes the compiler to throw
a format mismatch warning on this print. To work around the difference, print
it as a long instead of an int, and force the argument to a long.
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D15653
controller that tries to handle early invocations of the controller,
in other words, invocations before the expected end of the interval.
However, there were some calculation errors in this early invocation
case. Notably, if an early invocation occurred while the error was
negative, the derivative term was off by a large amount. One visible
effect of this error was that processes were being killed by the
virtual memory system's OOM killer when in fact there was plentiful
free memory.
Correct a couple minor errors in the sysctl descriptions, and apply
some style fixes.
Reviewed by: jeff, markj
- add '-j' options to filter to enable converting native pmc
log format to json lines format to enable the use of scripts
and external tooling
% pmc filter -j pmc.log pmc.jsonl
- Record the tsc value in sampling interrupts as opposed to
recording nanotime when the sample is copied to a global log
in hardclock - potentially many milliseconds later.
- At initialize record the tsc_freq and the time of day to give
us an offset for translating the tsc values in callchain records