in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33095
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33093
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
pipe requires no special handling.
ofreebsd32_sigpending did differ from osigpending in that it acted
on the siglist rather than the sigqueue, but this appears to be an
oversight in 3fbdb3c215.
ogetpagesize could theoretically have ABI-dependent results, but in
practice does not. If it does it would be easy handle in the central
implementation and be the least of the problems in changing the value of
PAGE_SIZE.
Reviewed by: kevans
We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers. Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().
Reviewed by: alc, kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
This is in preference to simply filling the cpuset, and allows the
conditional in pmap_invalidate_cpu_mask() to be elided.
Also export pmap_invalidate_cpu_mask() outside of pmap.c for use in a
subsequent commit.
Suggested by: kib
Reviewed by: alc, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
The MINIMAL configs were overlooked. They are compiled as part of
universe, so this broke universe builds. Add the same defafults as for
GENERIC.
Sponsored by: Netflix
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.
Relnotes: yes
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32868
When working on the ports these functions were slightly different, but
now there's no reason for them to be separate.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Make sys/amd64/linux/linux_ptrace.c machine-independent,
in preparation for moving it into sys/compat/linux/.
No functional changes.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32756
Note that this is largely untested at this point, as was
the previous version; I'm committing this mostly to get
rid of `struct linux_pt_reg`.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32735
Previously it returned a shorter struct. I can't find any
modern software that uses it, but tests/ptrace from strace(1)
repo complained.
Differential Revision: https://reviews.freebsd.org/D32601
Translate ERESTART into Linux "internal" errno ERESTARTSYS.
This fixes the erestartsys.gen.test from strace(1).
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32623
to match the added accounting of the top-level page table pages.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
both for kernel and user page tables, the later exist in the PTI case.
Reviewed by: markj
Tested by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32569
This fixes panic when trying to run strace(8) from Focal.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32355
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0. On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.
Modify vlapic to schedule callouts on the local CPU instead. This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.
Reviewed by: grehan, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32559
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
This is a re-application of commit
9068f6ea69.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
The root page is not zeroed at allocation time since with 4-level tables
each entry is copied from a template. However, with 5-level tables only
a single entry is filled, so the rest need to be cleared.
Reported by: alc
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32525
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.
PR: 259036
Requested by: John Hay <john@sanren.ac.za>
Discussed with: ian, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system. It is well-known that for
instance LAPICs cannot tolerate cache flush. As report indicates,
there are more such devices.
This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages. Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.
Reported and tested by: manu
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument. Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884