Commit Graph

563 Commits

Author SHA1 Message Date
Mark Johnston
31218f3209 riscv: Add support for enabling SV48 mode
This increases the size of the user map from 256GB to 128TB.  The kernel
map is left unchanged for now.

For now SV48 mode is left disabled by default, but can be enabled with a
tunable.  Note that extant hardware does not implement SV48, but QEMU
does.

- In pmap_bootstrap(), allocate a L0 page and attempt to enable SV48
  mode.  If the write to SATP doesn't take, the kernel continues to run
  in SV39 mode.
- Define VM_MAX_USER_ADDRESS to refer to the SV48 limit.  In SV39 mode,
  the region [VM_MAX_USER_ADDRESS_SV39, VM_MAX_USER_ADDRESS_SV48] is not
  mappable.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34280
2022-03-01 09:39:44 -05:00
Mark Johnston
6ce716f7c3 riscv: Add support for dynamically allocating L1 page table pages
This is required in SV48 mode.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34279
2022-03-01 09:39:44 -05:00
Mark Johnston
1321117200 riscv: Handle four-level page tables in various pmap traversal routines
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34278
2022-03-01 09:39:44 -05:00
Mark Johnston
ceed61483c riscv: Maintain the allpmaps list only in SV39 mode
When four-level page tables are used, there is no need to distribute
updates to the top-level page to all pmaps.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34277
2022-03-01 09:39:44 -05:00
Mark Johnston
5cf3a8216e riscv: Add pmap helper functions required by four-level page tables
No functional change intended.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34276
2022-03-01 09:39:44 -05:00
Mark Johnston
4337979236 riscv: Try to improve the comments for locore's page table setup
No functional change intended.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34275
2022-03-01 09:39:44 -05:00
Mark Johnston
ecaf115434 riscv: Conditionally modify the ELF64 sysentvec for SV48
A sysinit determines whether the pmap has enabled SV48 mode and modifies
the corresponding fields which describe the user memory map.

Reviewed by:	kib, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34274
2022-03-01 09:39:43 -05:00
Mark Johnston
35d0f443cf riscv: Define a SV48 memory map
No functional change intended.

Reviewed by:	kib, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34273
2022-03-01 09:39:43 -05:00
Mark Johnston
59f192c507 riscv: Add various pmap definitions needed to support SV48 mode
No functional change intended.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34272
2022-03-01 09:39:43 -05:00
Mark Johnston
2e956c30ca riscv: Use generic CSR macros for writing SATP
Instead of having the one-off load_satp(), just use csr_write().  No
functional change intended.

Reviewed by:	alc, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34271
2022-03-01 09:39:43 -05:00
Mark Johnston
82f4e0d0f0 riscv: Rename struct pmap's pm_l1 field to pm_top
In SV48 mode, the top-level page will be an L0 page rather than an L1
page.  Rename the field accordingly.  No functional change intended.

Reviewed by:	alc, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34270
2022-03-01 09:39:43 -05:00
Mark Johnston
d5c0a7b6d3 riscv: Fix another race in pmap_pinit()
Commit c862d5f2a7 ("riscv: Fix a race in pmap_pinit()") did not really
fix the race.  Alan writes,

 Suppose that N entries in the L1 tables are in use, and we are in the
 middle of the memcpy().  Specifically, we have read the zero-filled
 (N+1)st entry from the kernel L1 table.  Then, we are preempted.  Now,
 another core/thread does pmap_growkernel(), which fills the (N+1)st
 entry.  Finally, we return to the original core/thread, and overwrite
 the valid entry with the zero that we earlier read.

Try to fix the race properly, by copying kernel L1 entries while holding
the allpmaps lock.  To avoid doing unnecessary work while holding this
global lock, copy only the entries that we expect to be valid.

Fixes:		c862d5f2a7 ("riscv: Fix a race in pmap_pinit()")
Reported by:	alc, jrtc27
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34267
2022-02-22 09:26:33 -05:00
Emmanuel Vadot
092a42a6a3 riscv: conf: Remove options EXT_RESOURCES
It is now unused in kernel code.

Reviewed by:	mhorne
MFC after:      1 month
Differential Revision:	https://reviews.freebsd.org/D33838
2022-02-21 17:29:13 +01:00
Navdeep Parhar
e9e7bc8250 cxgbe(4): Changes to the fatal error handler.
* New error_flags that can be used from the error ithread and elsewhere
  without a synch_op.
* Stop the adapter immediately in t4_fatal_err but defer most of the
  rest of the handling to a task.  The task is allowed to sleep, unlike
  the ithread.  Remove async_event_task as it is no longer needed.
* Dump the devlog, CIMLA, and PCIE_FW exactly once on any fatal error
  involving the firmware or the CIM block.  While here, dump some
  additional info (see dump_cim_regs) for these errors.
* If both reset_on_fatal_err and panic_on_fatal_err are set then attempt
  a reset first and do not panic the system if it is successful.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-02-18 09:16:14 -08:00
Warner Losh
0987dc5be5 riscv: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-factor ABI options and cannot change size ever.

Differential Revision:	https://reviews.freebsd.org/D34214
2022-02-10 14:32:21 -07:00
Mark Johnston
c862d5f2a7 riscv: Fix a race in pmap_pinit()
All pmaps share the top half of the address space.  With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet).  Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.

When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap.  The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates.  In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.

Fix the problem by copying the kernel map entries after inserting the
pmap into the list.  This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.

Reviewed by:	mhorne, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34158
2022-02-08 13:31:55 -05:00
Mitchell Horne
4e1bc961bb arm64, riscv: handle RB_KDB
This allows entering the debugger at the earliest possible time, if
the '-d' argument is passed to the kernel.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34120
2022-02-01 13:59:54 -04:00
Mitchell Horne
e6ee2b6506 riscv: add ALT_BREAK_TO_DEBUGGER to GENERIC
It allows quickly entering ddb(4) over a serial line.

Reviewed by:	jhb
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D34119
2022-02-01 13:59:54 -04:00
Andrew Turner
548a2ec49b Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19831
2022-01-27 11:40:34 +00:00
Mitchell Horne
eb81812fb7 riscv: fix unused var in page_fault_handler()
clang warns that p is set-but-not-used, so let's use it.
2022-01-19 17:21:25 -04:00
Mark Johnston
706f4a81a8 exec: Introduce the PROC_PS_STRINGS() macro
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro.  With stack address randomization the
ps_strings address is no longer fixed.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston
3fc21fdd5f sysent: Add a sv_psstringssz field to struct sysentvec
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:42:07 -05:00
Brooks Davis
0910a41ef3 Revert "syscallarg_t: Add a type for system call arguments"
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
2022-01-12 23:29:20 +00:00
Brooks Davis
1544e0f5d1 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-01-12 22:51:25 +00:00
Mitchell Horne
d72e944812 riscv: gdb(4) support
Add the MD portion required for the gdb stub.

Reviewed by:	jhb (earlier version)
Discussed with:	jrtc27
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33734
2022-01-10 13:40:12 -04:00
John Baldwin
7def1e10b3 bus_dma: Deduplicate locking helper functions.
- Move busdma_lock_mutex to subr_bus_dma.c.

- Move _busdma_lock_dflt to subr_bus_dma.c.  This function was named a
  couple of different things previously.  It is not a public API but
  an internal helper used in place of a NULL pointer.  The prototype
  is in <sys/bus_dma.h> as not all backends include
  <sys/bus_dma_internal.h>.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33694
2022-01-05 13:50:40 -08:00
John Baldwin
85b4607324 Deduplicate bus_dma bounce code.
Move mostly duplicated code in various MD bus_dma backends to support
bounce pages into sys/kern/subr_busdma_bounce.c.  This file is
currently #include'd into the backends rather than compiled standalone
since it requires access to internal members of opaque bus_dma
structures such as bus_dmamap_t and bus_dma_tag_t.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33684
2022-01-05 13:50:40 -08:00
Doug Moore
f1e7a532d1 busdma: _bus_dmamap_addseg repaired
A recent change introduced a one-off error into a test allowing
coalescing chunks into segments.  This fixes that error.

broke a check in _bus_dmamap_addseg on many architectures. This change makes it clear that it is not a particular range that is being boundary-checked, but the proposed union of the two adjacent ranges.
Reported by:	se
Reviewed by:	se
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
Differential Revision:	https://reviews.freebsd.org/D33715
2022-01-02 12:37:05 -06:00
Doug Moore
b496126886 riscv-busdma: Balance parens.
Reported by:	jenkins
Fixes:	c606ab59e7 vm_extern: use standard address checkers everywhere
2021-12-31 02:01:58 -06:00
Doug Moore
c606ab59e7 vm_extern: use standard address checkers everywhere
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D33685
2021-12-30 22:09:08 -06:00
John Baldwin
254e4e5b77 Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages.  For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm).  However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi.  I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux.  However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih.  Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly.  This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by:	imp, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33447
2021-12-28 13:51:25 -08:00
Alan Cox
e161dfa918 Fix pmap_is_prefaultable() on arm64 and riscv
The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE.  And,
in some cases, when they currently return FALSE, they should have
returned TRUE.  Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless.  That is why it has gone unnoticed.

Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33659
2021-12-27 19:17:14 -06:00
Jessica Clarke
434cb1c4a6 riscv: Fix PLIC -Wunused-but-set-variable warnings 2021-12-10 04:51:32 +00:00
Alexander Motin
8493918868 busdma: Remove outdated comments about Giant.
MFC after:	2 weeks
2021-12-09 22:18:53 -05:00
John Baldwin
1a62e9bc00 Add <machine/tls.h> header to hold MD constants and helpers for TLS.
The header exports the following:

- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)

Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).

Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr.  libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).

A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.

Reviewed by:	kib (older version of x86/tls.h), jrtc27
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33351
2021-12-09 13:17:13 -08:00
Brooks Davis
547566526f Make struct syscall_args machine independent
After a round of cleanups in late 2020, all definitions are
functionally identical.

This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6).  As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems.  The sole
effect is to bloat the struct by 4 bytes.

Reviewed by:	kib, jhb, imp
Differential Revision:	https://reviews.freebsd.org/D33308
2021-12-08 18:45:33 +00:00
Mitchell Horne
0d2224733e Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.

Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.

PR:		259157
Reviewed by:	mav, kib, markj (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32580
2021-11-30 11:15:56 -04:00
Mark Johnston
ecbbe83144 netinet: Deduplicate most in_cksum() implementations
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions.  Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.

Deduplicate the implementations for the rest of the platforms.  This
will make it easier to implement in_cksum() for unmapped mbufs.  On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.

No functional change intended.

Reviewed by:	kp, glebius
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33095
2021-11-24 13:31:16 -05:00
Warner Losh
d2bf8c544a riscv: Make machine/regs.h self-contained
Make sys/reg.h self-contained by making riscv's machine/reg.h
self-contained.

Sponsored by:		Netflix
2021-11-23 21:21:17 -07:00
Mitchell Horne
10fe6f80a6 minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.

To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.

Reviewed by:	kib, markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31992
2021-11-19 15:05:52 -04:00
Mitchell Horne
1d2d1418b4 minidump: Use provided msgbuf pointer
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.

Reviewed by:	markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31991
2021-11-19 15:05:52 -04:00
Mitchell Horne
681bd71047 minidump: reduce the amount direct accesses to page tables
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.

Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.

Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.

Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31990
2021-11-19 15:05:52 -04:00
Mitchell Horne
1adebe3cd6 minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.

This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.

Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.

The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.

Reviewed by:	kib, markj, jhb
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D31989
2021-11-19 15:05:52 -04:00
Kristof Provost
4e85b64890 Add a COMPAT_FREEBSD13 kernel option
Use it wherever COMPAT_FREEBSD11 is currently specified.

Reviewed by:	jhb (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33005
2021-11-17 03:08:40 +01:00
Kristof Provost
23e1961e78 riscv: add COMPAT_FREEBSD12 option
Turn on compat option for older FreeBSD versions (i.e. 12). We do not
enable the compat options for 11 or older because riscv was never
supported in those versions.

Reviewed by:	jrtc27 (previous version)
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33015
2021-11-17 03:08:14 +01:00
Warner Losh
7e3c9ec906 tcp: better congestion control defaults
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm.  Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.

Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision:	https://reviews.freebsd.org/D32964
2021-11-12 12:16:11 -07:00
Randall Stewart
b8d60729de tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!

This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.

This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.

Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
2021-11-11 06:28:18 -05:00
Kyle Evans
6a8ea6d174 sched: split sched_ap_entry() out of sched_throw()
sched_throw() can no longer take a NULL thread, APs enter through
sched_ap_entry() instead.  This completely removes branching in the
common case and cleans up both paths.  No functional change intended.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D32829
2021-11-05 15:45:51 -05:00
Kyle Evans
589aed00e3 sched: separate out schedinit_ap()
schedinit_ap() sets up an AP for a later call to sched_throw(NULL).

Currently, ULE sets up some pcpu bits and fixes the idlethread lock with
a call to sched_throw(NULL); this results in a window where curthread is
setup in platforms' init_secondary(), but it has the wrong td_lock.
Typical platform AP startup procedure looks something like:

- Setup curthread
- ... other stuff, including cpu_initclocks_ap()
- Signal smp_started
- sched_throw(NULL) to enter the scheduler

cpu_initclocks_ap() may have callouts to process (e.g., nvme) and
attempt to sched_add() for this AP, but this attempt fails because
of the noted violated assumption leading to locking heartburn in
sched_setpreempt().

Interrupts are still disabled until cpu_throw() so we're not really at
risk of being preempted -- just let the scheduler in on it a little
earlier as part of setting up curthread.

Reviewed by:	alfredo, kib, markj
Triage help from:	andrew, markj
Smoke-tested by:	alfredo (ppc), kevans (arm64, x86), mhorne (arm)
Differential Revision:	https://reviews.freebsd.org/D32797
2021-11-03 15:54:59 -05:00
Philip Paeps
91feb4f420 riscv: add iicbus and iicoc to GENERIC
The iicoc driver supports the OpenCores I2C IP.  This is included in at
least the SiFive "Unleashed" and "Unmatched" cores and probably others.

Suggested by:	jrtc27
2021-11-01 13:19:55 +08:00