Commit Graph

108 Commits

Author SHA1 Message Date
Mark Johnston
c3d326fd44 Define MAXCPU consistently between the kernel and KLDs.
This reverts r177661.  The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway.  Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.

PR:		243711
Reviewed by:	jhb, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23512
2020-02-05 19:08:21 +00:00
Konstantin Belousov
9a52756044 i386: Merge PAE and non-PAE pmaps into same kernel.
Effectively all i386 kernels now have two pmaps compiled in: one
managing PAE pagetables, and another non-PAE. The implementation is
selected at cold time depending on the CPU features. The vm_paddr_t is
always 64bit now. As result, nx bit can be used on all capable CPUs.

Option PAE only affects the bus_addr_t: it is still 32bit for non-PAE
configs, for drivers compatibility. Kernel layout, esp. max kernel
address, low memory PDEs and max user address (same as trampoline
start) are now same for PAE and for non-PAE regardless of the type of
page tables used.

Non-PAE kernel (when using PAE pagetables) can handle physical memory
up to 24G now, larger memory requires re-tuning the KVA consumers and
instead the code caps the maximum at 24G. Unfortunately, a lot of
drivers do not use busdma(9) properly so by default even 4G barrier is
not easy. There are two tunables added: hw.above4g_allow and
hw.above24g_allow, the first one is kept enabled for now to evaluate
the status on HEAD, second is only for dev use.

i386 now creates three freelists if there is any memory above 4G, to
allow proper bounce pages allocation. Also, VM_KMEM_SIZE_SCALE changed
from 3 to 1.

The PAE_TABLES kernel config option is retired.

In collaboarion with: pho
Discussed with:	emaste
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D18894
2019-01-30 02:07:13 +00:00
Konstantin Belousov
d86c1f0dc1 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00
Conrad Meyer
3f6867ef63 i386: Bump KSTACK_PAGES default to match amd64
Logically, extend r286288 to cover all threads, by default.

The world has largely moved on from i386.  Most FreeBSD users and developers
test on amd64 hardware.  For better or worse, we have written a non-trivial
amount of kernel code that relies on stacks larger than 8 kB, and it "just
works" on amd64, so there has been little incentive to shrink it.

amd64 had its KSTACK_PAGES bumped to 4 back in Peter's initial AMD64 commit,
r114349, in 2003.  Since that time, i386 has limped along on a stack half
the size.  We've even observed the stack overflows years ago, but neglected
to fix the issue; see the 20121223 and 20150728 entries in UPDATING.

If anyone is concerned with this change, I suggest they configure their
AMD64 kernels with KSTACK_PAGES 2 and fix the fallout there first.  Eugene
has identified a list of high stack usage functions in the first PR below.

PR:		219476, 224218
Reported by:	eugen@, Shreesh Holla <hshreesh AT yahoo.com>
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
2017-12-11 04:32:37 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Conrad Meyer
2744a0b69b Drop CACHE_LINE_SIZE to 64 bytes on x86
The actual cache line size has always been 64 bytes.

The 128 number arose as an optimization for Core 2 era Intel processors.  By
default (configurable in BIOS), these CPUs would prefetch adjacent cache
lines unintelligently.  Newer CPUs prefetch more intelligently.

The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400
series, "Dunnington").  If you are still using one of these CPUs, especially
in a multi-socket configuration, consider locating the "adjacent cache line
prefetch" option in BIOS and disabling it.

Reported by:	mjg
Reviewed by:	np
Discussed with:	jhb
Sponsored by:	Dell EMC Isilon
2017-08-28 22:28:41 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Konstantin Belousov
3208d3ff46 Give large kernel stack to the initial thread . Otherwise, ZFS
overflows the stack during root mount in some configurations.

Tested by:	Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 13:50:52 +00:00
Konstantin Belousov
6fdfd88220 Use single instance of the identical INKERNEL() and PMC_IN_KERNEL()
macros on amd64 and i386.  Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().

On i386, correct the lowest kernel address.  After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]

Submitted by:	Oliver Pinter [*]
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-02 14:37:21 +00:00
Konstantin Belousov
34c15db9cd Add config option PAE_TABLES for the i386 kernel. It switches pmap to
use PAE format for the page tables, but does not incur other
consequences of the full PAE config.  In particular, vm_paddr_t and
bus_addr_t are left 32bit, and max supported memory is still limited
by 4GB.

The option allows to have nx permissions for memory mappings on i386
kernel, while keeping the usual i386 KBI and avoiding the kernel data
sizing problems typical for the PAE config.

Intel documented that the PAE format for page tables is available
starting with the Pentium Pro, but it is possible that the plain
Pentium CPUs have the required support (Appendix H).  The goal is to
enable the option and non-exec mappings on i386 for the GENERIC
kernel.  Anybody wanting a useful system on 486, have to reconfigure
the modern i386 kernel anyway.

Discussed with:	alc, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-04-13 15:22:45 +00:00
Attilio Rao
941646f5ec Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept.  The change should also be useful
for consolidation and consistency.

Sponsored by:	EMC / Isilon storage division
Obtained from:	jeff
Reviewed by:	alc
2013-05-07 22:46:24 +00:00
Tijl Coosemans
c67f5b54d9 Remove redundant definitions of _ALIGN and _ALIGNBYTES. 2013-04-21 11:12:44 +00:00
Konstantin Belousov
d4e9009cc8 Fix the VM_BCACHE_SIZE_MAX definition on i386 to match the maximal
buffer map size, auto-tuned on the 4GB machine.  Having the maxbcache
bigger than the buffer map causes the transient bio map sizing logic
to assume that there is enough KVA to use approximately 90MB (buffer
map is sized to 110MB, and maxbcache is 200MB).  The increase in the
KVA usage caused other big KVA consumers, like nvidia.ko, to fail the
initialization.

Change the definition for both PAE and non-PAE cases, since PAE is
even more KVA-starved.

Reported and tested by:	David Wolfskill
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
2013-03-27 10:52:18 +00:00
Dag-Erling Smørgrav
ae7f84a9a4 Parly revert r239255: reinstate a default maxswzone on i386, where KVA is
scarce, but set it slightly higher so we can handle 8 GB of swap.
2012-08-27 13:22:27 +00:00
Dag-Erling Smørgrav
e2082935f0 As discussed on -current, remove the hardcoded default maxswzone.
MFC after:	3 weeks
2012-08-14 17:01:21 +00:00
Attilio Rao
68b739cd6f Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by:	pluknet
Approved by:	re (kib)
2011-07-19 00:37:24 +00:00
John Baldwin
1368987ae4 Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to
the x86 tree.  The $PIR code is still only enabled on i386 and not amd64.
While here, make the qpi(4) driver on conditional on 'device pci'.
2011-06-22 21:04:13 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Poul-Henning Kamp
a254d1f16d Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
2009-09-08 20:45:40 +00:00
Sam Leffler
8c393fd1f0 Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by:	bde, imp, marcel
Approved by:	re (kensmith)
2009-07-05 17:45:48 +00:00
John Baldwin
8aba835b8e Bump CACHE_LINE_SIZE to 128 for x86. Intel's manuals explicitly recommend
using 128 byte alignment for locks.  (See IA-32 SDM Vol 3A 7.11.6.7)
2009-05-18 19:33:59 +00:00
Robert Watson
9725389e1e Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by:	bde [1], jhb [2]
MFC after:	2 weeks
2009-04-20 12:59:23 +00:00
Robert Watson
22037b2d2c Add description and cautionary note regarding CACHE_LINE_SIZE.
MFC after:	2 weeks
Suggested by:	alc
2009-04-19 21:26:36 +00:00
Robert Watson
a93fa8f2bb For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant.  These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after:	2 weeks
Discussed on:   arch@
2009-04-19 20:19:13 +00:00
John Baldwin
3d074cf37b Bump MAXCPU to 32 now that 32 CPU x86 systems exist.
Tested by:	rwatson, mdtansca
Approved by:	peter
2008-10-01 21:59:04 +00:00
Kip Macy
dec9f63538 add NPGPTD_SHIFT for the nkpt calculation
MFC after:	1 month
2008-09-25 07:05:17 +00:00
John Birrell
e483943791 When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.
2008-03-27 05:03:26 +00:00
Warner Losh
d5e61c97a6 By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h.  Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef.  I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.
2006-01-09 06:05:57 +00:00
Ruslan Ermilov
224d140293 Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE).  Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.
2005-12-06 13:27:21 +00:00
David Schultz
ab44ebf537 Remove UAREA_PAGES.
Reviewed by:	arch@
2004-11-20 02:29:50 +00:00
Scott Long
9923b511ed Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined.  Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c).  This
is a possible MT5 candidate.
2004-09-02 18:59:15 +00:00
Julian Elischer
df3a834f7e Give up trying to make preemption dependent on SCHED_4BSD
the list of breakages was getting too long
2004-09-01 20:41:18 +00:00
Julian Elischer
6222ded017 Don't ask for this for modules. no modules need to know about preemption at the moment 2004-09-01 18:29:57 +00:00
Scott Long
f164d4148e Protect the PREEMPTION logic with #ifdef _KERNEL to fix the build. 2004-09-01 10:12:08 +00:00
Julian Elischer
02ea3bcab9 Only turn preemption for 4bsd.
it's still poison for ULE.
2004-09-01 09:01:32 +00:00
Julian Elischer
6804a3ab6d Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after:	5 days
2004-09-01 06:42:02 +00:00
Scott Long
9352fe30a0 Turn off PREEMPTION by default while it gets debugged. It's been causing
4 weeks of problems including deadlocks and instant panics.  Note that the
real bugs are likely in the scheduler.
2004-08-01 14:31:45 +00:00
John Baldwin
0c0b25ae91 Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
  determine if a thread about to be added to a run queue should be
  preempted to directly.  If it is not safe to preempt or if the new
  thread does not have a high enough priority, then the function returns
  false and sched_add() adds the thread to the run queue.  If the thread
  should be preempted to but the current thread is in a nested critical
  section, then the flag TDF_OWEPREEMPT is set and the thread is added
  to the run queue.  Otherwise, mi_switch() is called immediately and the
  thread is never added to the run queue since it is switch to directly.
  When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
  then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
  setrunqueue() now does all the correct work.  This also removes the
  do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
  supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
  chance to run if the architecture supports native preemption since
  the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
  architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
  preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by:	scottl (with his re@ hat)
2004-07-02 20:21:44 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Alan Cox
49a2507bd1 Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM.  At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES.  The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES.  To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new.  In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().
2003-06-14 23:23:55 +00:00
Jake Burkholder
7ab9b220d9 - Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel opition 'options PAE'.  This will only work with device drivers which
  either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by:	DARPA, Network Associates Laboratories, FreeBSD Systems
2003-03-30 05:24:52 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
Jake Burkholder
910548dea7 - Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
  There are still numerous assumptions that the page directory is exactly
  1 page.

Sponsored by:	DARPA, Network Associates Laboratories
2003-02-23 21:20:00 +00:00
Peter Wemm
66422f5b7a Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports.  As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL.  It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha
2002-09-17 01:49:00 +00:00
Matthew Dillon
ec61f55d42 Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB.
Reduce the swap meta calculation by a factor of 2, it's still massive overkill.

X-MFC after: immediately
2002-08-31 21:15:29 +00:00
Poul-Henning Kamp
6380601f64 Move MI stuff out of MD param.h files.
It can all still be overridden in the MD files should need suddenly arise.
2002-05-14 20:35:29 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Peter Wemm
268bdb43f9 Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places.  The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point.  -current doesn't really need this so much since each interrupt
runs on its own kstack.
2001-08-25 02:20:02 +00:00
Matthew Dillon
2f9e4e8025 Limit the amount of KVM reserved for the buffer cache and for swap-meta
information.  The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache.  This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating.  The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.
2001-08-20 00:41:12 +00:00
Jeroen Ruigrok van der Werven
6b1d8ceabe Fix CMSG and ALIGN macro usage.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h.  However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
definitions.
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as
before.

PR:		19971, 20530
Submitted by:	Martin Kaeske <MartinKaeske@lausitz.net>
		Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by:	shin
Reviewed by:	bde
2000-11-08 16:59:25 +00:00