As long as interrupts are disabled and there is not explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.
Reported by: kib, jhb
are served via an interrupt gate.
However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.
Tested by: gianni
MFC after: 1 month
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.
Note: this will also fix some possible races in per-cpu data accessings
to be addressed in further commits.
In collabouration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.
Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
VM86 calls instead of the real mode emulator as a backend. VM86 has been
proven reliable for very long time and it is actually few times faster than
emulation. Increase maximum number of page table entries per VM86 context
from 3 to 8 pages. It was (ridiculously) low and insufficient for new VM86
backend, which shares one context globally. Slighly rearrange and clean up
the emulator backend to accommodate new code. The only visible change here
is stack size, which is decreased from 64K to 4K bytes to sync. with VM86.
Actually, it seems there is no need for big stack in real mode.
MFC after: 1 month
Xeon 5500/5600 series:
- Utilize IA32_TEMPERATURE_TARGET, a.k.a. Tj(target) in place
of Tj(max) when a sane value is available, as documented
in Intel whitepaper "CPU Monitoring With DTS/PECI"; (By sane
value we mean 70C - 100C for now);
- Print the probe results when booting verbose;
- Replace cpu_mask with cpu_stepping;
- Use CPUID_* macros instead of rolling our own.
Approved by: rpaulo
MFC after: 1 month
from the inline assembly. This allows the compiler to cache invocations of
curthread since it's value does not change within a thread context.
Submitted by: zec (i386)
MFC after: 1 week
Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs). For both cases, we cannot
have a valid pointer in pcb_save. This should restore the previous
behaviour.
zones for each malloc bucket size. The purpose is to isolate
different malloc types into hash classes, so that any buffer overruns
or use-after-free will usually only affect memory from malloc types in
that hash class. This is purely a debugging tool; by varying the hash
function and tracking which hash class was corrupted, the intersection
of the hash classes from each instance will point to a single malloc
type that is being misused. At this point inspection or memguard(9)
can be used to catch the offending code.
Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files.
The suggestion to have this on by default came from Kostik Belousov on
-arch.
This code is based on work by Ron Steinke at Isilon Systems.
Reviewed by: -arch (mostly silence)
Reviewed by: zml
Approved by: zml (mentor)
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.
Reviewed by: alc
systems with PnP/ACPI not reporting i8254 timer. In some cases it can be
fatal, as i8254 can be the only available time counter hardware. From other
side we are now heavily depend on i8254 timer and till the last time it's
init/usage was completely hardcoded. So this change just restores previous
behavior in more regular fashion.
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.
For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.
MFC after: 3 weeks
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.
Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks
It has more features than acpi_aiboost(4) and it will eventually replace
acpi_aiboost(4).
Submitted by: Constantine A. Murenin <cnst at FreeBSD.org>
Reviewed by: freebsd-acpi, imp
MFC after: 1 month
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.
The DF clearing for signal handlers was done some time ago.
MFC after: 1 week
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().
Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.
Noted by: bde
removal, MFi386 r209198:
Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of FPU.
Reviewed by: bde
Tested by: pho
believed that all 486-class CPUs FreeBSD is capable to run on, either
have no FPU and cannot use external coprocessor, or have FPU on the
package and can use #MF.
Reviewed by: bde
Tested by: pho (previous version)
FPU registers for copying. Remove the switch table and jumps from
bcopy/bzero/... to the actual implementation.
As a side-effect, i486-optimized bzero is removed.
Reviewed by: bde
Tested by: pho (previous version)
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.
Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.
For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.
This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.
Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.
Add counters for timer-related IPIs.
Reviewed by: jhb@ (previous version)