IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.
Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.
Reviewed by: alc, jhb
MFC after: 1 week
Fix a debugging printf to printf after a variable is first assigned,
not before.
These are purely build fixes, and need inspection to make sure they
were what the original author of the previous changes intended.
- Add newer CPUID definitions for future use.
Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.
Approved by: anholt (mentor)
originally wrote it for 4.x and hasn't really had the time to fully update
it to 5.x and later. Also, the author doesn't use the hardware anymore as
well. If someone does need this driver they can always resurrect it from
the Attic.
Requested by: Frank Mayhar frank at exit dot com
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.
6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
previous revision only restored the MP optimization.
Describe the optimization strategy for TLB invalidations in a comment.
Reviewed by: ups@
MFC after: 3 days
TLB shootdown requirements. Otherwise a CPU may not get the needed
TLB invalidation.
The PTE valid and access flags can not be used here to avoid TLB
shootdowns unless sf->cpumask == all_cpus.
( Otherwise some CPUs may still hold an even older entry in the TLB)
Since sf_buf_alloc mappings are normally always used this is
also not really useful and presetting accessed and modified
allows the CPU to speculatively load the entry into the TLB.
Both bugs can cause random data corruption.
MFC after: 3 days
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
can be enabled by enabling COUNT_IPIS in smptests.h. When enabled, each
CPU provides an interrupt counter for nearly all of the IPIs it receives
(IPI_STOP currently doesn't have a counter) that can be examined using
vmstat -i, etc.
MFC after: 3 days
Requested by: rwatson
and do some preparations for handling 12x22 fonts (currently lots of code
implies and/or hardcodes a font width of 8 pixels). This will be required
on sparc64 which uses a default font size of 12x22 in order to add font
loading and saving support as well as to use a syscons(4)-supplied mouse
pointer image.
This API breakage is committed now so it can be MFC'ed in time for 6.0
and later on upcoming framebuffer drivers destined for use on sparc64
and which are expected to rely on using font loading internally and on
a syscons(4)-supplied mouse pointer image can be easily MFC'ed to
RELENG_6 rather than requiring a backport.
Tested on: i386, sparc64, make universe
MFC after: 1 week
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:
Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.
This aligns this code more with the eventual locking of ttys.
Suggested by: bde
variable and returns the previous value of the variable.
Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).
Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.
With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.
NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.
NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.
NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.
MFC after: 1 week
This kernel config briefly describes some of the major MAC policies
available on FreeBSD. The hope is that this will raise the awareness
about MAC and get more people interested.
Discussed with: scottl
with some Dell servers that booted w/o a problem[*] on 5.4, but failed
with 6.0-BETA.
On the PCI bus, when we do lazy resource allocation, we narrow the
range requested as we pass through bridges to reflect how the bridges
are programmed and what addresses they pass. However, when we're
doing an allocation on a bus that's directly connected to a host
bridge, no such translation can take place. We already had a fallback
range for memory requests, but none for ioports. As such, provide a
fallback for I/O ports so we don't allocate location 0, which will
have undesired side effects when the resources are actually used.
This fixes a problem with booting a Dell server with usb in the
kernel. However, it is an unsatisfying solution. I don't like the
hard coded value, and I think we should start narrowing the resources
returned to not be in the so-called isa alias area (where the ranage &
0x0300 must be 0 iirc). Doing such filtering will have to wait for
another day.
This may be a good 6 candidate, maybe after its had a chance to be
refined.
Tested by: glebius@
constraint is actually only allowed for register operands. Instead, use
separate input and output memory constraints.
Education from: alc
Reviewed by: alc
Tested on: i386, alpha
MFC after: 1 week
and reloading it in i386_extend_pcb() rather than trying to force a context
switch to reload the TSS via the TDF_NEEDRESCHED flag. Optimizations to
avoid calling cpu_switch() when the new thread was identical to the old
thread defeated the attempt to force a TSS reload. Explicitly loading the
new TSS is what we really want to do anyway.
PR: i386/84842
Reported by: Alexander Best arundel at h3c dot de
MFC after: 1 week
Reviewed by: bde (mostly)
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.
Reviewed by: tegge
earlier as no one has stepped up to test recent changes to the driver.
Oddly, the module was actually turned on on ia64 though I'm fairly certain
that no ia64 machine has ever had or will ever have an ISA slot.
Axe borrowed from: phk
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.
This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.
Discussed with: bde
r_gdt -> saved_gdt
r_idt -> saved_idt
r_ldt -> saved_ldt
in order to prevent clashes with variables with same names
defined in <machine/segments.h>. This fixes compilation of this
file with GCC 4.0.
Reviewed by: njl
- Add locked variants of el_init and el_start.
- Don't initialize the mutex and lock it during el_probe().
- Do initialize the mutex during attach. (el_probe() did destroy the mutex
to cleanup, so this meant the driver was always using a destroyed mutex
when it was running.)
- Setup the interrupt handler after ether_ifattach().
- Fix locking in el_detach() and el_ioctl().
Note: Since I couldn't actually find anyone with this hardware, I'm going
ahead and committing these changes so they won't be lost. I'll remove the
driver in a week (real purpose of the MFC after below) unless someone pipes
up to test this.
MFC after: 1 week
Tested by: gcc(1)
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.
The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.
the vm_page_t associated with a pte using only the lower 32-bits of the pte
instead of the full 64-bits.
Submitted by: Greg Taleck greg at isilon dot com
Reviewed by: jeffr, alc
MFC after: 3 days
not mask the ExtINT pin on the first I/O APIC as at least one PIII chipset
seems to need this even though all of the pins in the 8259A's are masked.
The default is still to mask the ExtINT pin.
Reported by: Mike Tancsa mike at sentex.net
MFC after: 3 days
(i.e., smart battery) and fix various bugs found during the cleanup.
API changes:
* kernel access:
Access to individual batteries is now via devclass_find("battery").
Introduce new methods ACPI_BATT_GET_STATUS (for _BST-formatted data) and
ACPI_BATT_GET_INFO (for _BIF-formatted data). The helper function
acpi_battery_get_battinfo() now takes a device_t instead of a unit #
argument. If dev is NULL, this signifies all batteries.
* ioctl access:
The ACPIIO_BATT_GET_TYPE and ACPIIO_BATT_GET_BATTDESC ioctls have been
removed. Since there is now no need for a mapping between "virtual" unit
and physical unit, usermode programs can just specify the unit directly and
skip the old translation steps. In fact, acpiconf(8) was actually already
doing this and virtual unit was the same as physical unit in all cases
since there was previously only one battery type (acpi_cmbat). Additionally,
we now map the ACPIIO_BATT_GET_BIF and ACPIIO_BATT_GET_BST ioctls for all
batteries, if they provide the associated methods.
* apm compatibility device/ioctls: no change
* sysctl: no change
Since most third-party applications use the apm(4) compat interface, there
should be very few affected applications (if any).
Reviewed by: bruno
MFC after: 5 days
and return a printable representation.
This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).
Also, add watchdog support for PCM-582x platforms.
Submitted by: Adrian Steinmann <ast@marabu.ch>
Slightly changed by: phk
PR: 81360
by Vladimir Dergachev for inclusion in DRM CVS, with minor modifications for
FreeBSD CVS and the appropriate license from Nicolai Haehnle on r300_reg.h.
Fixes hangs when using r300.sf.net userland, tested on a Radeon 9600 on amd64.
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.
MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm
- Make sure timer0_max_count is set to a correct value in the lapic case.
- Revert i8254_restore() to explicitly reprogram timer 0 rather than
calling set_timer_freq() to do it. set_timer_freq() only reprograms
the counter if the max count changes which it never does on resume. This
unbreaks suspend/resume for several people.
Tested by: marks, others
Reviewed by: bde
MFC after: 3 days
in the PCI config registers) that are > 15 as $PIR can only route PCI
interrupts to ISA IRQs which are limited to the 0 to 15 range.
- Remove an extra word from a printf.
Reported by: othermark atkin901 at yahoo dot com
MFC after: 3 days
enabling of interrupts inside of trap(). Fix a typo in a comment.
Revert rev 1.113 of "sys/i386/i386/exception.s" as it is no longer
needed.
Reviewed by: bde
MFC after: 3 days
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.
Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
exit via 'doreti_exit'.
Since the NMI interrupt may be taken at any time, including when
the processor has masked external interrupts, it is not safe to
call ast() as is done for normal interrupts.
Approved by: re (scottl)
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
instead, consolidate those two cases.
- Some whitespace fixes.
Approved by: re (scottl)
copied and pasted. I had actually tested without this change in my
trees as had the other testers.
Reported by: bde, Rostislav Krasny rosti dot bsd at gmail dot com
Approved by: re (scottl)
Pointy hat to: jhb
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).
This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.
IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()
Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.
PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)
as they are already default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it. On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.
This commit has:
- Removed the option from conf/options.*
- Removed the option and comments from MD NOTES files
- Simplified the CPU_ENABLE_SSE ifdef's so they don't
deal with CPU_ENABLE_SSE from kernel configuration. (*)
For most users, this commit should be largely no-op. If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.
(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
not just !defined(CPU_DISABLE_SSE), if we really want to do so.
Discussed on: -arch
Approved by: re (scottl)
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by callling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid code file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.
Approved by: re (scottl)
timer since irq0 isn't being driven at hz in that case and we don't need to
try to handle edge cases with rollover, etc. that require irq0 to be firing
for the timecounter to actually work.
Submitted by: phk
Tested by: schweikh
Approved by: re (scottl)
that newer Intel cpu hardware implements them too. This includes things
like the NX (pte no-execute) flag for execute protection. We'll need to
reference this for implementing no-exec in pmap.c at some point.
Some feature flags are duplicated in both the Intel-orignated bits and
the AMD bits. Suppress the the duplicates correctly - the old code
assumed they were a 1:1 mapping which is not correct. We can't just mask
off the bits present in cpu_feature.
Converge with amd64 where this originated from.
Intel cpu's that implement any AMD features will report them in dmesg now.
Approved by: re
dump format. The key reason to do this is so that we can dump sparse
address space. For example, we need to be able to skip the PCI hole
just below the 4GB boundary. Trying to destructively dump MMIO device
registers is Really Bad(TM). The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).
This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.
Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped. We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump. eg: the vm page array
and the dmesg buffer. dump_avail[] is pretty much an unmolested version
of phys_avail[]. It does have Maxmem correction.
Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this. amd64 was actually using the i386 dump version number.
libkvm support to follow.
Approved by: re
fields for each system call, I missed two system call files because
they weren't named syscalls.master. Catch up with this last two,
mapping the system calls to the NULL event for now.
Spotted by: jhb
Approved by: re (scottl)
files after they were repo-copied to sys/dev/atkbdc. The sources of
atkbdc(4) and its children were moved to the new location in preparation
for adding an EBus front-end to atkbdc(4) for use on sparc64; i.e. in
order to not further scatter them over the whole tree which would have
been the result of adding atkbdc_ebus.c in e.g. sys/sparc64/ebus. Another
reason for the repo-copies was that some of the sources were misfiled,
e.g. sys/isa/atkbd_isa.c wasn't ISA-specific at all but for hanging
atkbd(4) off of atkbdc(4) and was renamed to atkbd_atkbdc.c accordingly.
Most of sys/isa/psm.c, i.e. expect for its PSMC PNP part, also isn't
ISA-specific.
- Separate the parts of atkbdc_isa.c which aren't actually ISA-specific
but are shareable between different atkbdc(4) bus front-ends into
atkbdc_subr.c (repo-copied from atkbdc_isa.c). While here use
bus_generic_rl_alloc_resource() and bus_generic_rl_release_resource()
respectively in atkbdc_isa.c instead of rolling own versions.
- Add sparc64 MD bits to atkbdc(4) and atkbd(4) and an EBus front-end for
atkbdc(4). PS/2 controllers and input devices are used on a couple of
Sun OEM boards and occur on either the EBus or the ISA bus. Depending on
the board it's either the only on-board mean to connect a keyboard and
mouse or an alternative to either RS232 or USB devices.
- Wrap the PSMC PNP part of psm.c in #ifdef DEV_ISA so it can be compiled
without isa(4) (e.g. for EBus-only machines). This ISA-specific part
isn't separated into its own source file, yet, as it requires more work
than was feasible for 6.0 in order to do it in a clean way. Actually
philip@ is working on a rewrite of psm(4) so a more comprehensive
clean-up and separation of hardware dependent and independent parts is
expected to happen after 6.0.
Tested on: i386, sparc64 (AX1105, AXe and AXi boards)
Reviewed by: philip
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.
Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.
Remove stale comments from pmap_init().
Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.
Tested by: cognet, kensmith, marcel
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
exist on other architectures yet.
- While I'm here, fix the formatting of the options line. The keyword
"options" should be followed by a space and then a tab, not 2 tabs.
of MCOUNT to have a C version and an asm version with the same name and
not have LOCORE ifdefs to distinguish them. <machine/profile.h> provides
a C version and <machine/asmacros.h> provides an assembler version.
Discussed with: bde
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.
Tested on: i386, pc98
- Move MD files into <arch>/<arch>.
- Move bus dependent files into <arch>/<bus>.
Rename some files to more suitable names.
Repo-copied by: peter
Discussed with: imp
in as-yet uncommitted code for 32 bit binary compatability on 64 bit
kernels, some of the 32 bit registers come from the pcb. Moving the
initialization here means fill_regs32() etc are laid out the same.
Have pmcstat(8) and pmccontrol(8) use these APIs.
Return PMC class-related constants (PMC widths and capabilities)
with the OP GETCPUINFO call leaving OP PMCINFO to return only the
dynamic information associated with a PMC (i.e., whether enabled,
owner pid, reload count etc.).
Allow pmc_read() (i.e., OPS PMCRW) on active self-attached PMCs to
get upto-date values from hardware since we can guarantee that the
hardware is running the correct PMC at the time of the call.
Bug fixes:
- (x86 class processors) Fix a bug that prevented an RDPMC
instruction from being recognized as permitted till after the
attached process had context switched out and back in again after
a pmc_start() call.
Tighten the rules for using RDPMC class instructions: a GETMSR
OP is now allowed only after an OP ATTACH has been done by the
PMC's owner to itself. OP GETMSR is not allowed for PMCs that
track descendants, for PMCs attached to processes other than
their owner processes.
- (P4/HTT processors only) Fix a bug that caused the MI and MD
layers to get out of sync. Add a new MD operation 'get_config()'
as part of this fix.
- Allow multiple system-mode PMCs at the same row-index but on
different CPUs to be allocated.
- Reject allocation of an administratively disabled PMC.
Misc. code cleanups and refactoring. Improve a few comments.
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.
KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.
Submitted by: ups
Only allow a process to use the x86 RDPMC instruction if it has
allocated and attached a PMC to itself.
Inform the MD layer of the "pseudo context switch out" that needs
to be done when the last thread of a process is exiting.
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
instead of assuming fixed offsets within the GDT. The hard-coded
values here have been incorrect since Peter's GDT rearranging around
10 days ago, causing ACPI resume problems.
Reviewed by: peter
inclusion of <sys/pmc.h> and depending on being included from
that header file.
o Include any MD specific header files that otherwise need to be
included from MI files.
Ok'd: jkoshy@
settings and is an older version of the same design used for ICH SpeedStep.
It is only known to be available on PIIX4 chipsets.
Many thanks to Bruno Ducrot for writing the driver and Jon Noack for
testing.
Submitted by: Bruno Ducrot
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).
Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
- Split core DRM routines back into their own module, rather than using the
nasty templated system like before.
- Development-class R300 support in radeon driver (requires userland pieces, of
course).
- Mach64 driver (haven't tested in a while -- my mach64s no longer fit in the
testbox). Covers Rage Pros, Rage Mobility P/M, Rage XL, and some others.
- i915 driver files, which just need to get drm_drv.c fixed to allow attachment
to the drmsub device. Covers i830 through i915 integrated graphics.
- savage driver files, which should require minimal changes to work. Covers the
Savage3D, Savage IX/MX, Savage 4, ProSavage.
- Support for color and texture tiling and HyperZ features of Radeon.
Thanks to: scottl (much p4 handholding)
Jung-uk Kim (helpful prodding)
PR: [1] kern/76879, [2] kern/72548
Submitted by: [1] Alex, lesha at intercaf dot ru
[2] Shaun Jurrens, shaun at shamz dot net
Specifically, if the BIOS has programmed an IRQ for a device that doesn't
match the list of valid IRQs for the link, use it anyway as some BIOSes
don't correctly list the valid IRQs in the $PIR. Also, allow the user
to specify an IRQ that $PIR claims is invalid as an override, but emit a
warning in that case.
when using an APIC. This simplifies the APIC code somewhat and also allows
us to be pedantically more compliant with ACPI which mandates no use of
mixed mode.
of the kernel address space already. Intel recommend this anyway, because
using a non-4GB limit adds an additional clock cycle to address generation.
We were able to install 4GB segments into the LDT, so any limits we imposed
on %cs and %ds were academic anyway. More importantly, this allows us to
make a page in the kernel readable to user applications, for holding things
like the signal trampoline and other fun things.
Move the user %cs/%ds segments from the LDT to the GDT. There was no good
reason for them to be there anyway. The old LDT entries are still there
but we can now relax the restriction that prevented users from emptying
the default LDT entries.
Putting user and kernel %cs and %ds together allows us to access the fast
sysenter/sysexit/syscall/sysret instructions. syscall/sysret in particular
require that the user/kernel segments be laid out this way. Reserve a slot
specifically for NDIS while here.
Create two user controllable slots in the GDT that are context switched
with the (kernel) thread. This allows user applications to set two
user privilige selectors to arbitary values. Create
i386_set_fsbase(void *base) and friends. (get/set, fs/gs). For i386,
%gs is used by tls and the thread libraries and this means that user
processes no longer have to have the cost of having a custom LDT, and
we will no longer to do a ldt switch when activating a kthread/ithread in
the usual case any more.
In other words, we can now set the base address for %fs and %gs to arbitary
addresses without the pain of messing with ldt segments.
of the __pcb_spare longs. Except that fields were changed and one of the
spare values was used and the __pcb_spare field was reduced from two to one
long. Now VM86 bios calls can trash the first 4 bytes of the next page
following the kernel stack/pcb. This Is Bad(TM). This bug has been
present in 5.2-release and onwards, and is still in RELENG_5.
Instead of tempting fate and trying to use "spare" fields, explicitly
reserve them.
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
compiler features tests. This is ok, since machine/ieeefp.h is an internal
interface. But floatingpoint.h is a public interface and some ports use it,
so include sys/cdefs.h in the amd64 and i386 version of floatingpoint.h.
Note: some architectures don't provide recursive inclusion protection in
floatingpoint.h, namely alpha and ia64. Except for this part and now the
include of sys/cdefs.h, all those files are equal (from a compiler POV),
so they could be moved to only one version in src/include/.
Approved by: joerg
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
vnodes and anonymous memory. Note that mmaping a cdev directly does not
currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
cdev the ioctl is acting on rather than trying to find a suitable vnode
to map from.
Reviewed by: alc, arch@
checks, including cpuid_is_k7(), will catch CPUs that really don't support
this method.
Submitted by: Bruno Ducrot
Tested by: Jari Kirma (kirma cs.hut.fi)
and AMD Cool&Quiet PowerNow! (k8) cpufreq control. This driver is enabled
for both i386 and amd64 architectures. It has both acpi and legacy BIOS
attachments. Thanks to Bruno Ducrot for writing this driver and Jung-uk
Kim for testing.
Submitted by: Bruno Ducrot (ducrot:poupinou.org)
Instead, explicitly enable them when we setup the interrupt handler.
Also, move the setting of stathz and profhz down to the same place so
that the code flow is simpler and easier to follow.
- Don't setup an interrupt handler for IRQ0 if we are using the lapic timer
as it doesn't do anything productive in that case.
to kmem_alloc(). Failure to do this made it possible for user
processes to cause a hard lock on i386 kernels. I believe this only
affects 6-CURRENT on or after 2005-01-26.
Found by: Coverity Prevent analysis tool
Security: Local DOS
FreeBSD based on aue(4) it was picked by OpenBSD, then from OpenBSD ported
to NetBSD and finally NetBSD version merged with original one goes into
FreeBSD.
Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD
modern CPUs that have multiple VID#s that aren't detectable via public
methods. We use the control value from acpi_perf as the id16 for setting
a given frequency.
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.
PR: conf/40855
MFC after: 1 week
last in the list rather than first.
This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.
o Use IP_NPX in preference to hard coded value to write 0 to clear busy#
o Use md macro for a full reset of the npx
o Use IRQ_NPX in preference to hard coded value for each platform.
# The other two ifdefs in this file are hard to remove
rman_resource_resournce_bound wrt end parameter. The end parameter
here was the same as the start. However, it should be start + count -
1, so make it that instead.
where having this disabled was actually hurting us, since so many
BIOSes include legacy USB emulation that takes control of all usb
ports and only the ehci driver knows how to disable it.
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors. In particular, given argument M:
- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
and *sets* the new flag word to M. (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.
There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break. I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.
By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it. This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated. ;-)
Note that fpresetsticky() has been retained on FreeBSD/i386. As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.
PR: 75862
Suggested by: bde
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
create kernel threads and call rfork(2) with RFTHREAD flag set in this case,
which puts parent and child into the same threading group. As a result
all threads that belong to the same program end up in the same threading
group.
This is similar to what linuxthreads port does, though in this case we don't
have a luxury of having access to the source code and there is no definite
way to differentiate linux_clone() called for threading purposes from other
uses, so that we have to resort to heuristics.
Allow SIGTHR to be delivered between all processes in the same threading
group previously it has been blocked for s[ug]id processes.
This also should improve locking of the same file descriptor from different
threads in programs running under linux compat layer.
PR: kern/72922
Reported by: Andriy Gapon <avg@icyb.net.ua>
Idea suggested by: rwatson
place.
This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.
By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.
Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
timer case:
- Remove the virtual fooclock interrupt counters as they have served their
purpose.
- Adjust the dividers for the different clock such that profhz is now a
multiple of stathz as in the non-lapic case, and the timer now runs at
hz * 2 rather than hz * 3. With the new divisors, the default clock
rates are:
kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }
Also save the DAC state, increase the maximum save state size from
4k to 8k, and refuse to save the VESA state if the BIOS reports it
is larger than the maximum size we can handle.
It doesn't appear that anything currently uses this code, but it
turns out to be capable of restoring some notebook displays to a
working state after a suspend-resume cycle.
SMP systems. It appears all drivers except ichss should attach to each
CPU and that settings should be performed on each CPU. Add comments about
this. Also, add a guard for p4tcc's identify method being called more than
once.
a bugfix of clearing the On-Demand flag when going back to 100%. It
has been tested and works on an IBM R32. Note original work done by
Ted Unangst and sobomax@.
IRQ 0 and not an ExtINT pin. The MADT enumerators ignore the PC-AT flag
and ignore overrides that map IRQ 0 to pin 2 when this quirk is present.
- Add a block comment above the quirks to document each quirk so that we
can use more verbose descriptions quirks.
MFC after: 2 weeks
ACPI MADT only does if the PC-AT flag is set), then don't assume that pin 0
on the first I/O APIC is an ExtINT pin. Instead, assume that it is ISA
IRQ 0.
on the previous generation of Pentium-M processors (Banias). Support for
Dothan and later processors involves working with acpi_perf(4) to extract
information about supported states. This driver should work on MP systems
including HTT. It is experimental and may have a few bugs but has been
tested to not crash at least.
Thanks to Colin Percival for his initial work on this driver.
preliminary support for using the GCC-compatibility of ICC was committed
but couldn't be tested at that time due to problems with ICC itself. Since
ICC 8.1 it's possible to use its GCC-compatibility under FreeBSD and it
turned out that a typedef for __gnuc_va_list is required in that case.
Revert the part of rev. 1.8 which #ifdef'ed out __gnuc_va_list for ICC.
MFC after: 1 week
parent cpu device before passing it to pcpu_find(). Get the ivars from the
child, not parent cpu device. These bugs would cause a panic when
dereferencing the pcpu ivar, but weren't present in the acpi attachment
which it seems most people are using.
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.
Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.
As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.
Requested (sorta) by: rwatson
MFC after: 2 weeks