with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.
Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.
Approved by: jhb
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).
Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.
This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.
Reviewed by: core
Approved by: core
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removse td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.
Tested on: i386, alpha
to copy the sigframe to the user's stack. Useracc() takes a non-trivial
amount of time. Eliminating it speeds up signal delivery by 15% or more.
o Update some comments.
Submitted by: bde
older PCI BIOSes hate this and this leads to panics when it is done. Also,
assume that a uniquely routed interrupt is already routed. This also
seems to help some older laptops with feable BIOSes cope.
Problem:
selwakeup required calling pfind which would cause lock order
reversals with the allproc_lock and the per-process filedesc lock.
Solution:
Instead of recording the pid of the select()'ing process into the
selinfo structure, actually record a pointer to the thread. To
avoid dereferencing a bad address all the selinfo structures that
are in use by a thread are kept in a list hung off the thread
(protected by sellock). When a selwakeup occurs the selinfo is
removed from that threads list, it is also removed on the way out
of select or poll where the thread will traverse its list removing
all the selinfos from its own list.
Problem:
Previously the PROC_LOCK was used to provide the mutual exclusion
needed to ensure proper locking, this couldn't work because there
was a single condvar used for select and poll and condvars can
only be used with a single mutex.
Solution:
Introduce a global mutex 'sellock' which is used to provide mutual
exclusion when recording events to wait on as well as performing
notification when an event occurs.
Interesting note:
schedlock is required to manipulate the per-thread TDF_SELECT
flag, however if given its own field it would not need schedlock,
also because TDF_SELECT is only manipulated under sellock one
doesn't actually use schedlock for syncronization, only to protect
against corruption.
Proc locks are no longer used in select/poll.
Portions contributed by: davidc
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).
NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".
Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.
MFC-after: 5 days
o In i386's <machine/endian.h>, macros have some advantages over
inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
(previously __uint16_swap_uint32), it has some uses on i386's with
PDP endianness.
Submitted by: bde
o Move a comment up in <machine/endian.h> that was accidentially moved
down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
functions, so that the non-GCC (libc asm) case has proper
prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
_BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
for platforms in which asm versions don't exist. This significantly
reduces the complexity of some things at the cost of duplicate code.
Reviewed by: bde
and commenting of NETSMB, NETWMBCRYPTO, and SMBFS. In NOTES, they
had all floated to the bottom of the file with the list of seemingly
random and unclassified kernel options. This change moves them back
up to the network protocol and file system areas, and also documents
the dependencies.
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.
Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.
Reviewed by: jake
Approved by: jake
This makes other power-management system (APM for now) to be able to
generate power profile change events (ie. AC-line status changes), and
other kernel components, not only the ACPI components, can be notified
the events.
- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add call-back function for Crusoe LongRun controlling on power
profile changes for a example
Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Lukcily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.
Submitted by: dillon
Reviewed by: silby
MFC after: 1 week
In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.
Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.
Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.
Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.
PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):
- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).
htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.
Make use of the new functions in a few places where local implementations
of the same functionality existed.
Reviewed by: mike, bde
Tested on alpha by: mike
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.
Obtained from: jake (MI code, suggestions for MD part)
enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.
This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
New locks are:
- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.
Please refer to sys/proc.h for the coverage of these locks.
Changes on the pgrp/session interface:
- pgfind() needs the pgrpsess_lock held.
- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.
- Call enterthispgrp() in order to enter an existing pgrp.
- pgsignal() requires a pgrp lock held.
Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)
Reviewed by: dillon@freebsd.org
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.
Tested on: alpha, i386
Reviewed by: bde, jake, tmm
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.
While I'm here, also hide osigcontext types from userland; suggested
by bde.
Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>
reaquiring it. In the same vein, don't bother dropping the thread cred
when goinf ot userland. We are guaranteed to nned it when we come back,
(which we are guaranteed to do).
Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)
the structure definitions come from NetBSD to make it easier to share card
definitions. The driver only acts as a shim between the pci bus and the
sio driver. Later pci parallel ports could also be supported through this
driver. Support for most single and multiport pci serial cards should be
as simple as adding its definition to pucdata.c
Tested with the following pci cards:
Moxa Industio CP-114, 4 port RS-232,RS-422/485
Syba Tech Ltd. PCI-4S2P-550-ECP, 4 port RS-232 + 2 parallel ports
Netmos NM9835 PCI-2S-550, 2 port RS-232
patch from a year ago: give file flags their own type. This does not
(yet) change the type used by system calls or library functions.
The underlying type was chosen to match what is returned by stat().
ACPI_NO_SEMAPHORES, ASR_MEASURE_PERFORMANCE, AST_DEBUG, ATAPI_DEBUG,
ATA_DEBUG, BKTR_ALLOC_PAGES, BROOKTREE_ALLOC_PAGES, CAPABILITIES,
COMPAT_SUNOS, CV_DEBUG, MAXFILES, METEOR_TEST_VIDEO, NDEVFSINO,
NDEVFSOVERFLOW, NETGRAPH_BRIDGE, NETSMB, NETSMBCRYPTO, PFIL_HOOKS,
SIMOS, SMBFS, VESA_DEBUG, VGA_DEBUG.
Start using #! to comment out negative options and ## to comment out
broken options.
atapi-all.c:
Fixed rotted bits that were hiding under ATAPI_DEBUG.
atapi-cd.c:
#include "opt_ata.h" so that ACD_DEBUG is actually visible.
ata/atapi-tape.c
#include "opt_ata.h" so that AST_DEBUG is actually visible.
SMP we'd like as much feedback as possible from users about possible
locking problems as early as possible.
To negate most of the performance impact I've also enabled
WITNESS_SKIPSPIN. I've done this as we've been running WITNESS
over the spinlock code for a while without incident and it goes a
long way to making the performance problems of WITNESS much more
bearable.
Users who should be running current should know about turning WITNESS
off for performance reasons.
That said and done, WITNESS could/should be made into a tuneable,
but we'll leave that as an excersize to those that want to disable
it without a kernel recompile.
descriptors. This simplifies code for jumbo frames.
- Cleaned up coding conventions to make code more unix-like.
- Cleaned up code in if_em_fxhw.c and if_em_phy.c.
Added relevant comments.
MFC after: 1 week
slower, and may be impeding adoption of -CURRENT by developers. We
recommend turning on WITNESS by default on crash boxes, and when doing
locking development. It will probably get turned on by default for a week
or two following any major locking commits, also.
Approved by: all and sundry (jhb, phk, ...)
feature bit on newer Athlon CPUs if the BIOS has forgotten to enable
it.
This patch was constructed using some info made available by John
Clemens at http://www.deater.net/john/PavilionN5430.html
Reviewed by: -audit
MFC after: 3 weeks
- Collected i486 identification codes in one place like
586 and 686.
- Merged two cases (0x470 and 0x490) for `Enhanced Am486DX4
Write-Back.'
- Replaced `unknown' into `Unknown'.
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
from old signal handlers. This is simpler and faster, and fixes (new)
sigreturn(2) when %eip in the new signal context happens to match the
magic value (0x1d516). 0x1d516 is below the default ELF text section,
so this probably never broken anything in practice.
locore.s:
In addition, don't build the signal trampoline for old signal handlers
when it is not used.
alpha:
Not fixed, but seems to be even less broken in practice due to more
advanced magic. A false match occurs for register #32 in mc_regs[].
Since there is no hardware register #32, a false match is only possible
for direct calls to sigreturn(2) that happen to have the magic number
in the spare mc_regs[32] field.
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.
ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.
powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.
sparc64: copy the cleaned up the ia64 stub, since there was no stub before.
for SMP in the plain profiling case. It seems to work too.
This error was not detected by LINT because LINT only compiles the
GUPROF profiling case, which is is a superset of the plain profiling
case for !SMP but which is so broken for SMP that the buggy code is
not compiled.
the packet transfer routines, since rev.1.468 of machdep.c does this
better. I'm surprised that disabling interrupts helped much. Disabling
them in the packet receive routine is too late.
Fixed some minor style bugs in rev.1.14.
to fetch the magic word instead of useracc() plus a direct access.
This is more efficient as well as simpler and less incorrect:
- it was inefficent because useracc() takes much longer than just
accessing the data using a correct access method, at least on i386's.
- it was incorrect because direct access is incorrect unless the address
has been mapped. This and nearby direct accesses are mostly handled
better for other arches because they have to be (direct accesses don't
work).
- using magic in sigreturn is still fundamentally broken because false
matches are possible. On i386's, a false match occurs when %eip in a
new signal context happens to equal the magic value. This is not
handled better for other arches.
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.
This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.
prior ICP Vortex models. This driver was developed by Achim Leubner
of Intel (previously with ICP Vortex) and Boji Kannanthanam of Intel.
Submitted by: "Kannanthanam, Boji T" <boji.t.kannanthanam@intel.com>
MFC after: 2 weeks
This typo keeps us from properly routing an interrupt for CardBus
bridges on this machine. So, now we look for $PIR and then _PIR to
cope. With these changes, the Libretto L1 now works properly.
Evidentally, the idea comes from patch that the Japanese version of
RedHat (or against a Japanese version of Red Hat), but my Japanese
isn't good enough to to know for sure.
Reported by: Hiroyuki Aizu-san <eyes@navi.org>
# This may be an MFC candidate, but I'm not yet sure.
cpuid with %eax=1 will return a logical cpu count in bits 16-23 of %ebx.
Bit 29 is actually 'TM' according to AP-485. This signifies the presence
of the thermal control circuit (which I believe can slow the clock down
to reduce core temperature).
they make it through to userland. This should fix the p5-smp problem
without affecting the other cpus (eg: cyrix, see initcpu.c and the special
cache handling for these cpu types).
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and
adapting it for KSE.
Locks:
1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.
1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp);
/* increments reference count on a file */
struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */
struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
traps on the first instruction of signal handlers.
In trap.c:syscall(), fake a trace trap if the single-step flag was set
on entry to the kernel, not if it will be set on exit from the kernel.
This fixes bogus trace traps after the last instruction of signal handlers.
gdb-4.18 (the version in FreeBSD) still has problems with the program in
the PR. These seem to be due to bugs in gdb and not in FreeBSD, and are
fixed in gdb-5.1 (the distribution version).
PR: 33262
Tested by: k Macy <kip_macy@yahoo.com>
MFC after: 1 day