macro. The commit log clearly states that the index given to the
macro is one higher than previously used to index the array. This
wasn't represented in the code and resulted in kernel page faults.
Reported by: Andrew Atrens <atrens@nortelnetworks.com>
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.
XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.
Reviewed by: jake, tmm, dillon
with the last change to the way the pmap_emulate_reference() works. This
should fix a number of memory corruption problems and also should stop the
mtimes of executables changing all the time.
o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".
o Provide dummy functions for all syscalls and remove dummy functions
or implementations of truely obsolete syscalls.
o Sanitize the shm*, sem* and msg* syscalls.
o Make a first attempt to implement the linux_sysctl syscall. At this
time it only returns one MIB (KERN_VERSION), but most importantly,
it tells us when we need to add additional sysctls :-)
o Bump the kenel version up to 2.4.2 (this is not the same as the
KERN_VERSION MIB, BTW).
o Implement new syscalls, of which most are specific to i386. Our
syscall table is now up to date with Linux 2.4.2. Some highlights:
- Implement the 32-bit uid_t and gid_t bases syscalls.
- Implement a couple of 64-bit file size/offset bases syscalls.
o Fix or improve numerous syscalls and prototypes.
o Reduce style(9) violations while I'm here. Especially indentation
inconsistencies within the same file are addressed. Re-indenting
did not obfuscate actual changes to the extend that it could not
be combined.
NOTE: I spend some time testing these changes and found that if there
were regressions, they were not caused by these changes AFAICT.
It was observed that installing a RH 7.1 runtime environment
did make matters worse. Hangs and/or reboots have been observed
with and without these changes, so when it failed to make life
better in cases it doesn't look like it made it worse.
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".
o Provide dummy functions for all unimplemented syscalls, except for
the osf1 syscalls. This can only be done if the osfulator
implements at least all syscalls used by the linuxulator. Remove
dummy functions for syscalls that are now truely unimplemented.
o Set the syscall namespace as follows: Mark a syscall as OSF1 if
the Linux kernel has prefixed the syscall with 'osf_' and has
provided special implementations for it. Otherwise mark the
syscall as LINUX by default. Some of the LINUX syscalls remain
marked as BSD or POSIX.
o Rename syscalls so they match the names used in the Linux kernel.
Also, provide more accurate prototypes. This generally improves
cross-referencing and reduces head-scratching.
o Fix the (g|s)etresuid syscalls. They mapped to (g|s)etresgid.
o Sanitize the the shm*, sem* and msg* syscalls. Their prototypes
were dictated by the way these syscalls were used in the i386
code. That has been fixed. NOTE: linux_semctl now passes it's
'arg' parameter by value and not by reference.
o Fix prototype of linux_utime. It takes a struct timeval, not a
struct utimbuf.
o Fix the linux_sysfs syscall. It's index is not 255, but 254.
o Implement the following syscalls:
linux_sysctl
o Add the following new syscalls:
(g|s)etresgid
linux_pivot_root (dummy)
linux_mincore (dummy)
linux_pciconfig_iobase (dummy)
linux_getdents64
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.
Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).
sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE
ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)
Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.
Rebuild syscall tables.
o Unify <machine/endian.h>'s across all architectures.
o Make bswapXX() functions use a different spelling of u_int16_t and
friends to reduce namespace pollution. The bswapXX() functions
don't actually exist, but we'll probably import these at some
point. Atleast one driver (if_de) depends on bswapXX() for big
endian cases.
o Deprecate byteorder(3) prototypes from <sys/types.h>, these are
now prototyped indirectly in <arpa/inet.h>.
o Deprecate in_addr_t and in_port_t typedefs in <sys/types.h>, these
are now typedef'd in <arpa/inet.h>.
o Change byteorder(3) prototypes to use standards compliant uint32_t
(spelled __uint32_t to reduce namespace pollution).
o Document new preferred headers and standards compliance.
Discussed with: bde
PR: 29946
Reviewed by: bmilekic
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.
Reviewed by: freebsd-smp, jake
but it's better than the buggy behavior we have now. If we contigmalloc()
buffers in bus_dmamem_alloc(), then we must configfree() them in
bus_dmamem_free(). Trying to free() them is wrong, and will cause
a panic (at least, it does on the alpha.)
I tripped over this when trying to kldunload my busdma-ified if_rl
driver.
to see if it was malloc()ed first" bug. In bus_dmamap_create(), one of
two things can happen: either we need to allocate a special map due to
some manner of bounce buffering requirement, or we can DMA a buffer
in place. On the x86 platform, the "in place" case results in
bus_dmamap_create() returning a dmamap of NULL. The bus_dmamap_destroy()
routine later checks for NULL and won't bother free()ing the map if
it detects this condition.
But on the alpha, we don't use NULL, we use a statically allocated map
called nobounce_dmamap(). Unfortunately, bus_dmamap_destroy() does not
handle the condition where we attempt to destroy such a map: it tries
to free() the dmamap, which causes a panic.
Fix: test that map != &nobounce_dmamap before trying to free() it.
With this fix, my busdma-ified if_sis driver works on the alpha. I'm
a bit alarmed that I'm the first person ever to trip over this bug, since
we have been using busdma on the alpha for a while, and since it sort
of screams out "Hi! I'm a bug! Booga-booga!" when you look at it.
(Somewhere, somebody will say: "But Bill, why don't you just not bother
destroying the maps in this case." Because the API is supposed to be
a) symetrical and b) opaque to the caller. I can't know whether it's safe
to skip the bus_dmamap_destroy() step or not without sticking my fingers
into unsafe places, which is what I wanted to avoid in the first place.)
boot CPU. This was the reason reboots on SMP systems could result in
weird hangs. Unlike the x86, we do not need to switch back to the boot
CPU in order to reboot the machine. See Section 3.4.5 of Part III
(Console Interface Architecture) from the Alpha Architecture Reference
Manual (aka the Brown Book) for more info.
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.
Reviewed by: bde (mostly)
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.
gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.
dynamic symbol table buckets and chains. The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.
Discussed with: dfr, jdp
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().
- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.
This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.
Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob
simply manipulates the pte which faulted instead of traversing the mapping
list for that page. This makes it possible to complete the trap without
needing locks and incidentally improves the accuracy of some statistics
used by the VM system.
blown over by the Hurricane and had a house dropped on you by the Tornado.
Now it's time to have your parade rained on by... the Typhoon!
This commit adds driver support for 3Com 3cR990 10/100 ethernet
adapters based on the Typhoon I and Typhoon II chipsets. This is actually
a port of the OpenBSD driver with many hacks by me.
No Virginia, there isn't any support for the hardware crypto yet. However
there is support for TCP/IP checksum offload and VLANs.
Special thanks go to Jason Wright, Aaron Campbell and Theo de Raadt for
squeezing enough info out of 3Com to get this written, and for doing
most of the hard work.
Manual page is included. Compiled as a module and included in GENERIC.
sure when things got so bad (JHB says preemption worked just fine for months
before the AlbertVM commit). Even post DillionVM locking commit, Miatas
(DEC Personal Workstations) are very fragile -- not making it thru a world
build. With this patch it does.
Those hacking on SMPng will want to locally back out this commit. The rest
of us will want to run with it until the SMPng guys figure out the problem(s).
Submitted by: peter
on Alpha 4100s.
Basically, if you're halting or you're rebooting, you should
tell all other processors to halt first. Define IPI_HALT- IPI_STOP
is not what we want for this purpose, which will call prom_halt(0)
on receipt.
The processor running the halt or reboot wil send an IPI_HALT to all
other processors, delay a bit, then continue to do what what it was
planning on doing (prom_halt({0|1})).
By default, we will end up with a duplicate set of hints if people have
a properly populated /boot/device.hints. So for now, remove the hints
here until Peter revisits the new hints processing from mid-June that
broke Alpha booting.
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.
No objection by: audit@, hackers@
MFC after: 2 weeks
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.
vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
.cvsignore file for [A-Za-z]* to keep these directories around rather
than waste a file on .keepme. This should also make people's built
trees place nice with CVS.
Idea for .cvsignore: peter (although I suggested the regexp)
Pointed out by: Makoto MATSUSHITA-san <matusita@jp.FreeBSD.org>
Llama's costuming by: Fernamdo Llamas
vm_page_t's.
- Add a KTR_TRAP tracepoint to trap() on the alpha that displays the
contents of a0, a1, and a2 to make debugging of nested traps that
panic before displaying any useful output easier.
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.
Reported by: tegge (signal race), bde (addupc_task a while back)
- Use db_printf() instead of printf().
- Clean up decode_syscall() to use regular if-then-else rather than goto's.
- Use the same method of parsing PID's for per-process traces as the x86
code does: that is, if the address passed in is not a valid kernel
address, treat it is a decimal pid.
- If the pid of the current process is specified, fall back to using the
"default" parameters for the trace as curproc's pcb is not valid at this
point.
MFC after: 1 week
cpu_mp_start() is never called, mp_ncpus will have a non-zero value.
This prevents systat from dying with an arithmatic exception caused
by a divide-by-zero error on UP alphas running a GENERIC kernel.
Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.
The linker_set.h macros have been on the back burner in various
forms since 1998 and has ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).
The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.
For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.
For a.out, we use the old linker_set struct.
NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.
The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.
linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.
Reviewed by: eivind
out nearly every platform but the one I tested on requires the intpin
to swizzle out the correct intline.
tested by: Martijn Pronk <mpkisbkl@xs4all.nl> (lca_pci)
greatly improved traceback code from Ross Harvey. This code
requires the use of more traceback friendly temporary labels
at kernel entry points, hence the changes to exception.s and
asm.h
Reviewed by: jhb, dfr
Obtained from: NetBSD
MFC after: 1 week
the interface conversion to platform.pci_intr_route(). I've left the
platform.pci_intr_route() function pointer in place, as well as
alpha_pci_route_interrupt(), but no platform currently implements it.
To work around the removal of alpha_platform_assign_pciintr(cfg);
from the pci probe code, I've hooked in calls to platform.pci_intr_map()
in pcib_read_config (similar to the x86 APIC_IO ifdef in pci_cfgregread)
for every chipset that has a platform which needs it.
While here, I've removed the interupt mapping/routing code from the
AS2x00 platform because its not required (it has never been present in
-stable).
Tested on: UP1000, Miata(GL), XP1000, AS2100, AS500
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs
Requested by: bde
Reviewed by: bde (some time ago)
all alphas with devices behind ppb's. I'm working on a better solution now.
Note that all alphas that use per-platform interrupt mapping are broken
again (as they have been for several months)
breakage:
- call PCIB_ROUTE_INTERRUPT() regardless of how valid the intline looks.
Some alphas leave garbage in the intline and leave the intr mapping
to OS platform support routines that map slots/buses to intlines
- Down in the alpha pci code, first try platform.pci_intr_route() and
if it doesn't exist or returns garbage, just read the intline out of
config space.
tested on AS500 (garbage in intline) and UP1000 (PC-like, intline is valid)
Note that a nice little hack like the APIC_IO section of pci_cfgregread()
is not workable. This is because the calling interface for
alpha_pci_route_interrupt() requires us to figure out the bus/slot/etc
from a device_t. At pci_read_device() time, we don't have a device_t
for the bus/slot/func in question.
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
the chipset. This is already how the multi-hose systems handle resource
allocation and it fixes a bug where dense and bwx memory allocations were
not handled properly.
Reviewed by: gallatin
systems were repo-copied from sys/miscfs to sys/fs.
- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
- Install header files for the above file systems.
- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.
flags if it is safe to do so, otherwise it will just alter the pmap state
(eg, clear the appropriate PG_FOx bits).
This gets alpha booting in the face of the vm_mtx introduction.
Reviewed by: dfr
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
afer the display of the copyright message instead of doing it by hand in
three MD places.
registers better. Hold sched_lock not only for checking the flag but
also while performing the actual operation to ensure the process doesn't
get swapped out by another CPU while we the operation is being performed.
. FD_CLRERR clears the error counter, thus re-enables kernel error
printf()s,
. FD_GSTAT obtains the last FDC operation state, if any,
. FDOPT_NOERRLOG (temporarily) turns off kernel printf() floppy
error logging,
. FDOPT_NOERROR makes the kernel ignore an FDC error, thus can
enable the transfer of an erroneous sector to the user application
All options are being cleared on (last) close.
Prime consumer of the last features will be fdread(1), to be committed
shortly.
(FD_CLRERR should be wired into fdcontrol(8), but then fdcontrol(8)
needs a major rewrite anyway.)
If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.
Pending any significant issues, DEVFS will be made mandatory in
-current on july 1st so that we can start reaping the full
benefits of having it.
other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
It might be more correct to make stathz as close as possible to 128,
but that would involve adding complexity to the clock intr path, which
I don't want to do.
saves 32 registers) to do on every context switch. This is only required
for SMP, so only do it there.
We should also look at moving the critical enter/exit out to the callers
structure. This field keeps track of how many levels deep we are nested
into the kernel. The nesting level is bumped at the start of a trap,
interrupt, syscall, or exception and is decremented on return. This is
used to detect the case when the kernel is returning back to a kernel
context in exception_return(). If we are returning to the kernel we need
to update the globaldata pointer register saved in the stack frame in case
we have switched CPU's between taking the initial interrupt that saved the
frame and returning. If we don't do this fixup it is possible for a CPU to
use the wrong per-cpu data. On UP systems this is not a problem, so the
code is conditional on SMP.
A count was used instead of simply checking the process status register in
the frame during exception_return() since there are critical sections at
the very start and end of a trap, exception, or interrupt from userland in
which we could trash the t7 register being used in userland. The counter
is incremented after adn before these critical sections respectively so
that we will not overwrite the saved t7 register if we are interrupted
during one of these critical sections.
This is will be required to prevent lowering the ipl when a critical_enter()
is present in the interrupt path when handling a machine check.
reviewed by: jhb
and AS4100s into single user mode. This work was done jointly by jhb and
myself, and builds on dfr's earlier work.
smp_init_secondary() / smp_start_secondary()
- use the uniq val to pass the globalp (me)
- fancy footwork to take any pending machine checks (me)
- doing things the FreeBSD way and getting the per-cpu idleproc created
correctly, and synchronizing the startup of secondaries (jhb)
mp_start()
- better recognition of available cpus (jhb)
smp_rendezvous()
- if smp hasn't started, only run the rendezvous function on the current
cpu. Sleuthing and (prior) incorrect fix by me, correct fix by jhb
smp_handle_ipi()
- more verbose handling of console messages (jhb)
- grab sched lock around setting PS_ASTPENDING (jhb)
forward_*clock()
- commented out. Joint decision by dfr, jhb and myself
General synchronization improvements (more mb()s, etc) (jhb)
Printf cleanups (joint)
Whitespace cleanups (jhb)
- don't do the stack overflow sanity check on MP systems -- p->p_addr
will be malloc'ed memory (not K0SEG) and the check will fail.
- don't ignore clock interrupts on secondaries. Alphas apparently
roundrobin clock interrupts to all cpus, so we're going to take clock
interrupts on all CPUS and not forward them.
- use the unique value to save the per-cpu globalp struct like the
comment says
- don't lower the ipl to ALPHA_PSL_IPL_HIGH: we may have a pending machine
check to take and we're not prepared for that yet, as we haven't setup
our interrupt entry points. (this may only happen on sable/lynx)
- indicate the fact that the working version of smp_init_secondary() doesn't
return (this is tied up in other changes and hasn't yet been committed).
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
"inside" of locked regions. That is, an acquire atomic operation will
always enforce a memory barrier after the atomic operation and a release
operation will always enforce a memory barrier before the atomic
operation.
- Explicitly use 'mb' instead of 'wmb' in release atomic operations. The
'wmb' memory barrier is not strong enough to guarantee coherence with
other processors. This is effectively a nop since alpha_wmb() actually
performs a 'mb' and not a 'wmb', but I wanted the code to be more
correct since at some point in the future alpha_wmb()'s implementation
may switch to being a real 'wmb'.
we should call ast(). This allows us to branch to a separate Lkernelret
label so we can fixup the saved t7 register in the trapframe. Otherwise
we can run into a problem on SMP systems where a process is interrupted by
a trap or interrupt on one CPU, migrates to another CPU, and then returns
with the t7 in the stack clobbering the CPU's t7. As a result, two CPU's
would both point to the same per-CPU data and things would go downhill from
there.
Sleuthing help by: gallatin
- Add a new ddb command: 'show pcpu' similar to the i386 command added
recently. By default it displays the current CPU's info, but an optional
argument can specify the logical ID of a specific CPU to examine.
badaddr_read(). This fixes 'machine check in pal mode' halts on
ev5 2100As.
MFC candidate -- after spending 6 hours tracking this down, I checked and
discovered that it has been in NetBSD for over a year, so it should be safe
for MFC into 4.3-RELEASE
than a NOP. bounds_check_with_label() would return -1 yet NOT set any
of the bio flags to show an error. This meant the caller would not
properly see that bounds_check_with_label() did not do any work. This
prevented newfs(8) from being able to write a file system on any partition
other than `c' on a `ccd'.
The logs of this file do not tell _why_ bounds_check_with_label() was
emasculated. Nor are there any `XXX' comments. So we'll unemasculated
it, and see what breaks.
Submitted by: gallatin
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.
- If there is no gdb device, just return without trying to return any
value since gdb_handle_exception() returns void.
- When calling prom_halt(), pass in a value telling it to actually halt
and not to randomly choose whether or not to halt or reboot depending on
whatever value happened to be in a0 when the call was made.
an AST results in a signal being delivered, we'll need to do a full register
restore so as to properly setup the signal handler. This is somewhat of
a pessimization, because ast() will be called twice in this case.
This fixes several problems that have been reported where signal intensive
userland apps (most notably dump) have been SEGV'ing for no fault of their
own.
Thanks to Peter Jeremy <peter.jeremy@alcatel.com.au> for presenting the
AST scenario which led to me fiinally figuring this out.
Reviewed by: jhb
Make the name cache hash as well as the nfsnode hash use it.
As a special tweak, create an unsigned version of register_t. This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).
The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.
Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is
around 45% faster with -march=pentiumpro on a p6 cpu.
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks. Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock. At
least on the i386. To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks. Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes. Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.
Consulting from: cp
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.
Submitted by: dfr
Debugging help: peter
Tested by: me
This lets us run programs containing newer (eg bwx) instructions
on older (eg EV5 and less) machines. One win is that we can
now run Acrobat4 on EV4s and EV5s.
Obtained from: NetBSD
Glanced at by: mjacob
MFS: bring the consistent `compat_3_brand' support
This should fix the linux-related panics reported
by naddy@mips.inka.de (Christian Weisgerber)
Forgotten by: obrien
- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().
Reported by: bde (2)
work because opt_preemption.h wasn't #include'd. Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.
scheduling an interrupt thread to run when needed. This has the side
effect of enabling support for entropy gathering from interrupts on
all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
to use ithread_schedule() for most of their processing when scheduling
an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
enabled. I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
on every interrupt thread whose pri was SWI_CLOCK, set the flag
explicity for clk_ithd's proc during start_softintr().
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.
Please note:
When committing changes to this file, it is important to note that
linux is not freebsd -- their system call numbers (and sometimes names)
are different on different platforms. When in doubt (and you always need
to be) check the arch-specific unistd.h and entry.S files in the linux
kernel sources to see what the syscall numbers really are.
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).
Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.
Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.
clear MCPCIA_INT_MASK0 helps things substantially. So, why not indeed?
Rearrange irq and cookie calculation to use shifts/masks instead
of division. Fix things to correctly remember the intpin for that
one in a million non-INTA PCI device.
made no sense in the context of wrapping them within the _SYBRIDGE macro-
or anything like it- so we concluded that this must have been a typo
in the docs. This also doesn't use the same bridge offset as anything
else.
Add some defines for the INT_CTL register.
- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb
genassym here, but what I've also noticed is that we're dorking
with a mutex directly at assembler level- I'm not sure that this
is wise at this stage in the SMP port- I think it's going to be much
safer for a while to do things in C until SMP wunderkind figure out
what works and slow down this 3 order differential...
it as I was playing with some other ways of doing kernel preemption.
You must still specify the PREEMPTION option in your config file to get a
preemptive kernel.
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)