by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.
Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.
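As a rough illustration of that note (the pv-list field name follows the
i386 pmap of this era and should be read as an assumption), the two
checks answer the same question:

    /* Sketch only: the flag test and the pv-list test are equivalent. */
    static __inline int
    page_is_mapped(vm_page_t m)
    {
        /* no harder than testing (m->flags & PG_MAPPED) */
        return (!TAILQ_EMPTY(&m->md.pv_list));
    }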
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of keeping
that header limited to type definitions.
(2) Defining C99 limits in <machine/limits.h> pollutes <limits.h>
(a short illustration follows).
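For illustration, both a fixed-width type and its limit now come from
the single header (variable names below are just an example):

    #include <stdint.h>

    int32_t counter = INT32_MAX;        /* limit macro lives beside the type */
    uintmax_t big = UINTMAX_MAX;        /* no need to reach into <machine/limits.h> */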
handler in the kernel at the same time. Also, allow the
exec_new_vmspace() code to build a different-sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, e.g., emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
identify this gadget on the CPUID result alone, so I intend to activate
the necessary magic (i8254 frequency for instance) for it based on the
presence of the on-chip host-to-PCI bridge.
machine will result in approximately a 4.2% loss of performance (buildworld)
and approximately a 5% reduction in power consumption (when idle). Add XXX
note on how to really make hlt work (send an IPI to wakeup HLTed cpus on
a thread-schedule event? Generate an interrupt somehow?).
running with interrupts disabled, other cpus locked down, and only
making a temporary local mapping that we immediately back out again.
Tested by: gallatin
-finstrument-functions instead of -mprofiler-epilogue. The former
works essentially the same as the latter but has a higher overhead
(about 22 more bytes per function for passing unused args to the
profiling functions).
Removed all traces of the IDENT Makefile variable, which had been
reduced to just a place for holding profiling's contribution to CFLAGS
(the IDENT that gives the kernel identity was renamed to KERN_IDENT).
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.
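A minimal sketch of the lazy scheme, assuming a hypothetical deferred-
invalidation flag (per-cpu in practice; the committed names and data
structures may differ):

    static vm_offset_t lazy_tlb_va;     /* va whose PTE was torn down */
    static int lazy_tlb_pending;

    static void
    lazy_tlb_defer(vm_offset_t va)      /* called from the page-zeroing code */
    {
        lazy_tlb_va = va;
        lazy_tlb_pending = 1;
    }

    void
    lazy_tlb_switch_hook(void)          /* callback made at the end of mi_switch() */
    {
        if (lazy_tlb_pending) {
            invlpg(lazy_tlb_va);        /* flush only now, and only on this cpu */
            lazy_tlb_pending = 0;
        }
    }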
A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.
Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit
Reviewed by: peter, jhb
Approved by: peter
choosethread() in MI C code instead of doing it in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.
Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache when we are stealing a page from an
unshared process on the local cpu. Use pm_active to track this
(see the sketch at the end of this entry).
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.
Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.
I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.
I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.
New option: DISABLE_PG_G - In case I missed something.
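The targeted-shootdown sketch referenced above (the ranged IPI helper is
hypothetical, not the committed interface):

    static void
    pmap_invalidate_range_sketch(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
    {
        vm_offset_t va;
        u_int cpumask, other_cpus;

        cpumask = 1 << PCPU_GET(cpuid);
        if (pmap->pm_active & cpumask) {
            /* local flush needs no IPI */
            for (va = sva; va < eva; va += PAGE_SIZE)
                invlpg(va);
        }
        /* interrupt only the cpus that actually have this pmap active */
        other_cpus = pmap->pm_active & ~cpumask;
        if (other_cpus != 0)
            smp_masked_invlpg_range(other_cpus, sva, eva);  /* hypothetical */
    }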
the default) is now the only method for i386.
Remove the paraphernalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().
ipl.s except doreti which really belongs in with the exceptions as it's
just the other side of the same coin. Will remove ipl.s in a separate commit.
Agreed by: several including bde@freebsd.org
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.
We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.
Use Luigi's hack for preserving the (lack of) priority.
Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.
While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they mostly just
gather statistics to get a feel for how it is working.
These functions are always called on new memory, so they cannot
already have been set up; don't bother testing for that.
(This was left over from before we used UMA (which is cool).)
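A rough sketch of the zone setup described above; the UMA argument
order is as I recall it from this period, so treat the signatures as
assumptions, and the counter only mirrors the statistics remark:

    static int proc_ctor_calls;         /* crude instrumentation */

    static void
    proc_ctor_sketch(void *mem, int size, void *arg)
    {
        proc_ctor_calls++;              /* item handed out from the zone */
    }

    static void
    proc_init_sketch(void *mem, int size)
    {
        /* runs on brand-new memory only, so no "already set up?" test */
        bzero(mem, size);               /* stand-in for the real lock/list setup */
    }

    /* in procinit(): */
    proc_zone = uma_zcreate("PROC", sizeof(struct proc),
        proc_ctor_sketch, NULL, proc_init_sketch, NULL, UMA_ALIGN_PTR, 0);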
and function) with existing configuration choices. Arguably if
ALT_BREAK_TO_DEBUGGER was present, so should have been
BREAK_TO_DEBUGGER. Regardless, it broke the option sort order in
these kernel configuration files.
Requested by: bde
The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)
"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."
Submitted by: tegge
Approved by: tegge's patches never fail
types are not required, as the overhead is unnecessary:
o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:
o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate() (see the sketch after this list).
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().
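The sketch referenced in the first item above (the exact prototype is
an assumption based on the description):

    vm_object_t vm_object_allocate(objtype_t type, vm_pindex_t size);

    /* e.g. an object backing a very large file no longer overflows: */
    obj = vm_object_allocate(OBJT_DEFAULT, OFF_TO_IDX(round_page(filesize)));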
sysctl (machdep.cpu_idle_hlt) to off in the SMP case. This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.
checksumming. These bugs could possibly cause bad code to be
generated at elevated optimization levels.
First, eliminate the use of preprocessor magic to form the address
fields of asm instructions. It hid the actual addresses being
referenced from the compiler. Without knowledge of all the data
dependencies, the compiler might possibly use optimizations which
would result in incorrect code.
Use "__asm __volatile" rather than "__asm" for instruction sequences
that pass information through the condition codes (the carry bit, in
this case). Without __volatile, the compiler might add unrelated
code between consecutive __asm instructions, modifying the condition
codes. I have seen GCC insert stack pointer adjustments in this
way, for example. Unfortunately, GCC doesn't provide a way to
specify dependencies on the condition codes. You can specify that
they are clobbered, but not that you are going to use them as input.
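For illustration (this is not the committed checksum code), a
carry-propagating sequence has to stay in one volatile asm so that
nothing can be scheduled between the add and the adc:

    static __inline u_int
    add_with_carry(u_int sum, u_int a, u_int b)
    {
        __asm __volatile(
            "addl %1, %0\n\t"           /* sum += a, sets the carry bit */
            "adcl %2, %0\n\t"           /* sum += b + carry */
            "adcl $0, %0"               /* fold the final carry back in */
            : "+r" (sum)
            : "g" (a), "g" (b)
            : "cc");
        return (sum);
    }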
Finally, simplify the LOAD macro. This macro is used as a poor
man's prefetch. The simpler version gives the compiler more leeway
about just how it performs the prefetch.
MFC after: 1 week
when machdep.tsc_freq returned a negative number on a 2.2GHz Xeon.
Submitted by: Brian Harrison <bharrison@ironport.com>
Reviewed by: phk
MFC after: 1 week
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.
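For example, a source file that uses the new structures needs the
headers in this order:

    #include <ufs/ufs/dinode.h>         /* provides ufs2_daddr_t and ufs_lbn_t */
    #include <ufs/ffs/fs.h>             /* uses those types */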
Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().
Reviewed by: jake (in principle)
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.
- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.
- Use the ZONE_VM flag on vm map entries and pv entries on x86 (a usage
sketch follows).
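The usage sketch referenced above (the zone shown is only an example;
ZONE_VM and M_NOVM are the names from this change):

    /* A zone whose buckets come only from the bucket cache, never the VM. */
    mapentzone = uma_zcreate("MAP ENTRY", sizeof(struct vm_map_entry),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, ZONE_VM);

    /* An allocation that must not recurse into the VM at all. */
    entry = uma_zalloc(mapentzone, M_NOVM);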
do not blunder around enabling interrupts and running trap handlers.
trap_pfault() will normally pass control to ddb's fault handler which
will normally do the right thing.
This bug is very old, but in old versions of FreeBSD it is probably only
serious for trap handling that involves sleeping. In -current, attempting
to examine unmapped memory while stopped at a breakpoint at mi_switch()
was always fatal.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha where we
skip the actual syscall invocation if error != 0 (sketched below).
This fixes a bug where, if the copyin() of the arguments failed for a
syscall that was not marked MP safe, we would try to release Giant
when we had not acquired it.
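A rough sketch of the resulting flow (argument fetching is abbreviated
and the details should be read as assumptions):

    error = copyin(params, args, narg * sizeof(register_t));
    if (error == 0) {
        if ((callp->sy_narg & SYF_MPSAFE) == 0)
            mtx_lock(&Giant);           /* Giant taken only if the call will run */
        error = (*callp->sy_call)(td, args);
        if ((callp->sy_narg & SYF_MPSAFE) == 0)
            mtx_unlock(&Giant);         /* ...so the unlock can never be unbalanced */
    }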
mask on both input and output to fpsetmask(), but this was only done for
input, so fpsetmask() returned the complement of the old mask (ANDed with
the mask bitfield).
PR: 38170
MFC after: 4 weeks
Don't require pin be non-zero before we map bogus intlines, always do it.
This fixes a number of problems on HP Omnibook computers.
Tested/Reviewed by: Brooks Davis
MI API with empty cpu_pause() functions on other arch's, but this
functionality is definitely unique to IA-32, so I decided to leave it
as i386-only and wrap it in #ifdef's. I should have dropped the cpu_
prefix when I made that decision.
Requested by: bde
make_dev() to create device nodes for each of the serial port channels
(ttym%d and cuam%d respectively, as borrowed from MAKEDEV). This allows
the rc driver to work in 5.0. I've tested it with only one card, but
will try sticking in a second card tomorrow and see what happens.
checking, followed by a lookup of the process. Do not call
ptrace() for permission checking, but do it inline.
Spotted by: rwatson
o While here, copy-in arguments before we lock. This fixes
a possible permanent lock.
Reviewed by: rwatson
with 16-bit ints, since u_short is promoted when it is passed to a
varargs function. gcc now warns about this. We always pass small
integers (this is well obfuscated), so there are no conversion problems.
Fixed a related style bug (bogus cast).
the case of VM86 calls from the kernel was broken, so this bug was not
a security hole.
PR: 36710
Submitted by: David Xu <davidx@viasoft.com.cn> (version for RELENG_4)
MFC after: 3 days
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time; sketched below)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)
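Roughly what the pointer change amounts to (sketch, not the committed
lines):

    extern struct pmap kernel_pmap_store;
    #define kernel_pmap (&kernel_pmap_store)    /* resolved by ld, no runtime load */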
ever connect a SCSI Cdrom/Tape/Jukebox/Scanner/Printer/kitty-litter-scooper
to your high-end RAID controller. The interface to the arrays is still
via the block interface; this merely provides a way to circumvent the
RAID functionality and access the SCSI buses directly. Note that for
somewhat obvious reasons, hard drives are not exposed to the da driver
through this interface, though you can still talk to them via the pass
driver. Be the first on your block to low-level format unsuspecting
drives that are part of an array!
To enable this, add the 'aacp' device to your kernel config.
MFC after: 3 days
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.
Don't pass the symbol name to elf_reloc(), as it isn't used any more.
2, but that's not the case. This fixes the case where there were slots
in the PIR table that had no bits set, but we assumed they did and
consequently produced strange results.
o Map invalid INTLINE registers to 255 in pci_cfgreg.c. This should allow
us to remove the bogus checks in MI code for non-255 values.
I put these changes out for review a while ago, but no one responded
to them, so into current they go.
This should help us work better on machines that don't route
interrupts in the traditional way.
MFC After: 4286 millifortnights
due to conditions that suggest the possible need for stack growth.
This has two beneficial effects: (1) we can
now remove calls to vm_map_growstack() from the MD trap handlers and (2)
simple page faults are faster because we no longer unnecessarily perform
vm_map_growstack() on every page fault.
o Remove vm_map_growstack() from the i386's trap_pfault().
o Remove the acquisition and release of Giant from i386's trap_pfault().
(vm_fault() still acquires it.)
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.
This adds some new functions to manipulate the kernel environment:
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.
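A short usage sketch following those rules (the tunable name is made
up):

    char *val;

    if (testenv("hw.example.tunable")) {        /* presence test, no cleanup needed */
        val = getenv("hw.example.tunable");
        if (val != NULL) {
            printf("hw.example.tunable is %s\n", val);
            freeenv(val);                       /* every getenv() needs a freeenv() */
        }
    }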
The kenv(2) syscall exports this new functionality to userland,
mainly for kenv(1).
Reviewed by: peter
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibility.)
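A sketch of the interface after the change (prototypes as I recall
them; treat them as assumptions):

    void pmap_zero_page(vm_page_t m);           /* was: pmap_zero_page(vm_offset_t pa) */
    void pmap_copy_page(vm_page_t src, vm_page_t dst);

    pmap_zero_page(m);  /* callers no longer need VM_PAGE_TO_PHYS(m) */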
Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
trying to run X on some Athlon systems where the BIOS does odd things
(mine's an ASUS A7A266, but it seems to also help on other systems).
Here's a description of the problem and my fix:
The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.
This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.
To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.
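A minimal sketch of the new conversion path (argument names are guesses
based on the description above):

    static int
    i686_mrt2mtrr(int flags, int oldval)
    {
        if (flags & MDF_UNKNOWN)
            return (oldval);            /* leave the unrecognized range type alone */
        return (i686_mtrrtype(flags));  /* normal documented conversion */
    }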
I'd like to merge this before the 4.6 code freeze, so if people
can test this with XFree 4 that would be very useful.
PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks
the indentation more like the other multi-line assembly in
this file.
Someone who understands gcc constraints could update the
constraints for do_cpuid.
o Recent changes to osigreturn() and sigreturn() have made them MPSAFE. Add
a comment to this effect.
Submitted by: bde (bullet #1)
Reviewed by: jhb (bullet #2)
_BYTE_ORDER. These are far more useful than their non-underscored
equivalents as these can be used in restricted namespace environments.
Mark the non-underscored variants as deprecated.
drivers with MI portions into the MI notes. Device drivers such as busses
like the isa, eisa, and pci devices are now in the MD NOTES section even
though they have some MI code. This will ensure that only the proper bits
of device drivers will be included due to the optional bits dependent on
the busses in sys/conf/files. This commit also takes the stance that since
hints are ignored in NOTES anyways, it is ok to include hints for a bus
that may not be present.
Advice from: bde
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occasional inaccuracy. Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scalability.
Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.
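A hedged sketch of such an aggregating handler (the per-cpu vmmeter
field name is an assumption):

    static int
    vcnt_sketch(SYSCTL_HANDLER_ARGS)
    {
        struct pcpu *pc;
        int count, offset, i;

        offset = (char *)arg1 - (char *)&cnt;   /* which field of struct vmmeter */
        count = *(int *)arg1;                   /* the global vmmeter's contribution */
        for (i = 0; i < mp_ncpus; i++) {
            pc = pcpu_find(i);
            if (pc != NULL)
                count += *(int *)((char *)&pc->pc_cnt + offset);
        }
        return (SYSCTL_OUT(req, &count, sizeof(count)));
    }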
The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu. Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
In the i386 case, options BOOTP requires options NFS_ROOT as well as
options NFSCLIENT. With *both* the NFS options, a bootpc_init()
prototype is brought in by nfsclient/nfsdiskless.h.
In the ia64 case, it just doesn't work and my change just pushes it
further away from working.
Suggested to be wrong by: bde
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.
Avoid locking in userret() in most of the remaining cases.
Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon
Unfortunately, this level doesn't really provide enough granularity. We
probably need several MI NOTES type files for things that are shared by
several architectures but not by all. For example, the PCI options could
live in a NOTES.pci.
This also updates the Makefile for i386 to generate LINT. The only changes
in the generated LINT are the order of various options.
Suggestions for improvement welcome.
in dump byte order (=network byte order). Swap blocksize and dumptime
to avoid extraneous padding on 64-bit architectures. Use CTASSERT
instead of runtime checks to make sure the header is 512 bytes large.
Various style(9) fixes.
Reviewed by: phk, bde, mike
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.
Tested on: alpha, i386
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.
Backout and re-apply improperly committed syntactical cleanups made to files
that were still under active development. Backout improperly committed program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather than in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.
Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.
Reviewed by: jake