handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.
Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.
Obtained from: dfr (mostly, many tweaks from me).
- Sanity check the mount options list (remove duplicates) with
vfs_sanitizeopts().
- Fix some malloc(0)/free(NULL) bugs.
Reviewed by: rwatson (some time ago)
As this code is not actually used by any of the existing
interfaces, it seems unlikely to break anything (famous
last words).
The internal kernel interface to manipulate these attributes
is invoked using two new IO_ flags: IO_NORMAL and IO_EXT.
These flags may be specified in the ioflags word of VOP_READ,
VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that
you want to do I/O to the normal data part of the file and
IO_EXT means that you want to do I/O to the extended attributes
part of the file. IO_NORMAL and IO_EXT are mutually exclusive
for VOP_READ and VOP_WRITE, but may be specified individually
or together in the case of VOP_TRUNCATE. For example, when
removing a file, VOP_TRUNCATE is called with both IO_NORMAL
and IO_EXT set. For backward compatibility, if neither IO_NORMAL
nor IO_EXT is set, then IO_NORMAL is assumed.
Note that the BA_ and IO_ flags have been `merged' so that they
may both be used in the same flags word. This merger is possible
by assigning the IO_ flags to the low sixteen bits and the BA_
flags the high sixteen bits. This works because the high sixteen
bits of the IO_ word is reserved for read-ahead and help with
write clustering so will never be used for flags. This merge
lets us get away from code of the form:
if (ioflags & IO_SYNC)
flags |= BA_SYNC;
For the future, I have considered adding a new field to the
vattr structure, va_extsize. This addition could then be
exported through the stat structure to allow applications to
find out the size of the extended attribute storage and also
would provide a more standard interface for truncating them
(via VOP_SETATTR rather than VOP_TRUNCATE).
I am also contemplating adding a pathconf parameter (for
concreteness, lets call it _PC_MAX_EXTSIZE) which would
let an application determine the maximum size of the extended
atribute storage.
Sponsored by: DARPA & NAI Labs.
so it needs an explicit #include <machine/frame.h> to get 'struct
trapframe'. The fact that it needs this at this level is rather bogus
but it will not compile without it.
the filelist_lock and check nfiles. This closes a race where we had to
unlock the filedesc to re-lock the filelist_lock.
Reported by: David Xu
Reviewed by: bde (mostly)
after a panic which is not an interrupt thread, or the thread which
caused the panic. Also, remove panicstr checks from msleep() and from
cv_wait() in order to allow threads to go to sleep and yeild the cpu
to the panicing thread, or to an interrupt thread which might
be doing the crashdump.
Reviewed by: jhb (and it was mostly his idea too)
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.
Sponsored by: DARPA & NAI Labs.
semicolons from the end of macros:
#define FOO() bar(a,b,c);
becomes
#define FOO() bar(a,b,c)
Thus requiring the semicolon in the invocation of FOO. This is much
cleaner syntax and more consistent with expectations when writing
function-like things in source.
With both peril-sensitive sunglasses and flame-proof undies on, tighten
up some types, and work around some warnings generated by this. There
are some _horrible_ const/non-const issues in this code.
and a cluster in one shot.
o Introduce MBP_PERSIST and MBP_PERSISTENT control bits to mb_alloc();
MBP_PERSIST means "if you can allocate, then keep the cache lock
held on exit," and MBP_PERSISTENT means "a cache lock is alredy held
on entry, so allocate from the specified (already locked) cache."
They may be used in combination.
o m_getcl() uses the MBP_PERSIST/MBP_PERSISTENT interface so that it
doesn't drop the cache lock in between the mbuf and cluster allocations.
o m_getm(), which takes a size and allocates an mbuf + cluster "best fit"
chain, has been moved from uipc_mbuf.c to subr_mbuf.c and shown how to
use MBP_PERSIST/MBP_PERSISTENT to attempt to do a grouped allocation
without dropping the cache lock in between.
Why this is good: much less bus-locked lock acquires/drops when they're
not needed. Also, prototype for m_getcl():
struct mbuf * m_getcl(int how, short type, int flags);
"how" and "type" are self-explanatory. "flags" may be M_PKTHDR, in
which case m_getcl() will make the mbuf a pkthdr-mbuf.
While I'm in subr_mbuf.c:
o Every exported routine now has a nice comment with a description of
the expected arguments. Eventually, mbuf(9) needs to be re-vamped
but there's still more code to write/finalize before I get to that.
o internal macros have been changed a bit.
o consistently use 'short' for "type." This somehow slipped through
before (that 'type' was sometimes declared as int).
Alfred has been pushing for the MBP_PERSIST{,ENT} thing for almost a
year now. Luigi asked for m_getcl(), and will probably MFC that
part of this commit.
TODO [Related]: teach mb_free() about MBP_PERSIST{, ENT}.
1/ don't need to set td_state to TDS_RUNNING in fork_return.
it's already set in choosethread().
2/ Set a child process state to "normal" as opposed to "new"
when we allow it to be put on the run queue.
Allows child to receive signals from the parent if the parent
runs first and tries to immediatly signal he child.
Submitted by: (part 2) Thomas Moestl <tmoestl@gmx.net>
formulated. The correct states should be:
IDLE: On the idle KSE list for that KSEG
RUNQ: Linked onto the system run queue.
THREAD: Attached to a thread and slaved to whatever state the thread is in.
This means that most places where we were adjusting kse state can go away
as it is just moving around because the thread is..
The only places we need to adjust the KSE state is in transition to and from
the idle and run queues.
Reviewed by: jhb@freebsd.org
filedesc is already locked rather than having chroot() unlock the
filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
the process's struct filedesc. This closes at least one potential race
and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.
A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.
Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit
Reviewed by: peter, jhb
Approved by: peter
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.
Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.
Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.
I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.
I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.
New option: DISABLE_PG_G - In case I missed something.
This allows accton(1) to be used with an append-only file.
PR: 7169
Reported by: Joao Carlos Mendes Luis <jonny@jonny.eng.br>
Reviewed by: bde
Approved by: sheldonh (mentor)
MFC after: 2 weeks
methodology similar to the vm_map_entry splay and the VM splay that Alan
Cox is working on. Extensive testing has appeared to have shown no
increase in overhead.
Disadvantages
Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still N log N and optimal
when there is locality of reference).
Advantages
vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem
syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried
to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on
vnodes will be easier.
This commit along with another that Alan is working on for the VM page
global hash table will allow me to implement ranged fsync(), optimize
server-side nfs commit rpcs, and implement partial syncs by the
filesystem syncer (aka filesystem syncer would detect that someone is
trying to get the vnode lock, remembers its place, and skip to the
next vnode).
Note that the buffer cache splay is somewhat more complex then other splays
due to special handling of background bitmap writes (multiple buffers with
the same lblkno in the same vnode), and B_INVAL discontinuities between the
old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc