Commit Graph

10431 Commits

Author SHA1 Message Date
Alan Cox
e384d8a89b Initialize the vm object's flags to include OBJ_NOSPLIT, just like the
vm objects that are used by System V shared memory segments.
2008-04-13 21:08:34 +00:00
Attilio Rao
22dd228d5d Use a "rel" memory barrier for disowning the lock as it cames from an
exclusive locking operation.
2008-04-13 01:21:56 +00:00
Attilio Rao
0b0100db88 struct lock_instance and struct lock_list_entry don't need to be in the
public namespace for WITNESS as they are only used internally so just
move them in the private namespace for the subsystem (with all related
supporting definitions).
2008-04-13 01:20:47 +00:00
Poul-Henning Kamp
8d24f82310 fix printf type confusion on amd64 2008-04-12 21:51:54 +00:00
Poul-Henning Kamp
c9ad6040dd Emit summaries of struct c(alender)t(ime) <-> struct timespec conversions
under bootverbose.

Struct ct is used for setting/reading real time clocks and I'm about
to Do Things to some of those, so a bit of preemptive debugging is
in order.

Remove a pointless __inline.
2008-04-12 20:35:56 +00:00
Attilio Rao
e5f94314ad - Re-introduce WITNESS support for lockmgr. About the old implementation
the only one difference is that lockmgr*() functions now accept
  LK_NOWITNESS flag which skips ordering for the instanced calling.
- Remove an unuseful stub in witness_checkorder() (because the above check
  doesn't allow ever happening) and allow witness_upgrade() to accept
  non-try operation too.
2008-04-12 19:57:30 +00:00
Attilio Rao
872b7289fd - Remove a stale comment.
- Add an extra assertion in order to catch malformed requested operations.
2008-04-12 13:56:17 +00:00
Attilio Rao
1859cffaef Add missing stubs for spinlocks cpuset and intrcnt.
Submitted by:	kris
2008-04-12 13:51:18 +00:00
Xin LI
31c50f53da Instead of rolling our own jail number allocation procedure, use
alloc_unr() to do it.

Submitted by:	Ed Schouten <ed 80386 nl>
PR:		kern/122270
MFC after:	1 month
2008-04-11 21:31:15 +00:00
John Baldwin
03c7442d75 Use kthread_exit() to terminate a taskqueue thread rather than kproc_exit()
now that the taskqueue threads are kthreads rather than kprocs.

Reported by:	kris
2008-04-11 17:35:54 +00:00
Jeff Roberson
9b33b154b5 - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
Pawel Jakub Dawidek
b03d720760 - Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is
no longer needed, but for now we still want to be consistent with other
  similar checks in the tree.
- Call ASSERT_VOP_ELOCKED() only when vget() returns 0.

Reviewed by:	jeff
2008-04-09 20:19:55 +00:00
Sam Leffler
6c6eaea6dd Do image loading in a context known to have a root directory:
o create a private task queue thread that sets up root and current
  directories (hooking mountroot event as needed); this is necessary
  because task queue threads are parented from proc0 and it does not
  have a reference to rootvnode (lost when / mounting moved to init)
o bounce image load + unload requests through the private task q so
  we can load images even when the request is made from a thread that
  does not have sufficient context (e.g. task q thread)
o add a check in the task q thread to fail requests before root is
  mounted (just in case)

Reviewed by:	jhb, mlaier, luigi (glance)
MFC after:	1 month
2008-04-09 19:07:48 +00:00
Sam Leffler
00c71fb7c3 o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after:	2 weeks
2008-04-08 17:53:33 +00:00
Sam Leffler
175611b668 change taskqueue_start_threads to create threads instead of proc's
Reviewed by:	jhb
2008-04-08 17:48:02 +00:00
Konstantin Belousov
48b05c3f82 Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
    renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by:	rdivacky
Sponsored by:	Google Summer of Code 2007
Tested by:	pho
2008-04-08 09:45:49 +00:00
Attilio Rao
e0f62984c1 - Use a different encoding for lockmgr options: make them encoded by
bit in order to allow per-bit checks on the options flag, in particular
  in the consumers code [1]
- Re-enable the check against TDP_DEADLKTREAT as the anti-waiters
  starvation patch allows exclusive waiters to override new shared
  requests.

[1] Requested by:	pjd, jeff
2008-04-07 14:46:38 +00:00
Don Lewis
8a3724388b vfs_syscalls.c 1.452 mistakenly swapped the behavior of chown() and lchown(). 2008-04-07 00:29:32 +00:00
Attilio Rao
047dd67e96 Optimize lockmgr in order to get rid of the pool mutex interlock, of the
state transitioning flags and of msleep(9) callings.
Use, instead, an algorithm very similar to what sx(9) and rwlock(9)
alredy do and direct accesses to the sleepqueue(9) primitive.

In order to avoid writer starvation a mechanism very similar to what
rwlock(9) uses now is implemented, with the correspective per-thread
shared lockmgrs counter.

This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and
lockmgr_args_rw().  These two are like the 2 "normal" versions, but they
both accept a rwlock as interlock.  In order to realize this, the general
lockmgr manager function "__lockmgr_args()" has been implemented through
the generic lock layer. It supports all the blocking primitives, but
currently only these 2 mappers live.

The patch drops the support for WITNESS atm, but it will be probabilly
added soon. Also, there is a little race in the draining code which is
also present in the current CVS stock implementation: if some sharers,
once they wakeup, are in the runqueue they can contend the lock with
the exclusive drainer.  This is hard to be fixed but the now committed
code mitigate this issue a lot better than the (past) CVS version.
In addition assertive KA_HELD and KA_UNHELD have been made mute
assertions because they are dangerous and they will be nomore supported
soon.

In order to avoid namespace pollution, stack.h is splitted into two
parts: one which includes only the "struct stack" definition (_stack.h)
and one defining the KPI.  In this way, newly added _lockmgr.h can
just include _stack.h.

Kernel ABI results heavilly changed by this commit (the now committed
version of "struct lock" is a lot smaller than the previous one) and
KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction,
so manpages and __FreeBSD_version will be updated accordingly.

Tested by:      kris, pho, jeff, danger
Reviewed by:    jeff
Sponsored by:   Google, Summer of Code program 2007
2008-04-06 20:08:51 +00:00
Jeff Roberson
ce62b59c88 - Correct a major error introduced in the per-cpu timeout commit. Sleep
and wakeup require the same wait channel to function properly.

Found by:	kris
Pointy hat:	me
2008-04-06 11:08:49 +00:00
John Baldwin
8aa9e82e67 Move INTR_FILTER from opt_global.h to its own header. 2008-04-05 20:13:15 +00:00
John Baldwin
1ee1b68792 Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
  'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
  Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
  scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
  Also, don't bother unmasking the interrupt in the post_filter case if
  we never masked it.  The INTR_FILTER case had been doing this by having
  arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
  They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
  both the 'post_filter' and 'post_ithread' hooks to match what the
  non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
  'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
  designed to do even in 5.x).

Glanced at by:	piso
Reviewed by:	marius
Requested by:	marius [1], [5]
Tested on:	amd64, i386, arm, sparc64
2008-04-05 19:58:30 +00:00
Alan Cox
7630c26507 Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.)  UMA_SLAB_KERNEL is now required by the jumbo frame
allocators.  Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object.  In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT.  This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL.  This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
2008-04-04 18:41:12 +00:00
Jeff Roberson
00ca09449d - Add sysctls at debug.rwlock to control the behavior of the speculative
spinning when readers hold a lock.  This spinning is speculative because,
   unlike the write case, we can not test whether the owners are running.
 - Add speculative read spinning for readers who are blocked by pending
   writers while a read lock is still held.  This allows the thread to
   spin until the write lock succeeds after which it may spin until the
   writer has released the lock.  This prevents excessive context switches
   when readers and writers both hold the lock for brief periods.

Sponsored by:	Nokia
2008-04-04 10:00:46 +00:00
Jeff Roberson
3bc8c68d9f - Add a Nokia copyright to cpuset to reflect their generous
contribution to this work.
2008-04-04 01:22:04 +00:00
Jeff Roberson
0502fe2e43 - Allow static_boost to specify no boost with '0', traditional kernel
fixed pri boost with '1' or any priority less than the current thread's
   priority with a value greater than two.  Default the boost to
   PRI_MIN_TIMESHARE to prevent regular user-space threads from starving
   threads in the kernel.  This prevents these user-threads from also
   being scheduled as if they are high fixed-priority kernel threads.
 - Restore the setting of lowpri in tdq_choose().  It has to be either here
   or in sched_switch().  I accidentally removed it from both places.

Tested by:	kris
2008-04-04 01:16:18 +00:00
Jeff Roberson
03d17db7d5 - Don't check for the ITHD pri class in tdq_load_add and rem. 4BSD doesn't
do this either.  Simply check P_NOLOAD.  It'd be nice if this was
   in a thread flag so we didn't have an extra cache miss every time we
   add and remove a thread from the run-queue.
2008-04-04 01:04:43 +00:00
Jeff Roberson
e4b1aa6210 - Fix a mis-merge that crept in during the softclock changes.
Spotted by:	jhb
2008-04-04 01:03:23 +00:00
David Xu
44253336b6 let umtxq_busy() only spin on mp machine. make function name
do_rwlock_unlock to be consistent with others.
2008-04-03 11:49:20 +00:00
Jeff Roberson
e8245292a7 - Convert two timeout users to the new callout_reset_curcpu() api.
Sponsored by:	Nokia
2008-04-02 11:21:42 +00:00
Jeff Roberson
8d809d5061 Implement per-cpu callout threads, wheels, and locks.
- Move callout thread creation from kern_intr.c to kern_timeout.c
 - Call callout_tick() on every processor via hardclock_cpu() rather than
   inspecting callout internal details in kern_clock.c.
 - Remove callout implementation details from callout.h
 - Package up all of the global variables into a per-cpu callout structure.
 - Start one thread per-cpu.  Threads are not strictly bound.  They prefer
   to execute on the native cpu but may migrate temporarily if interrupts
   are starving callout processing.
 - Run all callouts by default in the thread for cpu0 to maintain current
   ordering and concurrency guarantees.  Many consumers may not properly
   handle concurrent execution.
 - The new callout_reset_on() api allows specifying a particular cpu to
   execute the callout on.  This may migrate a callout to a new cpu.
   callout_reset() schedules on the last assigned cpu while
   callout_reset_curcpu() schedules on the current cpu.

Reviewed by:	phk
Sponsored by:	Nokia
2008-04-02 11:20:30 +00:00
Konstantin Belousov
35b450291a Add two missed chunks from the rev. 1.210, for the giant_read() and
giant_ioctl().

PR:	kern/122287
MFC after:	3 days
2008-04-02 11:11:58 +00:00
Jeff Roberson
1fd9b6a577 - Destroy the bo mtx when the vnode is destroyed. 2008-04-02 10:40:03 +00:00
David Xu
fadd84c58f Fix compiling problem for amd64. 2008-04-02 05:54:41 +00:00
David Xu
11b1023b7d Er, don't restart a timeout version. 2008-04-02 04:26:59 +00:00
David Xu
1a30511c61 Introduce kernel based userland rwlock. Each umtx chain now has two lists,
one for readers and one for writers, other types of synchronization
object just use first list.

Asked by: jeff
2008-04-02 04:08:37 +00:00
Attilio Rao
b31a149bbb Add rw_try_rlock() and rw_try_wlock() to rwlocks.
These functions try the specified operation (rlocking and wlocking) and
true is returned if the operation completes, false otherwise.

The KPI is enriched by this commit, so __FreeBSD_version bumping and
manpage updating will happen soon.

Requested by:	jeff, kris
2008-04-01 20:31:55 +00:00
Doug Rabson
60cdfde09f Don't try to use an SX lock while holding the vnode interlock.
Sponsored by:	Isilon Systems
2008-04-01 16:07:01 +00:00
Konstantin Belousov
f2296b585e Regen 2008-03-31 12:12:27 +00:00
Konstantin Belousov
7104518b07 Add the openat(), fexecve() and other *at() syscalls to the table.
Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:06:55 +00:00
Konstantin Belousov
632dbc19e2 Implement the fexecve(2) syscall.
Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:05:52 +00:00
Konstantin Belousov
e4193f25cb Implement the
openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2),
	futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2),
	readlinkat(2), renameat(2), symlinkat(2)
syscalls.

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:04:20 +00:00
Konstantin Belousov
57b4252e45 Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
	sponsored by Google Summer of Code 2007
Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 12:01:21 +00:00
Konstantin Belousov
e314f69fff Add the support for the O_EXEC open(2) mode, as specified by the
POSIX Extended API Set Part 2 extension specification.

Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 11:57:18 +00:00
Konstantin Belousov
0a3af16a75 Add the utility function vn_commname() to retrieve the command name
from the vfs namecache, when available.

Reviewed by:	rwatson, rdivacky
Tested by:	pho
2008-03-31 11:53:03 +00:00
Jeff Roberson
a03ee0000e - Consistently return EDEADLK when presented with a new set that is
incompatible with existing bindings.
 - Try to copyout the setid in cpuset() before migrating the proc to the
   setid in case the user has supplied a bad buffer.
 - Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
   be more descriptive and free cpuset_root to be used as a different
   type of symbol.
 - Make cpuset_root the cpuset_t set of all cpus in the system.  This
   should contain the same bitmask as all_cpus presently.
 - Add a CPU_CMP() macro to compare two sets.
2008-03-30 11:31:14 +00:00
Jeff Roberson
5634d48667 - Don't allow calls to vn_lock() with no lock type requested. Callers
which simply want a reference should use vref().  Callers which want
   to check validity need to hold a lock while performing any action
   based on that validity.  vn_lock() would always release the interlock
   before returning making any action synchronous with the validity check
   impossible.
2008-03-29 23:36:26 +00:00
Jeff Roberson
069c6953a0 - Use vget() to lock the vnode rather than refing without a lock and
locking in separate steps.
2008-03-29 23:30:40 +00:00
Attilio Rao
71072af500 b_waiters cannot be adequately protected by the interlock because it is
dropped after the call to lockmgr() so just revert this approach using
something similar to the precedent one:
BUF_LOCKWAITERS() just checks if there are waiters (not the actual number
of them) and it is based on newly introduced lockmgr_waiters() which
returns if the lockmgr has waiters or not. The name has been choosen
differently by old lockwaiters() in order to not confuse them.

KPI results enriched by this commit so __FreeBSD_version bumping and
manpage update will be happening soon.
'struct buf' also changes, so kernel ABI is disturbed.

Bug found by:	jeff
Approved by:	jeff, kib
2008-03-28 12:30:12 +00:00
John Birrell
fc70c0bdd8 Regen after makesyscalls.sh change. 2008-03-27 01:55:06 +00:00