Commit Graph

10178 Commits

Author SHA1 Message Date
John Baldwin
57b7fe337e Partially revert the previous change. I failed to notice that where
ktruserret() is invoked, an unlocked check of the per-process queue
is performed inline; thus, we don't lock the ktrace_sx on every userret().

Pointy hat to:	jhb
Approved by:	re (kensmith)
Pointy hat recovered from:	rwatson
2007-08-29 21:17:11 +00:00
John Baldwin
cc479dda4a Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
  that the resulting block counts fit into the long-sized (long for the
  ABI, so 32-bit in freebsd32) counters.  In 4.x the NFS client's statfs
  VOP did this already.  This can lie about the block size to 4.x binaries,
  but it presents a more accurate picture of the ratios of free and
  available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
  values at INT32_MAX rather than losing the upper 32-bits to match the
  behavior of the 4.x statfs conversion routine in vfs_syscalls.c

Approved by:	re (kensmith)
2007-08-28 20:28:12 +00:00
Randall Stewart
2afb3e849f - During shutdown pending, when the last sack came in and
the last message on the send stream was "null" but still
  there, a state we allow, we could get hung and not clean
  it up and wait for the shutdown guard timer to clear the
  association without a graceful close. Fix this so that
  we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
  (so far) accept input of these and cannot yet generate
  a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
  disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
  but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
  ABORT in this case.
- According to the Kyoto summit of socket api developers
  (Solaris, Linux, BSD). We need to have:
   o non-eeor mode messages be atomic - Fixed
   o Allow implicit setup of an assoc in 1-2-1 model if
     using the sctp_**() send calls - Fixed
   o Get rid of HAVE_XXX declarations - Done
   o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
   o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
  when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks, always checksum, the same as for
  shutdown-complete.
- inpcb_free front state had a bug where in queue
  data could wedge an assoc. We need to just abandon
  ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
  assemble a response packet which may be larger than
  64k. This then would be dropped by IP. Instead make
  a "minimum" size for us 64k-2k (we want at least
  2k for our initack). If we receive such an init
  discard it early without all the processing.
- When we peel off we must increment the tcb ref count
  to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
  when given faulty data, fixed so can't happen and we
  also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
  (in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
  to null if the iterator is calling the output routine. This
  means we would possibly crash when we gather the MTU info.
  Fix so we only do the gather where we have a src address
  cached.
- we need to be sure to set abort flag on conn state when
  we receive an abort.
- peeloff could leak a socket. Moved code so the close will
  find the socket if the peeloff fails (uipc_syscalls.c)

Approved by:	re@freebsd.org(Ken Smith)
2007-08-27 05:19:48 +00:00
Konstantin Belousov
5114048b63 Destroy the kaio_mtx when freeing the struct kaioinfo in
aio_proc_rundown.

Do not allow for zero-length read to be passed to the fo_read file method
by aio.

Reported and tested by:	Peter Holm
Approved by:	re (kensmith)
2007-08-20 11:53:26 +00:00
Jeff Roberson
67e20930bd - Improve runq_findbit_from() which is used by ULE's circular queue. Mask
   off the bits we want to ignore on the first pass rather than doing a
   linear scan.  This puts us within a few instructions of the cost of
   runq_findbit() and removes this function from the top of profiling output
   for context switch heavy workloads.

Approved by:	re
2007-08-20 06:36:12 +00:00
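The masking idea in 67e20930bd can be sketched in userland C. This is an illustration only, not the kernel's runq_findbit_from(): `findbit_from()`, `lowbit()`, `NWORDS` and `WORD_BITS` are hypothetical, and the bitmap is a fixed two-word array. The first word is masked so that bits below the starting position are ignored, the remaining words are scanned whole, and the starting word is revisited unmasked to cover the wrap-around.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS    2   /* hypothetical bitmap size, in words */
#define WORD_BITS 32

/* Index of the lowest set bit; the word must be non-zero. */
static int
lowbit(uint32_t w)
{
    int i = 0;

    while ((w & 1u) == 0) {
        w >>= 1;
        i++;
    }
    return (i);
}

/*
 * Return the index of the first set bit at or after 'start', wrapping
 * around to the beginning of the bitmap, or -1 if no bit is set.
 */
int
findbit_from(const uint32_t bits[NWORDS], int start)
{
    int word, n;
    uint32_t w;

    assert(start >= 0 && start < NWORDS * WORD_BITS);
    word = start / WORD_BITS;

    /* First pass: mask off the bits below 'start' in the first word. */
    w = bits[word] & (~(uint32_t)0 << (start % WORD_BITS));

    /* NWORDS + 1 checks: the starting word is seen masked, then whole. */
    for (n = 0; n <= NWORDS; n++) {
        if (w != 0)
            return (word * WORD_BITS + lowbit(w));
        word = (word + 1) % NWORDS;
        w = bits[word];
    }
    return (-1);
}
```

The per-word cost is one AND plus a find-first-set, instead of testing each bit position in turn.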
Jeff Roberson
9862717afe - Set steal_thresh to log2(ncpus). This improves idle-time load balancing
on 2cpu machines by reducing it to 1 by default.  This improves loaded
   operation on 8cpu machines by increasing it to 3 where the extra idle
   time is not as critical.

Approved by:	re
2007-08-20 06:34:20 +00:00
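The values quoted in 9862717afe (1 on 2-CPU machines, 3 on 8-CPU machines) follow from an integer base-2 logarithm. A minimal sketch, assuming the result is rounded down for CPU counts that are not powers of two (the commit message does not say); `steal_thresh_for()` is a hypothetical helper, not the scheduler's code:

```c
#include <assert.h>

/* Integer floor(log2(ncpus)); ncpus must be at least 1. */
int
steal_thresh_for(unsigned int ncpus)
{
    int thresh = 0;

    assert(ncpus > 0);
    while ((ncpus >>= 1) != 0)
        thresh++;
    return (thresh);
}
```

With this helper, `steal_thresh_for(2)` yields 1 and `steal_thresh_for(8)` yields 3, matching the defaults described in the message.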
Nate Lawson
62db376af3 Always call sched_bind(), even if on the CPU in question. It is wrong to
check if we're already on that cpu and skip the bind since the thread could
be migrated off in the meantime.

Suggested by:	jeff
Approved by:	re
2007-08-20 06:28:26 +00:00
Nate Lawson
2145b9d207 Use a different loop variable for the inner loop. This previous reuse could
have caused a hang, but we got lucky with the available multi-CPU states
on actual hardware.

Submitted by:	Bjorn Koenig <bkoenig / alpha-tierchen.de>
Approved by:	re
MFC after:	3 days
2007-08-19 20:34:13 +00:00
David Xu
6ec46f7aa8 Regenerate.
Approved by: re(kensmith)
2007-08-16 05:32:26 +00:00
David Xu
0b1f0611b4 Add thr_kill2 syscall which sends a signal to a thread in another process.
Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)
2007-08-16 05:26:42 +00:00
John Baldwin
1dc5b1cc56 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
Pawel Jakub Dawidek
354eb80141 Improve vn_printf() by:
- adding missing vnode flags,
- printing unknown flags as numbers,
- using strlcat() instead of strcat().

Approved by:	re (bmah)
2007-08-13 21:23:30 +00:00
Konstantin Belousov
004e08be60 Do not call free() while holding vnode interlock.
Reported and tested by:	Peter Holm
Reviewed by:	jeff
Approved by:	re (kensmith)
2007-08-07 09:04:50 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminating
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
Jeff Roberson
3a78f9658b - Fix one line that erroneously crept in my last commit.
Approved by:	re
2007-08-04 01:21:28 +00:00
Jeff Roberson
c47f202b45 - Share scheduler locks between hyper-threaded cores to protect the
tdq_group structure.  Hyper-threaded cores won't really benefit from
   separate locks anyway.
 - Separate out the migration case from sched_switch to simplify the main
   switch code.  We only migrate here if called via sched_bind().
 - When preempted place the preempted thread back in the same queue at
   the head.
 - Improve the cpu group and topology infrastructure.

Tested by:	many on current@
Approved by:	re
2007-08-03 23:38:46 +00:00
Jeff Roberson
413ea6f543 - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
Robert Watson
33d2bb9ca3 First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associated SYSINITs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
Attilio Rao
34ed040030 Actually, upcalls cannot be freed while destroying the thread because we
would call uma_zfree() with various spinlocks held.  Rearranging the
code would not help here because we cannot break atomicity with respect to
the process spinlock, so the only choice we have is to defer the operation.
In order to do this use a global queue synchronized through the kse_lock
spinlock which is freed at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal as we should want a per-process
list of zombie upcalls, but it follows initial guidelines of KSE authors.

Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
2007-07-27 09:21:18 +00:00
Pawel Jakub Dawidek
57fd3d5572 When we do open, we should lock the vnode exclusively. This fixes a few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
Pawel Jakub Dawidek
68c1a246ae The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by:	re (rwatson)
2007-07-26 16:52:57 +00:00
Attilio Rao
758b17a100 upcall_free() was only used in kse_GC(), which has been removed, so it
is now unused; with gcc's -Werror option, the resulting warning
breaks buildkernel.
Fix this by removing upcall_free().

Reported by: various
Approved by: jeff
Approved by: re
Pointy hat to: attilio
2007-07-23 23:16:53 +00:00
Attilio Rao
ac8094e4e3 Actually, KSE kernel bits locking is broken and can likely lead to
dangerous races.
Fix these problems by adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazy" locked.
While here, clean up the code by removing the function kse_GC() (unused),
and merging upcall_link(), upcall_unlink(), upcall_stash() into their
respective callers (static functions, very short and only called in one
place).

Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsored by: NGX Italy (http://www.ngx.it)
2007-07-23 14:52:22 +00:00
David Malone
6d8617d42a If clock_ct_to_ts fails to convert the time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by:	re
MFC after:	3 weeks
2007-07-23 09:42:32 +00:00
Konstantin Belousov
e69aee3117 ttyfree() frees the cdev(). But if there are pending kevents,
filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty,
causing a 0xdeadc0de dereference.  Change kn_hook value from cdev to
struct tty to avoid dereferencing freed cdev.

In ttygone(), wake up select(), sigio and kevent() users in addition
to the queue sleepers.

Return EV_EOF from kevent filters if TS_GONE is set.

Submitted by:	peter
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-07-20 09:41:54 +00:00
Attilio Rao
6aa294be2c Fix some problems with lock profiling in rw locks:
- Adjust the semantics of the lock_profiling stubs in the hard functions to be
  more accurate and trustworthy
- As for sx locks, disable shared paths for lock_profiling.  Actually,
  lock_profiling has a subtle race which makes results coming from shared
  paths not completely trustworthy. A macro stub (LOCK_PROFILING_SHARED) can
  be used to re-enable these paths, but is currently intended
  for development use only.
- style(9) fixes

Approved by: jeff, kmacy, jhb[1]
Approved by: re

[1] Had initial reservations not shared by others, conceded
    in the end.
2007-07-20 08:43:42 +00:00
Jeff Roberson
28994a5852 - Refine the load balancer to improve buildkernel times on dual core
machines.
 - Leave the long-term load balancer running by default once per second.
 - Enable stealing load from the idle thread only when the remote processor
   has more than two transferable tasks.  Setting this to one further
   improves buildworld.  Setting it higher improves mysql.
 - Remove the bogus pick_zero option.  I had not intended to commit this.
 - Entirely disallow migration for threads with SRQ_YIELDING set.  This
   balances out the extra migration allowed for with the load balancers.
   It also makes pick_pri perform better as I had anticipated.

Tested by:	Dmitry Morozovsky <marck@rinet.ru>
Approved by:	re
2007-07-19 20:03:15 +00:00
Jeff Roberson
08c9a16c4f - When newtd is specified to sched_switch() it was not being initialized
properly.  We have to temporarily unlock the TDQ lock so we can lock
   the thread and add it to the run queue.  This is used only for KSE.
 - When we add a thread from the tdq_move() via sched_balance() we need to
   ipi the target if it's sitting in the idle thread or it'll never run.

Reported by:	Rene Landan
Approved by:	re
2007-07-19 19:51:45 +00:00
Jeff Roberson
56696bd1ab - Remove explicit references to sched_lock. A simpler assert will do.
Approved by:	re
2007-07-19 08:58:40 +00:00
Jeff Roberson
6eeb364b4c - Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by:	mlaier
Approved by:	re
2007-07-19 08:49:16 +00:00
Jeff Roberson
6ea38de8aa - Remove the global definition of sched_lock in mutex.h to break
new code and third party modules which try to depend on it.
 - Initialize sched_lock in sched_4bsd.c.
 - Declare sched_lock in sparc64 pmap.c and assert that we're compiling
   with SCHED_4BSD to prevent accidental crashes from running ULE.  This
   is the sole remaining file outside of the scheduler that uses the
   global sched_lock.

Approved by:	re
2007-07-18 20:46:06 +00:00
Jeff Roberson
773890b9a8 - Add the proper lock profiling calls to _thread_lock().
Obtained from:	kipmacy
Approved by:	re
2007-07-18 20:38:13 +00:00
Jeff Roberson
ae7a6b38d5 ULE 3.0: Fine grain scheduler locking and affinity improvements. This has
been in development for over 6 months as SCHED_SMP.
 - Implement one spin lock per thread-queue.  Threads assigned to a
   run-queue point to this lock via td_lock.
 - Improve the facility for assigning threads to CPUs now that sched_lock
   contention no longer dominates scheduling decisions on larger SMP
   machines.
 - Re-write idle time stealing in an attempt to make it less damaging to
   general performance.  This is still disabled by default. See
   kern.sched.steal_idle.
 - Call the long-term load balancer from a callout rather than sched_clock()
   so there are no locks held.  This is disabled by default.  See
   kern.sched.balance.
 - Parameterize many scheduling decisions via sysctls.  Try to document
   these via sysctl descriptions.
 - General structural and naming cleanups.
 - Document each function with comments.

Tested by:	current@ amd64, x86, UP, SMP.
Approved by:	re
2007-07-17 22:53:23 +00:00
Jeff Roberson
fb62eea266 - Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with:	bde, attilio
Approved by:	re
2007-07-17 01:08:09 +00:00
Craig Rodrigues
d7f81adbd4 Revert previous commits which I committed by mistake.
Approved by:	re (implicit)
Pointy hat to:	me
2007-07-14 21:23:31 +00:00
Craig Rodrigues
d678780e60 The last entry in the ext2_opts array must be NULL,
otherwise the kernel will crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
John Baldwin
59d8f3ff08 Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
  and use that in preference to maxssiz during exec() rather than always
  using maxssiz for all processes.
- Apply the ABI's limit fixup to the previous stack size when adjusting
  RLIMIT_STACK to determine if the existing mapping for the stack needs to
  be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by:	re (kensmith)
2007-07-12 18:01:31 +00:00
Attilio Rao
c1a6d9fa42 Fix some problems with lock_profiling in sx locks:
- Adjust the semantics of the lock_profiling stubs in the hard functions to be
  more accurate and trustworthy
- Disable shared paths for lock_profiling.  Actually, lock_profiling has a
  subtle race which makes results coming from shared paths not completely
  trustworthy. A macro stub (LOCK_PROFILING_SHARED) can be used for
  re-enabling these paths, but is currently intended for development use only.
- Use homogeneous names for automatic variables in hard functions regarding
  lock_profiling
- Style fixes
- Add a CTASSERT for some flags building

Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
2007-07-06 13:20:44 +00:00
Konstantin Belousov
196a7385ac Revert destroy_dev() to the state before destroy_dev_sched() was introduced.
Attempt to spawn destroy_dev_sched() from it causes inadmissible races.

Requested by:	tegge
Approved by:	re (kensmith)
2007-07-05 13:04:59 +00:00
Bjoern A. Zeeb
f43455fd89 Remove netkey directory from cscope/TAGs generation and replace
it with netipsec now that KAME IPsec is gone.
While here add missing netinet6 directories.

Add comments about the ports needed to be able to run those targets.

Reviewed by:	philip
Approved by:	re (rwatson)
2007-07-05 08:55:14 +00:00
Peter Wemm
22af4cab91 Fix bad function type passed to destroy_dev_sched_cb().
Approved by:  re (rwatson)
2007-07-05 05:54:47 +00:00
Peter Wemm
c2815ad564 Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
2007-07-04 22:57:21 +00:00
Peter Wemm
552fbe752f Regenerate after mmap/lseek/etc syscall changes.
Approved by:  re (kensmith)
2007-07-04 22:49:55 +00:00
Peter Wemm
51504d9ac4 Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and
ftruncate(), but without the pad arg.

There are several reasons for this.  Consider 'mmap()'.  On AMD64, the
function call (and syscall) ABI allow for 6 register arguments.  Additional
arguments go on the stack.  mmap(2) has 6 arguments.  However, the syscall
definition has an extra 'int pad' argument.  This pushes it to 7 arguments,
which means one must spill into the memory stack.  Since the kernel API
doesn't match userland API, we have a hack in libc - libc/sys/mmap.c.
This implements the userland API by calling __syscall() with an extra
argument and the pad argument, for a total of 8 args.  This is all
unnecessary and inconvenient for several things, including the kernel's
syscall handler code which now has to handle merging stack arguments with
register arguments.  It is a big deal for certain 3rd party code.

I'm adding libc glue to make the transition totally painless.  I had
intended to mark the old syscalls as COMPAT6, but the potential to shoot
your feet by building a new kernel without COMPAT_FREEBSD6 but with a
slightly older userland was too great.  For now, they have manual
"freebsd6_" prefixes rather than being COMPAT6.  They will go back to
being marked 'COMPAT6' after 7-stable starts.

Approved by: re (kensmith)
2007-07-04 22:47:37 +00:00
Peter Wemm
9f0482e515 Add support for COMPAT6 syscalls.
Also, change the visibility of compat syscalls slightly.  Compat
syscalls were missing from 'syscalls.h' entirely.  This additionally adds
them with their compat prefix.  eg: SYS_freebsd6_mmap.

Also, the syscalls.c name strings have different prefixes to differentiate
syscalls. Instead of several "old.mmap" strings, there will now be a
"compat.mmap" and "compat6.mmap" etc.  Before, both would have had the
same "old.mmap" label.

Approved by:  re
2007-07-04 22:38:28 +00:00
Konstantin Belousov
09828ba947 Since cdev mutex is after system map mutex in global lock order, free()
shall not be called while holding cdev mutex. The devfs_inos unrhdr uses the
cdev mutex as its mutex, thus creating this LOR situation.

Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.

Reviewed by:	phk
Tested by:	Peter Holm
LOR:	80
Approved by:	re (kensmith)
2007-07-04 06:56:58 +00:00
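The deferral pattern described in 09828ba947 (queue items while the lock is held, free them once it is dropped) can be sketched as follows. This is a toy userland model, not the subr_unit.c code: `struct pool`, `pool_lock()`, `pool_unlock()`, `pool_retire_locked()` and `pool_clean()` are hypothetical, and the "mutex" is a plain flag used only to assert the ordering in a single-threaded example.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
    struct item *next;
    int unit;
};

struct pool {
    int locked;            /* toy stand-in for a mutex: 1 while "held" */
    struct item *pending;  /* items waiting to be freed */
    int freed;             /* number of items actually freed */
};

void
pool_lock(struct pool *p)
{
    assert(!p->locked);
    p->locked = 1;
}

void
pool_unlock(struct pool *p)
{
    assert(p->locked);
    p->locked = 0;
}

/* Called with the lock held: queue the item instead of calling free(). */
void
pool_retire_locked(struct pool *p, struct item *it)
{
    assert(p->locked);
    it->next = p->pending;
    p->pending = it;
}

/* Called without the lock: detach the list under the lock, free outside it. */
void
pool_clean(struct pool *p)
{
    struct item *list, *next;

    pool_lock(p);
    list = p->pending;
    p->pending = NULL;
    pool_unlock(p);

    while (list != NULL) {
        assert(!p->locked);  /* free() never runs with the lock held */
        next = list->next;
        free(list);
        p->freed++;
        list = next;
    }
}
```

The point of the split is that the allocator's own lock is never acquired while the pool lock is held, which removes the lock order reversal.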
Jeff Roberson
f6c1ecca50 - Use explicit locking in the various fcntl case statements so that we
can acquire shared filedescriptor locks in the appropriate cases.
 - Remove Giant from calls that issue ioctls.  The ioctl path has been
   mpsafe for some time now.
 - Only acquire giant for VOP_ADVLOCK when the filesystem requires giant.
   advlock is now mpsafe.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:26:06 +00:00
Jeff Roberson
bc02f1d98d - Remove explicit Giant protection from lockf. Use the vnode interlock
to protect this datastructure instead.
 - Preallocate an extra lockf structure in case we want to split a lock
   on insert or delete.
 - msleep() on the vnode interlock when blocking on a lock.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:22:58 +00:00
John Baldwin
fb1faf2082 Tweak the low-level MI SMP code some:
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
  CPUs will always see an up-to-date value of smp_rv_setup_func.

Reviewed by:	attilio
Approved by:	re (kensmith)
Tested on:	alpha, amd64, i386, sparc64 SMP (for several years)
2007-07-03 18:37:06 +00:00
Konstantin Belousov
9d53363bc8 Rev. 1.204 and 1.205 got an erroneous version of destroy_dev() that
calls destroy_dev_sched() with cdev mutex locked. Commit the code
that was actually tested.

Pointy hat to:	kib
Approved by:	re (implicit)
2007-07-03 18:18:30 +00:00
Konstantin Belousov
f5baf8d66b Lock Giant and proctree lock around dereferencing p_session->s_ttyvp->v_rdev.
Lock cdev mutex too to close the race with tty being freed.
Relock clone_drain_lock to prevent the LOR with proctree lock, thus
add #include <fs/devfs/devfs_int.h>.

Suggested by:	tegge
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:46:37 +00:00
Konstantin Belousov
8a5d7ef25c Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from pty clone handler.
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:45:52 +00:00
Konstantin Belousov
0a9c2b6db8 Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from the clone handler.
Lock Giant in the clone handler.
Use destroy_dev_sched() explicitly from pty_maybecleanup() and postpone
pty_release() until both master and slave cdevs are destroyed by setting
it as callback for destroy_dev_sched().

Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:44:59 +00:00
Konstantin Belousov
6f0281937b Automatically detect deadlock condition in destroy_dev(), that is, if
destroy_dev() is called from csw method, and no d_purge driver method is
provided. Transform the direct call to destroy_dev() into destroy_dev_sched().

Reviewed by:	njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:43:20 +00:00
Konstantin Belousov
de10ffa527 Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bumps the device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the clone events to make sure no lingering devices are left after the
dev_clone event handler is deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by:	tegge (early versions), njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:42:37 +00:00
Konstantin Belousov
7aee5992a5 Relock the sema_mtxp unconditionally after copyin() for SETALL case in
kern_semctl. Otherwise, later mtx_unlock() can operate on unlocked mutex.

Submitted by:	rdivacky
MFC after:	3 days
Approved by:	re (kensmith)
2007-07-03 15:58:47 +00:00
Robert Watson
bc6eca2432 Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and
stop declaring it in systm.h -- it's used only in kern_priv.c and is not
required elsewhere.

Approved by:	re (kensmith)
2007-07-02 14:03:29 +00:00
Randall Stewart
b8709d23c5 - Add some needed error checking on bad fd passing in the sctp
syscalls.
Approved by:	re@freebsd.org (Ken Smith)
Obtained from:	Weongyo Jeong (weongyo.jeong@gmail.com)
2007-07-02 12:50:53 +00:00
Jeff Roberson
03d03260b2 - Use rufetchcalc() rather than calcru() in ttyinfo so that we get
correct system and user time stats.

Approved by:	re
Reported by:	kris
Discussed with:	Attilio
2007-07-01 00:17:59 +00:00
Robert Watson
dc2e1e3fae Use vm_offset_t for kmembase and kmemlimit rather than char *, avoiding
unnecessary casts, and making it possible to compile kern_malloc.c with
strict aliasing.

Submitted by:	rdivacky
Approved by:	re (kensmith)
2007-06-27 13:39:38 +00:00
Attilio Rao
6a0ce57d10 Fix an old standing LOR between callout_lock and sleepqueues chain (which
could lead to a deadlock).
- sleepq_set_timeout acquires callout_lock (via callout_reset()) only
  with sleepq chain lock held
- msleep_spin in _callout_stop_safe locks the sleepqueue chain with
  callout_lock held

In order to solve this don't use msleep_spin in _callout_stop_safe() but
use directly sleepqueues as inline msleep_spin code. Rearrange the
wakeup path in order to have it consistent too.

Reported by: kris (via stress2 test suite)
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
2007-06-26 21:42:01 +00:00
Attilio Rao
f08945a7d2 Introduce a new rwlocks initialization function: rw_init_flags.
This is very similar to sx_init_flags: it initializes the rwlock using
special flags passed as third argument (RW_DUPOK, RW_NOPROFILE,
RW_NOWITNESS, RW_QUIET, RW_RECURSE).
Among these, the most important new feature is probably that rwlocks
can be acquired recursively now (for both shared and exclusive paths).

Because of the recursion counter, the ABI is changed.

Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
2007-06-26 21:31:56 +00:00
Rong-En Fan
534046e301 - Remove UMAP filesystem. It was disconnected from build three years ago,
and it is seriously broken.

Discussed on:   freebsd-arch@
Approved by:	re (mux)
2007-06-25 05:06:57 +00:00
Konstantin Belousov
9bc911d4a2 devfs_free() calls free_unr(), that may sleep.
Postpone call to devfs_free() after cdev mutex is dropped. Reuse
cdp_list link for queuing devices awaiting deletion in the
cdevp_free_list.

Reported by:	Hans Petter Selasky <hselasky c2i net>
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-06-19 13:19:23 +00:00
Konstantin Belousov
7550e3eac4 Add the witness warning for free_unr. Function could sleep, thus callers
shall not have any non-sleepable locks held.

Submitted by:	Hans Petter Selasky <hselasky c2i net>
Approved by:	re (kensmith)
2007-06-19 13:13:17 +00:00
Pawel Jakub Dawidek
dfe97ff4a5 We only flush entries related to the given file system. Currently there are
no 'invalid' cache entries - file system is responsible for keeping it that
way. The comment should have been updated in rev.1.25.
2007-06-18 09:28:24 +00:00
Robert Watson
7251b7863c Rather than passing SUSER_RUID into priv_check_cred() to specify when
a privilege is checked against the real uid rather than the effective
uid, instead decide which uid to use in priv_check_cred() based on the
privilege passed in.  We use the real uid for PRIV_MAXFILES,
PRIV_MAXPROC, and PRIV_PROC_LIMIT.  Remove the definition of
SUSER_RUID; there are now no flags defined for priv_check_cred().

Obtained from:	TrustedBSD Project
2007-06-16 23:41:43 +00:00
Marius Strobl
79be8b5082 - Remove zstty spin lock for no longer existing zs(4).
- Move the rtc_mtx spin lock out from under #ifdef SMP as it's just
  not SMP-specific.
- Add a new spin lock pcib_mtx for locking "fast" interrupt handlers
  of host-to-PCI bridge drivers on sparc64.
2007-06-16 23:30:57 +00:00
Jeff Roberson
dda713dfb8 - Fix an off by one error in sched_pri_range.
- In tdq_choose() only assert that a thread does not have too high a
   priority (low value) for the queue we removed it from.  This will catch
   bugs in priority elevation.  It's not a serious error for the thread
   to have too low a priority as we don't change queues in this case as
   an optimization.

Reported by:	kris
2007-06-15 19:33:58 +00:00
Robert Watson
7e273744a6 Remove the restriction that rtprio(2) cannot be used to set the realtime
or idle priority of another process owned by the same user.  This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.
2007-06-14 23:31:52 +00:00
Robert Watson
b4be6ef22f Only require privilege to set the current time adjustment, not in order to
query it.
2007-06-14 18:37:58 +00:00
Robert Watson
3805385e3d Spell statistics more correctly in comments.
2007-06-14 03:02:33 +00:00
John Baldwin
34a9edafbc Improve the ktrace locking somewhat to reduce overhead:
- Depessimize userret() in kernels where KTRACE is enabled by doing an
  unlocked check of the per-process queue of pending events before
  acquiring any locks.  Previously ktr_userret() unconditionally acquired
  the global ktrace_sx lock on every return to userland for every thread,
  even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
  p_traceflag to see if ktrace is enabled and only acquire locks and
  teardown ktrace if the test succeeds.  Also, explicitly disable tracing
  before draining any pending events so the pending events actually get
  written out.  The unlocked read is safe because proc lock is acquired
  earlier after single-threading so p_traceflag can't change between then
  and this check (well, it can currently due to a bug in ktrace I will fix
  next, but that race existed prior to this change as well).

Reviewed by:	rwatson
2007-06-13 20:01:42 +00:00
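The unlocked check described in the first bullet of 34a9edafbc can be sketched in portable C11 roughly as follows. This is a userland illustration only, not the kernel code: `proc_sketch`, `big_lock`, `lock_acquisitions`, and `drain_on_return()` are hypothetical names, and an `atomic_flag` spin loop stands in for the global ktrace_sx lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a pending trace event and its owning process. */
struct event {
	struct event *next;
};

struct proc_sketch {
	struct event *_Atomic pending;	/* head of the per-process queue */
};

static atomic_flag big_lock = ATOMIC_FLAG_INIT;	/* stands in for the global lock */
static unsigned lock_acquisitions;		/* instrumentation for the example */

/* Called on every "return to userland"; returns the number of events drained. */
static unsigned
drain_on_return(struct proc_sketch *p)
{
	struct event *ev;
	unsigned n = 0;

	assert(p != NULL);

	/* Cheap unlocked peek: the common case is an empty queue. */
	if (atomic_load_explicit(&p->pending, memory_order_relaxed) == NULL)
		return (0);

	/* Slow path: take the lock, then detach the queue under it. */
	while (atomic_flag_test_and_set_explicit(&big_lock, memory_order_acquire))
		;
	lock_acquisitions++;
	ev = atomic_exchange(&p->pending, (struct event *)NULL);
	atomic_flag_clear_explicit(&big_lock, memory_order_release);

	for (; ev != NULL; ev = ev->next)
		n++;	/* a real implementation would write the event out here */
	return (n);
}
```

With an empty queue the function returns without ever touching the lock, which is the overhead reduction the commit message describes.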
John Baldwin
ce0be64687 Conditionally acquire Giant when dropping a reference on the ktrace vnode
during execve() when turning off tracing due to executing a setuid binary
as non-root.  Previously this could fail to acquire Giant and fail an
assertion if the ktrace file was on a non-MPSAFE filesystem and the
executable was on an MPSAFE filesystem.

MFC after:	3 days
Reported by:	kris
2007-06-13 19:41:47 +00:00
Jeff Roberson
3036ab79e3 - Include opt_sched.h for SCHED_STATS. 2007-06-12 23:27:31 +00:00
Jeff Roberson
671f2709ae - Garbage collect unused concurrency functions. 2007-06-12 19:50:31 +00:00
Jeff Roberson
e7c8d2e9fe - Garbage collect unused concurrency functions.
- Remove unused kse fields from struct proc.
 - Group remaining fields and #ifdef KSE them.
 - Move some kern_kse.c only prototypes out of proc and into kern_kse.

Discussed with:	Julian
2007-06-12 19:49:39 +00:00
Jeff Roberson
fe54587ffa - Move some common code out of sched_fork_exit() and back into fork_exit(). 2007-06-12 07:47:09 +00:00
Jeff Roberson
ff8fbcffcb Solve a complex exit race introduced with thread_lock:
- Add a count of exiting threads, p_exitthreads, to struct proc.
 - Increment p_exitthreads when we set the deadthread in thread_exit().
 - When we thread_stash() a deadthread use an atomic to drop the count.
 - Spin until the p_exitthreads count reaches 0 in thread_wait().
 - Lock the last exiting thread momentarily to be certain that it has
   exited cpu_throw().
 - Restructure thread_wait().  It does not need a loop as there will only
   ever be one thread.

Tested by:	moose@opera.com
Reported by:	kris, moose@opera.com
2007-06-12 07:24:46 +00:00
Robert Watson
32f9753cfb Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths.  Do, however, move those prototypes to priv.h.

Reviewed by:	csjp
Obtained from:	TrustedBSD Project
2007-06-12 00:12:01 +00:00
Jeff Roberson
efe641b939 - Add a missing PROC_SUNLOCK() in tdsignal() 2007-06-11 23:27:03 +00:00
Olivier Houchard
e411ce026a Re-acquire the PROC_SLOCK before calling calcru(), and release it after,
since calcru() expects it to be locked.

Reviewed by:	attilio
2007-06-11 21:05:41 +00:00
Sam Leffler
68e8e04e93 Update 802.11 wireless support:
o major overhaul of the way channels are handled: channels are now
  fully enumerated and uniquely identify the operating characteristics;
  these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
  background scanning and roaming
o move scanning support into loadable modules based on the operating
  mode to enable different policies and reduce the memory footprint
  on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
  mode yet)
o significantly speed up sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
  we use a simple algorithm to trigger a roam: we threshold the rssi
  and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
  drivers but is incomplete; it's included now to establish a baseline
  for other drivers to be developed and for user applications
o adjust max_linkhdr et al. to reflect 802.11 requirements; this eliminates
  prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
  encapsulation (note this can be used with any card that can tx+rx
  large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
  authentication and association failures
o enable the addition of sta mode power save support for drivers that need
  net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
  results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
  index into the rate set; this needs to be extended to deal with
  multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing

Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
       dynamic turbo (lightly tested), 11n (sniffing only and needs
       new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
       rough edges)
o ral, ural, rum: add support for bg scanning, calibrate rssi data
o wi: lightly tested

This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others.  Much of the scanning work was supported by
Atheros.  The 11n work was supported by Marvell.
2007-06-11 03:36:55 +00:00
Attilio Rao
393a081d42 Optimize vmmeter locking.
In particular:
- Add an explanatory table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
2007-06-10 21:59:14 +00:00
Matt Jacob
a659386c7e Remove unused variable. 2007-06-10 01:50:05 +00:00
Matt Jacob
26756b7a58 The new compiler can't quite follow the logic of has_stime and
complains about using uninitialized tags in stime.
2007-06-10 01:49:17 +00:00
Matt Jacob
9b73d2396a Initialized ets to zero. This is arguably a gcc bug in that ets is always
set to rts when timeout is non-NULL and then timevalid is set and ets is
only checked later when timervalid is set.
2007-06-10 01:43:11 +00:00
Attilio Rao
bdf08be439 Fix a bug caused by committing a pre-merge version of the patch
instead of the post-merge version (with respect to another rusage fix).

Reported by: marcel
Approved by: jeff(mentor)
2007-06-10 00:28:41 +00:00
Marcel Moolenaar
55b5660de4 Work around an integer overflow in expression `3 * maxbufspace / 4',
when maxbufspace is larger than INT_MAX / 3. The overflow causes a
hard hang on ia64 when physical memory is sufficiently large (8GB).
2007-06-09 23:41:14 +00:00
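The overflow described in 55b5660de4 arises because `3 * maxbufspace` is evaluated in `int` before the division. The commit message does not say how the expression was rewritten, so the two helpers below are only hypothetical ways to compute three quarters of a non-negative `int` without an overflowing intermediate. They are not the actual change.

```c
#include <assert.h>
#include <limits.h>

/*
 * Divide before multiplying.  Writing x = 4*q + r with 0 <= r < 4 gives
 * 3*x/4 == 3*q + (3*r)/4, and every intermediate value is at most x.
 */
static int
three_quarters(int x)
{
	assert(x >= 0);
	return (3 * (x / 4) + (3 * (x % 4)) / 4);
}

/* Alternative: do the multiplication in a wider type, then narrow. */
static int
three_quarters_wide(int x)
{
	assert(x >= 0);
	return ((int)((long long)x * 3 / 4));
}
```

Both forms agree with the naive expression wherever the naive expression does not overflow, and remain well defined up to `INT_MAX`.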
Attilio Rao
a1fe14bc33 rufetch and calcru sometimes need to be called atomically together.
This patch fixes the places where that is required by changing
their locking requirements (both now assume the per-proc spinlock is held) and
introducing rufetchcalc, which wraps both calls so that they are performed
atomically.

Reviewed by: jeff
Approved by: jeff (mentor)
2007-06-09 21:48:44 +00:00
Attilio Rao
86a49dea5b Since locking in kern/subr_prof.c has changed a bit, we no longer need
the time_lock spinlock exported.

Approved by: jeff (mentor)
2007-06-09 19:41:14 +00:00
Attilio Rao
a140976eb4 The current rusage code shows peculiar problems:
- Unsafe use of ruadd() in thread_exit()
- Non-atomicity of thread_exit() in the exit1() operations

This patch addresses these problems by allocating p_fd as part of the
process and modifying the way it is accessed.

A small chunk of this patch resolves a race on p_state in kern_wait(),
since we have to be sure about the process becoming a zombie.

Submitted by: jeff
Approved by: jeff (mentor)
2007-06-09 18:56:11 +00:00
Matt Jacob
65d32cd8fb Propagate volatile qualifier to make gcc4.2 happy. 2007-06-09 18:09:37 +00:00
Attilio Rao
e682569165 Remove the MUTEX_WAKE_ALL option and make it the default behaviour for our
mutexes.
Currently we already force MUTEX_WAKE_ALL because of some problems with the
!MUTEX_WAKE_ALL case (unavoidable priority inversion).
2007-06-08 21:36:52 +00:00
Poul-Henning Kamp
7acfb0af82 Double the WITNESS and DIAGNOSTIC benchmark warnings right before we
go into userland to improve the chances of people noticing them.
2007-06-08 11:47:36 +00:00
Xin LI
7b8c8b858c In getblk(), before gbincore(), use BO_LOCK directly when locking
the bufobj, rather than using VI_LOCK, like what was done with
revision 1.453.
2007-06-08 07:05:08 +00:00
Robert Watson
faef53711b Move per-process audit state from a pointer in the proc structure to
embedded storage in struct ucred.  This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.

Reviewed by:	csjp
Approved by:	re (kensmith)
Obtained from:	TrustedBSD Project
2007-06-07 22:27:15 +00:00
John Baldwin
a66fde8d35 - Remove unused variable from create_thread().
- Move kern_thr_*() prototype to <sys/syscallsubr.h> where all the other
  kern_*() prototypes live.
2007-06-07 19:45:19 +00:00
David Xu
42ce445fed Backout experimental adaptive-spin umtx code. 2007-06-06 07:35:08 +00:00
Jeff Roberson
710eacdc5f - Placing the 'volatile' on the right side of the * in the td_lock
declaration removes the need for __DEVOLATILE().

Pointed out by:	tegge
2007-06-06 03:40:47 +00:00
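The point made in 710eacdc5f about where `volatile` sits relative to the `*` can be illustrated with a small standalone sketch. The `struct mtx` below is a dummy placeholder, the struct and function names are hypothetical, and a plain cast stands in for `__DEVOLATILE()`.

```c
#include <assert.h>
#include <stddef.h>

struct mtx {
	int dummy;	/* placeholder; the real struct mtx is not reproduced here */
};

/* 'volatile' left of the '*': the pointed-to mutex is volatile-qualified. */
struct thread_left {
	volatile struct mtx *td_lock;
};

/* 'volatile' right of the '*': the pointer itself is volatile-qualified. */
struct thread_right {
	struct mtx *volatile td_lock;
};

static struct mtx *
lock_of_left(struct thread_left *td)
{
	assert(td != NULL);
	/* The qualifier sits on the pointee, so it must be cast away. */
	return ((struct mtx *)td->td_lock);
}

static struct mtx *
lock_of_right(struct thread_right *td)
{
	assert(td != NULL);
	/* Reading a volatile pointer yields a plain 'struct mtx *'. */
	return (td->td_lock);
}
```

In the second form the compiler still reloads the pointer on every access, but the value it yields can be passed to functions taking `struct mtx *` without a qualifier-discarding cast.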
Attilio Rao
d301eb10c7 Fix a problem with non-preemptive kernels caused by mis-merging
existing code with the new thread_lock patch.
This also cleans up the unlock operation for mutexes a bit.

Approved by: jhb, jeff(mentor)
2007-06-05 18:57:09 +00:00
Konstantin Belousov
b95b98b0bd Restore non-SMP build.
Reviewed by:	attilio
2007-06-05 14:20:13 +00:00
Jeff Roberson
95e3a0bca3 - Better fix for previous error; use DEVOLATILE on the td_lock pointer
as it can actually sometimes be something other than sched_lock even on
   schedulers which rely on a global scheduler lock.

Tested by:	kan
2007-06-05 04:12:46 +00:00
Jeff Roberson
c219b097af - Pass &sched_lock as the third argument to cpu_switch() as this will
always be the correct lock and we don't get volatile warnings this
   way.

Pointed out by:	kan
2007-06-05 03:46:54 +00:00
Jeff Roberson
36b369163b - Define TDQ_ID() for the !SMP case.
- Default pick_pri to off.  It is not faster in most cases.
2007-06-05 02:53:51 +00:00
Jeff Roberson
8e0185f604 - Remove sched_core.c. The maintainer has lost interest in pursuing this
and it has been neglected in the recent ksegrp removal as well as
   the thread_lock() changes.

Discussed with:	davidxu
2007-06-05 00:12:37 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Jeff Roberson
bd43e47156 Commit 10/14 of sched_lock decomposition.
- Add new spinlocks to support thread_lock() and adjust ordering.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:55:45 +00:00
Jeff Roberson
07a61420ff Commit 9/14 of sched_lock decomposition.
- Attempt to return the ttyinfo() selection algorithm to something sane
   as it has been broken and disabled for some time.  Adapt this algorithm
   in such a way that it does not conflict with per-cpu scheduler locking.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:55:32 +00:00
Jeff Roberson
3c2e44364e Commit 8/14 of sched_lock decomposition.
- Use a global umtx spinlock to protect the sleep queues now that there
   is no global scheduler lock.
 - Use thread_lock() to protect thread state.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:54:50 +00:00
Jeff Roberson
765b2891e8 Commit 7/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Use a global kse spinlock to protect upcall and thread assignment.  The
   per-process spinlock can not be used because this lock must be acquired
   via mi_switch() where we already hold a thread lock.  The kse spinlock
   is a leaf lock ordered after the process and thread spinlocks.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:54:27 +00:00
Jeff Roberson
11bda9b8d5 Commit 6/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Replace the tail-end of fork_exit() with a scheduler specific routine
   which can do the appropriate lock manipulations.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:53:34 +00:00
Jeff Roberson
40acdeabab Commit 5/14 of sched_lock decomposition.
- Protect the cp_time tick counts with atomics instead of a global lock.
   There will only be one atomic per tick and this allows all processors
   to execute softclock concurrently.
 - In softclock, protect access to rusage and td_*tick data with the
   thread_lock(), expanding the scope of the thread lock over the whole
   function.
 - Do some creative re-arranging in hardclock() to avoid excess locking.
 - Protect the p_timer fields with the per-process spinlock.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:53:06 +00:00
Jeff Roberson
a54e85fdbf Commit 4/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   synchronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.
 - Move some common code into thread_suspend_switch() to handle the
   mechanics of suspending a thread.  The locking here is incredibly
   convoluted and should be simplified.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:52:24 +00:00
Jeff Roberson
2502c107ba Commit 3/14 of sched_lock decomposition.
- Add a per-turnstile spinlock to solve potential priority propagation
   deadlocks that are possible with thread_lock().
 - The turnstile lock order is defined as the exact opposite of the
   lock order used with the sleep locks they represent.  This allows us
   to walk in reverse order in priority_propagate and this is the only
   place we wish to multiply acquire turnstile locks.
 - Use the turnstile_chain lock to protect assigning mutexes to turnstiles.
 - Change the turnstile interface to pass back turnstile pointers to the
   consumers.  This allows us to reduce some locking and makes it easier
   to cancel turnstile assignment while the turnstile chain lock is held.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:51:44 +00:00
Jeff Roberson
d72e80f09a Commit 2/14 of sched_lock decomposition.
- Adapt sleepqueues to the new thread_lock() mechanism.
 - Delay assigning the sleep queue spinlock as the thread lock until after
   we've checked for signals.  It is illegal for a thread to return in
   mi_switch() with any lock assigned to td_lock other than the scheduler
   locks.
 - Change sleepq_catch_signals() to do the switch if necessary to simplify
   the callers.
 - Simplify timeout handling now that locking a sleeping thread has the
   side-effect of locking the sleepqueue.  Some previous races are no
   longer possible.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:56 +00:00
Jeff Roberson
7b20fb19fb Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
   similar to solaris's container locking.
 - A per-process spinlock is now used to protect the queue of threads,
   thread count, suspension count, p_sflags, and other process
   related scheduling fields.
 - The new thread lock is actually a pointer to a spinlock for the
   container that the thread is currently owned by.  The container may
   be a turnstile, sleepqueue, or run queue.
 - thread_lock() is now used to protect access to thread related scheduling
   fields.  thread_unlock() unlocks the lock and thread_set_lock()
   implements the transition from one lock to another.
 - A new "blocked_lock" is used in cases where it is not safe to hold the
   actual thread's lock yet we must prevent access to the thread.
 - sched_throw() and sched_fork_exit() are introduced to allow the
   schedulers to fix-up locking at these points.
 - Add some minor infrastructure for optionally exporting scheduler
   statistics that were invaluable in solving performance problems with
   this patch.  Generally these statistics allow you to differentiate
   between different causes of context switches.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-04 23:50:30 +00:00
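The "thread lock is a pointer to the container's spinlock" idea from 7b20fb19fb can be sketched in userland C11 as below. This is an assumption-laden illustration, not the kernel implementation: the `_sketch` names and `struct spinlock` are hypothetical, and the retry loop is one plausible way to cope with the pointer changing while a locker is spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct spinlock {
	atomic_flag held;
};

static void
spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;
}

static void
spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct thread_sketch {
	struct spinlock *_Atomic td_lock;	/* lock of the current container */
};

/* Lock whichever container lock the thread currently points at. */
static struct spinlock *
thread_lock_sketch(struct thread_sketch *td)
{
	struct spinlock *l;

	assert(td != NULL);
	for (;;) {
		l = atomic_load(&td->td_lock);
		spin_lock(l);
		/* If the thread moved containers meanwhile, drop and retry. */
		if (l == atomic_load(&td->td_lock))
			return (l);
		spin_unlock(l);
	}
}

static void
thread_unlock_sketch(struct thread_sketch *td)
{
	spin_unlock(atomic_load(&td->td_lock));
}

/*
 * Hand the thread over to a new container.  The caller holds both the
 * thread's current lock and new_lock; the old lock is released here.
 */
static void
thread_lock_set_sketch(struct thread_sketch *td, struct spinlock *new_lock)
{
	struct spinlock *old;

	assert(new_lock != NULL);
	old = atomic_load(&td->td_lock);
	atomic_store(&td->td_lock, new_lock);
	spin_unlock(old);
}
```

After a hand-over the thread remains locked, now through the new container's spinlock, so a later `thread_unlock_sketch()` releases the new lock rather than the old one.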
Attilio Rao
b4b7081961 Do proper "locking" for missing vmmeters part.
We no longer assume sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed across CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:45:18 +00:00
Attilio Rao
6759608248 Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
  given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)
2007-06-04 21:38:48 +00:00
David Malone
041b706b2f Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
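The truncation mentioned in 041b706b2f can be modelled in plain C, assuming a 32-bit `int`. The two helpers are hypothetical and do not reproduce sysctl_handle_int or sysctl_handle_quad. They only show what happens to a 64-bit value when it travels through an int-sized path versus a 64-bit one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models an int-sized export path: only the low 32 bits survive. */
static uint32_t
export_as_int_sized(const uint64_t *valp)
{
	assert(valp != NULL);
	return ((uint32_t)*valp);
}

/* Models a 64-bit export path: the whole value is preserved. */
static uint64_t
export_as_quad(const uint64_t *valp)
{
	assert(valp != NULL);
	return (*valp);
}
```

A counter holding 2^32 + 1 comes back as 1 through the first helper and unchanged through the second.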
David Malone
df82ff50ed Add a function for exporting 64 bit types. 2007-06-04 18:14:28 +00:00
Kris Kennaway
cdcc788a7e Revert some debugging KTRs that were added during development. 2007-06-03 18:24:31 +00:00
Konstantin Belousov
7a31868ed0 Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being a file descriptor index to a pointer to struct file:
part 2. Convert calls missed in the first big commit.

Noted by:	rwatson
Pointy hat to:	kib
2007-06-01 14:33:11 +00:00
Jeff Roberson
1c4bcd050a - Move rusage from being per-process in struct pstats to per-thread in
td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  Reads proceed without locks.
 - Aggregate exiting threads' rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to its exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
2007-06-01 01:12:45 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Paolo Pisati
3401f2c1df In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb
2007-05-31 19:25:35 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being a file descriptor index to a pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Robert Watson
049c3b6cdf Now that sx(9) locks support an interruptible lock acquire primitive,
properly observe the SB_NOINTR flag in sblock.  This restores the
required behavior that lock acquisition be interruptible on the socket
buffer I/O serialization lock to allow threads waiting for I/O to be
signaled even if they aren't the thread currently holding the I/O lock.
With this change, the sblock regression test is again passed.

Reported by:		alfred
sx(9) handiwork:	attilio
2007-05-31 11:51:22 +00:00
Attilio Rao
f9819486e5 Add functions sx_xlock_sig() and sx_slock_sig().
These functions are intended to perform the same actions as sx_xlock() and
sx_slock(), except that they perform an interruptible sleep, so
that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is needed,
but the external API won't be affected at all.

Note: use "void" casts for "int" returning functions in order to keep tools
like Coverity Prevent from whining.

Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
2007-05-31 09:14:48 +00:00
Attilio Rao
2c7289cbfa style(9) fixes for sx locks.
Approved by: jeff (mentor)
2007-05-29 19:46:37 +00:00
Attilio Rao
acf840c4bd Add a small fix for lock profiling in sx locks.
"0" cannot be a correct value since when the function is entered at least
one shared holder must be present and since we want the last one "1" is
the correct value.
Note that lock_profiling for sx locks is far from being perfect.
Expect further fixes for that.

Approved by: jeff (mentor)
2007-05-29 19:34:32 +00:00
Attilio Rao
02b0a160dc Fix some problems introduced with the last descriptors tables locking
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before calling kmem_free (since it acquires blocking
  locks inside)
- Solve a deadlock with smp_rendezvous() where another CPU will wait
  indefinitely for dt_lock acquisition.
- Add dt_lock in the WITNESS list of spinlocks

While applying these modifications, change the requirement for user_ldt_free()
so that it returns without dt_lock held.

Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)
2007-05-29 18:55:41 +00:00
Robert Watson
03c96c3176 Add DDB "show unpcb" command, allowing DDB to print out many pertinent
details from UNIX domain socket protocol layer state.
2007-05-29 12:36:00 +00:00
Ed Maste
911d16b8cd Revert 1.197 and instead avoid calling kdb_enter() if the KDB_UNATTENDED
option is in use.
2007-05-28 21:50:54 +00:00
Warner Losh
cfa7a8beea Simplify the kernel configuration file return code.
Reviewed by: wkoszek
2007-05-28 20:41:10 +00:00
Ed Maste
1e62d77c09 Eliminate explicit kdb_enter in the software watchdog handler (which
produced incorrect behaviour with the KDB_UNATTENDED option) and call
panic in both the KDB and non-KDB cases.  This change is consistent
with rwatson's current kdb/ddb work.
2007-05-28 19:51:12 +00:00
Robert Watson
dede2ab3b2 In kern_kevent(), unconditionally fdrop() fp once fget() has succeeded,
as we never have an opportunity to set it to NULL.

Found with:	Coverity Prevent(tm)
CID:		2161
2007-05-28 17:15:05 +00:00
Robert Watson
e1e8f51b85 Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
Robert Watson
87066f04c6 Select a more appealing spelling for the word acquire. 2007-05-27 19:24:00 +00:00
Robert Watson
1c293049d9 Add parens around *free in *free++ in mbp_count() so that mbp_count()
actually works.  mbp_count() turns out only to be used in debugging code
in if_patm_intr.c, so this bug did not affect much in practice.

Found with:	Coverity Prevent(tm)
CID:		1943
2007-05-27 17:38:36 +00:00
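The precedence problem fixed in 1c293049d9 is easy to reproduce in isolation. The two functions below are hypothetical reductions, not the mbp_count() code: postfix `++` binds more tightly than unary `*`, so `*free++` advances the local pointer and leaves the counter alone.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy form: increments the (local) pointer, not the value it points to. */
static void
count_buggy(unsigned *free)
{
	assert(free != NULL);
	(void)*free++;	/* parsed as *(free++); the dereferenced value is discarded */
}

/* Fixed form: the parentheses make ++ apply to the pointed-to counter. */
static void
count_fixed(unsigned *free)
{
	assert(free != NULL);
	(*free)++;
}
```

Calling `count_buggy()` on a counter leaves it unchanged, while `count_fixed()` increments it by one.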
Robert Watson
097e1ea87f Remove amountpipes counter for pipes -- this replicates the function of
existing UMA statistics for pipes, and allows us to get rid of both the
per-pipe dtor and two atomic operations per pipe required to maintain
the counter.
2007-05-27 17:33:10 +00:00
Robert Watson
e4e80aa713 Remove #if 0'd check for 0-size allocations, which if enabled, called
kdb_enter().
2007-05-27 13:13:46 +00:00
Pawel Jakub Dawidek
6e042171bd To avoid a deadlock when handling .. directory during a lookup, we unlock
parent vnode and relock it after locking child vnode. The problem was that
we always relock it exclusively, even when it was share-locked.

Discussed with:	jeff
2007-05-25 22:23:38 +00:00
Pawel Jakub Dawidek
b4c85af977 We no longer need to put namecache entries onto temporary mplist.
It was useful in revision 1.86, but should have been removed in 1.89.
2007-05-25 22:19:49 +00:00
Pawel Jakub Dawidek
950afe9972 The cache_leaf_test() function seems to be unused, so remove it. 2007-05-25 22:16:17 +00:00
Sam Leffler
3c86b7cdad fix comment typo 2007-05-23 17:28:21 +00:00
Robert Watson
4dec0e67ea Comment that tdsignal() may be entered from the debugger. 2007-05-23 17:27:42 +00:00
Robert Watson
63d69d2592 Initialize time_lock before calling cpu_initclocks(). This corrects a
race condition in which hardclock fires before the mutex is initialized
leading to a "corrupt spinlock" panic.

Submitted by:	attilio
2007-05-23 17:27:01 +00:00
Olivier Houchard
302e130edc Remove duplicate includes.
Submitted by:   Cyril Nguyen Huu <cyril ci0 org>
2007-05-23 13:36:02 +00:00
Pawel Jakub Dawidek
f013ccb768 - Remove redundant initialization.
- Compare pointer with NULL.
2007-05-22 23:05:48 +00:00