10073 Commits

Author SHA1 Message Date
davidxu
3fef07aaec Regenerate.
Approved by: re(kensmith)
2007-08-16 05:32:26 +00:00
davidxu
0abd045472 Add thr_kill2 syscall which sends a signal to a thread in another process.
Submitted by: Tijl Coosemans tijl at ulyssis dot org
Approved by: re (kensmith)
2007-08-16 05:26:42 +00:00
jhb
7fdc86bfe3 On 6.x this works:
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by:	jhb
Reported by:	Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by:	re (bmah)
Proxy commit for:	rodrigc
2007-08-15 17:40:09 +00:00
pjd
8d074382c8 Improve vn_printf() by:
- adding missing vnode flags,
- printing unknown flags as numbers,
- using strlcat() instead of strcat().

Approved by:	re (bmah)
2007-08-13 21:23:30 +00:00
kib
b236f3d925 Do not call free() while holding vnode interlock.
Reported and tested by:	Peter Holm
Reviewed by:	jeff
Approved by:	re (kensmith)
2007-08-07 09:04:50 +00:00
rwatson
23574c8673 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
jeff
155be8b518 - Fix one line that erroneously crept in my last commit.
Approved by:	re
2007-08-04 01:21:28 +00:00
jeff
8b42bd66ad - Share scheduler locks between hyper-threaded cores to protect the
tdq_group structure.  Hyper-threaded cores won't really benefit from
   seperate locks anyway.
 - Seperate out the migration case from sched_switch to simplify the main
   switch code.  We only migrate here if called via sched_bind().
 - When preempted place the preempted thread back in the same queue at
   the head.
 - Improve the cpu group and topology infrastructure.

Tested by:	many on current@
Approved by:	re
2007-08-03 23:38:46 +00:00
jeff
bfe5819efe - Set SW_PREEMPT when we preempt in critical_exit().
Approved by:	re
2007-08-03 23:35:35 +00:00
rwatson
c29e74320b First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
  no-op's, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
attilio
c2dedaa0a9 Actually, upcalls cannot be freed while destroying the thread because we
should call uma_zfree() with various spinlock helds.  Rearranging the
code would not help here because we cannot break atomicity respect
prcess spinlock, so the only one choice we have is to defer the operation.
In order to do this use a global queue synchronized through the kse_lock
spinlock which is freed at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal as we should want a per-process
list of zombie upcalls, but it follows initial guidelines of KSE authors.

Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
2007-07-27 09:21:18 +00:00
pjd
fe74e944d1 When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with:	kib, ups
Approved by:	re (rwatson)
2007-07-26 16:58:09 +00:00
pjd
8dc8d42bdc The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by:	re (rwatson)
2007-07-26 16:52:57 +00:00
attilio
0986b6b87f upcall_free() was only used in kse_GC() which has been removed so it now
results unused; this, with -Werror option of gcc, rise a warning for gcc
which let the buildkernel to be busted.
Fix this removing upcall_free().

Reported by: various
Approved by: jeff
Approved by: re
Pointy hat to: attilio
2007-07-23 23:16:53 +00:00
attilio
ad75d346f7 Actually, KSE kernel bits locking is broken and can lead likely to
dangerous races.
Fix this problems adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazy" locked.
While here, cleanup the code removing the function kse_GC() (unused),
and merging upcall_link(), upcall_unlink(), upcall_stash() in their
respective callers (static functions, very short and only called in one
place).

Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsorized by: NGX Italy (http://www.ngx.it)
2007-07-23 14:52:22 +00:00
dwmalone
e8276674f3 If clock_ct_to_ts fails to convert time time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by:	re
MFC after:	3 weeks
2007-07-23 09:42:32 +00:00
kib
a8fa279159 ttyfree() frees the cdev(). But if there are pending kevents,
filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty,
causing a 0xdeadc0de dereference.  Change kn_hook value from cdev to
struct tty to avoid dereferencing freed cdev.

In ttygone(), wake up select(), sigio and kevent() users in addition
to the queue sleepers.

Return EV_EOF from kevent filters if TS_GONE is set.

Submitted by:	peter
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	2 weeks
2007-07-20 09:41:54 +00:00
attilio
636c12b5fb Fix some problems with lock profiling in rw locks:
- Adjust lock_profiling stubs semantic in the hard functions in order to be
  more accurate and trustable
- As for sx locks, disable shared paths for lock_profiling.  Actually,
  lock_profiling has a subtle race which makes results caming from shared
  paths not completely trustable. A macro stub (LOCK_PROFILING_SHARED) can
  be actually used for re-enabling this paths, but is currently intended
  for developing use only.
- style(9) fixes

Approved by: jeff, kmacy, jhb[1]
Approved by: re

[1] Had initial reservations not shared by others, conceded
    in the end.
2007-07-20 08:43:42 +00:00
jeff
e2ebe96ef4 - Refine the load balancer to improve buildkernel times on dual core
machines.
 - Leave the long-term load balancer running by default once per second.
 - Enable stealing load from the idle thread only when the remote processor
   has more than two transferable tasks.  Setting this to one further
   improves buildworld.  Setting it higher improves mysql.
 - Remove the bogus pick_zero option.  I had not intended to commit this.
 - Entirely disallow migration for threads with SRQ_YIELDING set.  This
   balances out the extra migration allowed for with the load balancers.
   It also makes pick_pri perform better as I had anticipated.

Tested by:	Dmitry Morozovsky <marck@rinet.ru>
Approved by:	re
2007-07-19 20:03:15 +00:00
jeff
550dacee12 - When newtd is specified to sched_switch() it was not being initialized
properly.  We have to temporarily unlock the TDQ lock so we can lock
   the thread and add it to the run queue.  This is used only for KSE.
 - When we add a thread from the tdq_move() via sched_balance() we need to
   ipi the target if it's sitting in the idle thread or it'll never run.

Reported by:	Rene Landan
Approved by:	re
2007-07-19 19:51:45 +00:00
jeff
b7b0163326 - Remove explicit references to sched_lock. A simpler assert will do.
Approved by:	re
2007-07-19 08:58:40 +00:00
jeff
1961c471a3 - Calling sched_nice() in tdsigwakeup() is no longer required by ULE and
actually causes LORs and other panics.

Reported by:	mlaier
Approved by:	re
2007-07-19 08:49:16 +00:00
jeff
e6da269e0c - Remove the global definition of sched_lock in mutex.h to break
new code and third party modules which try to depend on it.
 - Initialize sched_lock in sched_4bsd.c.
 - Declare sched_lock in sparc64 pmap.c and assert that we're compiling
   with SCHED_4BSD to prevent accidental crashes from running ULE.  This
   is the sole remaining file outside of the scheduler that uses the
   global sched_lock.

Approved by:	re
2007-07-18 20:46:06 +00:00
jeff
45491c3f8a - Add the proper lock profiling calls to _thread_lock().
Obtained from:	kipmacy
Approved by:	re
2007-07-18 20:38:13 +00:00
jeff
119fa6328d ULE 3.0: Fine grain scheduler locking and affinity improvements. This has
been in development for over 6 months as SCHED_SMP.
 - Implement one spin lock per thread-queue.  Threads assigned to a
   run-queue point to this lock via td_lock.
 - Improve the facility for assigning threads to CPUs now that sched_lock
   contention no longer dominates scheduling decisions on larger SMP
   machines.
 - Re-write idle time stealing in an attempt to make it less damaging to
   general performance.  This is still disabled by default. See
   kern.sched.steal_idle.
 - Call the long-term load balancer from a callout rather than sched_clock()
   so there are no locks held.  This is disabled by default.  See
   kern.sched.balance.
 - Parameterize many scheduling decisions via sysctls.  Try to document
   these via sysctl descriptions.
 - General structural and naming cleanups.
 - Document each function with comments.

Tested by:	current@ amd64, x86, UP, SMP.
Approved by:	re
2007-07-17 22:53:23 +00:00
jeff
a682b8e2c4 - Use ruxagg() in calcru() to make sure we have current tick information
from all threads.

Discussed with:	bde, attilio
Approved by:	re
2007-07-17 01:08:09 +00:00
rodrigc
81f7b063ec Revert previous commits which I committed by mistake.
Approved by:	re (implicit)
Pointy hat to:	me
2007-07-14 21:23:31 +00:00
rodrigc
def1240ba2 The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by:	re (kensmith)
2007-07-14 21:18:19 +00:00
jhb
ef42e8706b Fix a couple of issues with the stack limit for 32-bit processes on 64-bit
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
  and use that in preference to maxssiz during exec() rather than always
  using maxssiz for all processses.
- Apply the ABI's limit fixup to the previous stack size when adjusting
  RLIMIT_STACK to determine if the existing mapping for the stack needs to
  be grown or shrunk (as well as how much it should be grown or shrunk).

Approved by:	re (kensmith)
2007-07-12 18:01:31 +00:00
attilio
7f10997905 Fix some problems with lock_profiling in sx locks:
- Adjust lock_profiling stubs semantic in the hard functions in order to be
  more accurate and trustable
- Disable shared paths for lock_profiling.  Actually, lock_profiling has a
  subtle race which makes results caming from shared paths not completely
  trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for
  re-enabling this paths, but is currently intended for developing use only.
- Use homogeneous names for automatic variables in hard functions regarding
  lock_profiling
- Style fixes
- Add a CTASSERT for some flags building

Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
2007-07-06 13:20:44 +00:00
kib
81a0028ff2 Revert destroy_dev() to the state before destroy_dev_sched() was introduced.
Attempt to spawn destroy_dev_sched() from it causes inadmissible races.

Requested by:	tegge
Approved by:	re (kensmith)
2007-07-05 13:04:59 +00:00
bz
908dadf8c3 Remove netkey directory from cscope/TAGs generation and replace
it with netipsec now that KAME IPsec is gone.
While here add missing netinet6 directories.

Add comments about the ports needed to be able to run those targets.

Reviewed by:	philip
Approved by:	re (rwatson)
2007-07-05 08:55:14 +00:00
peter
c23e287f99 Fix bad function type passed to destroy_dev_sched_cb().
Approved by:  re (rwatson)
2007-07-05 05:54:47 +00:00
peter
fd45cf2a6d Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
2007-07-04 22:57:21 +00:00
peter
14c396a779 Regenerate after mmap/lseek/etc syscall changes.
Approved by:  re (kensmith)
2007-07-04 22:49:55 +00:00
peter
f69e26042a Create new syscalls for mmap(), lseek(), pread(), pwrite(), truncate() and
ftruncate(), but without the pad arg.

There are several reasons for this.  Consider 'mmap()'.  On AMD64, the
function call (and syscall) ABI allow for 6 register arguments.  Additional
arguments go on the stack.  mmap(2) has 6 arguments.  However, the syscall
definition has an extra 'int pad' argument.  This pushes it to 7 arguments,
which means one must spill into the memory stack.  Since the kernel API
doesn't match userland API, we have a hack in libc - libc/sys/mmap.c.
This implements the userland API by calling __syscall() with an extra
argument and the pad argument, for a total of 8 args.  This is all
unnecessary and inconvenient for several things, including the kernel's
syscall handler code which now has to handle merging stack arguments with
register arguments.  It is a big deal for certain 3rd party code.

I'm adding libc glue to make the transition totally painless.  I had
intended to mark the old syscalls as COMPAT6, but the potential to shoot
your feet by building a new kernel without COMPAT_FREEBSD6 but with a
slighly older userland was too great.  For now, they have manual
"freebsd6_" prefixes rather than being COMPAT6.  They will go back to
being marked 'COMPAT6' after 7-stable starts.

Approved by: re (kensmith)
2007-07-04 22:47:37 +00:00
peter
33b2e41038 Add support for COMPAT6 syscalls.
Also, change the visibility of compat syscalls a slightly.  Compat
syscalls were missing from 'syscalls.h' entirely.  This additionally adds
them with their compat prefix.  eg: SYS_freebsd6_mmap.

Also, the syscalls.c names strings have different prefixes to differentiate
syscalls. Instead of several "old.mmap" strings, there will now be a
"compat.mmap" and "compat6.mmap" etc.  Before, both would have had the
same "old.mmap" label.

Approved by:  re
2007-07-04 22:38:28 +00:00
kib
bbfd17f7e8 Since cdev mutex is after system map mutex in global lock order, free()
shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as
mutex, thus creating this LOR situation.

Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.

Reviewed by:	phk
Tested by:	Peter Holm
LOR:	80
Approved by:	re (kensmith)
2007-07-04 06:56:58 +00:00
jeff
af5bbfbc7b - Use explicit locking in the various fcntl case statements so that we
can acquire shared filedescriptor locks in the appropriate cases.
 - Remove Giant from calls that issue ioctls.  The ioctl path has been
   mpsafe for some time now.
 - Only acquire giant for VOP_ADVLOCK when the filesystem requires giant.
   advlock is now mpsafe.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:26:06 +00:00
jeff
4c779e4b9e - Remove explicit Giant protection from lockf. Use the vnode interlock
to protect this datastructure instead.
 - Preallocate an extra lockf structure in case we want to split a lock
   on insert or delete.
 - msleep() on the vnode interlock when blocking on a lock.

Reviewed by:	rwatson
Approved by:	re
2007-07-03 21:22:58 +00:00
jhb
160998c890 Tweak the low-level MI SMP code some:
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
  smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
  CPUs will always see an up-to-date value of smp_rv_setup_func.

Reviewed by:	attilio
Approved by:	re (kensmith)
Tested on:	alpha, amd64, i386, sparc64 SMP (for several years)
2007-07-03 18:37:06 +00:00
kib
c88cac4c41 Rev. 1.204 and 1.205 got an erronous version of destroy_dev() that
calls destroy_dev_sched() with cdev mutex locked. Commit the code
that was actually tested.

Pointy hat to:	kib
Approved by:	re (implicit)
2007-07-03 18:18:30 +00:00
kib
764028016f Lock Giant and proctree lock around dereferencing p_session->s_ttyvp->v_rdev.
Lock cdev mutex too to close the race with tty being freed.
Relock clone_drain_lock to prevent the LOR with proctree lock, thus
add #include <fs/devfs/devfs_int.h>.

Suggested by:	tegge
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:46:37 +00:00
kib
a1bd6e4f52 Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from pty clone handler.
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:45:52 +00:00
kib
3f255f64e4 Use make_dev_credf(MAKEDEV_REF) instead of make_dev() from the clone handler.
Lock Giant in the clone handler.
Use destroy_dev_sched() explicitely from pty_maybecleanup() and postpone
pty_release() until both master and slave cdevs are destroyed by setting
it as callback for destroy_dev_sched().

Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:44:59 +00:00
kib
c3c1199f3c Automatically detect deadlock condition in destroy_dev(), that is, if
destroy_dev() is called from csw method, and no d_purge driver method is
provided. Transform the direct call to destroy_dev() into destroy_dev_sched().

Reviewed by:	njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:43:20 +00:00
kib
0ae42a4095 Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bump device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the events clone to make sure no lingering devices are left after
dev_clone event handler deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by:	tegge (early versions), njl (programming interface)
Debugging help and testing by:	Peter Holm
Approved by:	re (kensmith)
2007-07-03 17:42:37 +00:00
kib
e711aeee1e Relock the sema_mtxp unconditionally after copyin() for SETALL case in
kern_semctl. Otherwise, later mtx_unlock() can operate on unlocked mutex.

Submitted by:	rdivacky
MFC after:	3 days
Approved by:	re (kensmith)
2007-07-03 15:58:47 +00:00
rwatson
e95a48819f Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and
stop declaring it in systm.h -- it's used only in kern_priv.c and is not
required elsewhere.

Approved by:	re (kensmith)
2007-07-02 14:03:29 +00:00
rrs
a427d337cf - Add some needed error checking on bad fd passing in the sctp
syscalls.
Approved by:	re@freebsd.org (Ken Smith)
Obtained from:	Weongyo Jeong (weongyo.jeong@gmail.com)
2007-07-02 12:50:53 +00:00