10259 Commits

Author SHA1 Message Date
rwatson
0e6bbfc8e3 Regenerate. 2008-01-20 23:44:24 +00:00
rwatson
ff05f9dd9d Use audit events AUE_SHMOPEN and AUE_SHMUNLINK with new system calls
shm_open() and shm_unlink().  More auditing will need to be done for
these calls to capture arguments properly.
2008-01-20 23:43:06 +00:00
rwatson
ff397597d9 Export a type for POSIX SHM file descriptors via kern.proc.filedesc as
used by procstat, or SHM descriptors will show up as type unknown in
userspace.
2008-01-20 19:55:52 +00:00
attilio
caa2ca048b - Introduce the function lockmgr_recursed() which returns true if the
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
  locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
  VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj

BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.

KPI results, obviously, broken so further commits will update manpages
and freebsd version.

Tested by: kris (on UFS and NFS)
2008-01-19 17:36:23 +00:00
rwatson
dccd51b54f Move unlock of global UNIX domain socket lock slightly lower in
unp_connect(): it is expected to return with the lock held, and two
possible error paths otherwise returned with it unlocked.

The fix committed here is slightly different from the patch in the
PR, but along an alternative line suggested in the PR.

PR:		119778
MFC after:	3 days
Submitted by:	James Juran <james dot juran at baesystems dot com>
2008-01-18 19:16:03 +00:00
kib
3628ae460c In the rev. 1.153, the one place for converting minor number to unit
was missed. As result, pty_create_slave() may index out of the names[]
bounds, creating wrong slave tty names.

Tested by:	kensmith
Reviewed by:	jhb
MFC after:	3 days
2008-01-18 18:07:04 +00:00
julian
d6aa139aef refactor code so it can run in a chroot without having to have /dev/mounted
MFC After: 1 week
2008-01-18 17:02:14 +00:00
davidxu
ae86af8218 Make sure reading td_runtime in critical section since thread may be
preempted and td_runtime will be modified.
2008-01-18 13:00:28 +00:00
davidxu
80ec49a2cf Add POSIX clock id CLOCK_THREAD_CPUTIME_ID, this can be used to measure
per-thread runtime in user code.
2008-01-18 07:04:42 +00:00
jhb
7e32513a5b Add 'compat_freebsd[4567]' features corresponding to the kernel options
COMPAT_FREEBSD[4567].

MFC after:	1 week
Requested by:	kris
2008-01-17 22:46:32 +00:00
sam
e443f3b38c promote ath_defrag to m_collapse (and retire private+unused
m_collapse from cxgb)

Reviewed by:	pyun, jhb, kmacy
MFC after:	2 weeks
2008-01-17 21:25:09 +00:00
jhb
e3c7bebe5f Remove a conditional that is always true.
MFC after:	2 weeks
2008-01-17 20:15:15 +00:00
jhb
52da96d26e Add a set of regression tests for the POSIX shm API (shm_open(2) and
shm_unlink(2)).
2008-01-16 15:51:24 +00:00
njl
27739d97dc Remove duplicate cpufreq levels, i.e. ones that are within 25 Mhz of each
other.  The first one survives, the rest are removed.  So far, it appears
only some acpi_perf(4) BIOS tables have these invalid states, but address
this in the core to be sure to handle other potential driver data.

PR:		kern/114722
Tested by:	stefan.lambrev / moneybookers.com
MFC after:	3 days
2008-01-16 01:05:21 +00:00
jeff
7a1cea460a - When executing the 'tryself' branch in sched_pickcpu() look at the
lowest priority on the queue for the current cpu vs curthread's
   priority.  In the case that curthread is waking up many threads of a
   lower priority as would happen with a turnstile_broadcast() or wakeup()
   of many threads this prevents them from all ending up on the current cpu.
 - In sched_add() make the relationship between a scheduled ithread and
   the current cpu advisory rather than strict.  Only give the ithread
   affinity for the current cpu if it's actually being scheduled from
   a hardware interrupt.  This prevents it from migrating when it simply
   blocks on a lock.

Sponsored by:	Nokia
2008-01-15 09:03:09 +00:00
attilio
71b7824213 VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
attilio
8032f44d5f lockmgr() function will return successfully when trying to work under
panic but it won't actually lock anything.
This can lead some paths to reach lockmgr_disown() with inconsistent
lock which will let trigger the relative assertions.

Fix those in order to recognize panic situation and to not trigger.

Reported by: pho
Submitted by: kib
2008-01-11 16:38:12 +00:00
rwatson
f261f9865b Don't zero td_runtime when billing thread CPU usage to the process;
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.

When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.

This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.

There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the mean time.  Likewise, there are resevations about the
continued validity of statclock given the speed of modern processors.

Reviewed by:		attilio, emaste, jhb, julian
2008-01-10 22:11:20 +00:00
rwatson
5e4b4e0b32 Remove "lock pushdown" todo item in comment -- I did that for 7.0.
MFC after:	3 weeks
2008-01-10 12:38:17 +00:00
rwatson
ffaa005137 Correct typos in comments.
MFC after:	3 weeks
2008-01-10 12:29:12 +00:00
attilio
18d0a0dd51 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
attilio
c3923db2c0 Fix a last second typo about recent lockmgr_disown() introduction. 2008-01-09 00:02:43 +00:00
attilio
9e182da191 Remove explicit calling of lockmgr() with the NULL argument.
Now, lockmgr() function can only be called passing curthread and the
KASSERT() is upgraded according with this.

In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
KPI, so, results changed and FreeBSD version will be bumped soon.
Differently from previous code, we assume idle thread cannot try to
acquire the lockmgr as it cannot sleep, so loose the relative check[1]
in BUF_KERNPROC().

Tested by: kris

[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking at it, as this is a well known general
rule, I found it not really necessary.
2008-01-08 23:48:31 +00:00
jhb
1975c09543 Regen for shm_open(2) and shm_unlink(2). 2008-01-08 22:01:26 +00:00
jhb
8cd9437636 Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
  object which provides the backing store.  Each descriptor starts off with
  a size of zero, but the size can be altered via ftruncate(2).  The shared
  memory file descriptors also support fstat(2).  read(2), write(2),
  ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
  memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
  manage shared memory file descriptors.  The virtual namespace that maps
  pathnames to shared memory file descriptors is implemented as a hash
  table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
  of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
  path argument to shm_open(2).  In this case, an unnamed shared memory
  file descriptor will be created similar to the IPC_PRIVATE key for
  shmget(2).  Note that the shared memory object can still be shared among
  processes by sharing the file descriptor via fork(2) or sendmsg(2), but
  it is unnamed.  This effectively serves to implement the getmemfd() idea
  bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
  collected when they are not referenced by any open file descriptors or
  the shm_open(2) virtual namespace.

Submitted by:	dillon, peter (previous versions)
Submitted by:	rwatson (I based this on his version)
Reviewed by:	alc (suggested converting getmemfd() to shm_open())
2008-01-08 21:58:16 +00:00
jhb
c9a8a65b69 Close a race in the kern.ttys sysctl handler that resulted in panics in
dev2udev() when a tty was being detached concurrently with the sysctl
handler:
- Hold the 'tty_list_mutex' lock while we read all the fields out of the
  struct tty for copying out later.  Previously the pty(4) and pts(4)
  destroy routines could set t_dev to NULL, drop their reference on the
  tty and destroy the cdev while the sysctl handler was attempting to
  invoke dev2udev() on the cdev being destroyed.  This happened when the
  sysctl handler read the value of t_dev prior to it being set to NULL
  either due to it being stale or due to timing races.  By holding the
  list lock we guarantee that the destroy routines will block in ttyrel()
  in that case and not destroy the cdev until after we've copied all of our
  data.  We may see a NULL cdev pointer or we may see the previous value,
  but the previous value will no longer point to a destroyed cdev if we
  see it.
- Fix the ttyfree() routine used by tty device drivers in their detach
  methods to use ttyrel() on the tty so we don't leak them.  Also, fix it
  to use the same order of operations as pty/pts destruction (set t_dev
  NULL, ttyrel(), destroy_dev()) so it cooperates with the sysctl handler.

MFC after:	3 days
Tested by:	avatar
2008-01-08 04:53:28 +00:00
kris
0b7e3f244a Fix logic in skipcount handling (used to sample every 1/N lock operations
to reduce profiling overhead)
2008-01-08 01:11:40 +00:00
rwatson
d08bce9f30 Free MAC label on a POSIX semaphore when the semaphore is freed.
MFC after:	3 days
Submitted by:	jhb
2008-01-07 22:03:19 +00:00
jhb
f8a246b979 Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
  a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
  object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
  vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by:	rwatson
2008-01-07 20:05:19 +00:00
bde
cf4bbeccc5 In sequential_heuristic():
- spell 16384 as 16384 and not as BKVASIZE.  16384 is (not quite) just a
  magic size that works well in practice.  BKVASIZE should be MAXBSIZE
  (65536), but is 16384 because i386's don't have enough kva for it to
  be MAXBSIZE; 16384 works (not so well) for it for much the same reasons
  that it works well in the heuristic.
- expand and/or add comments about this and other details.
- don't explicitly inline this function.
- fix some other style bugs.
2008-01-05 08:54:51 +00:00
peter
1e0f13faf7 Fall back to the binary-specified interpreter (ld-elf.so.1) if the
ABI override binary isn't found.  This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27.  It should solve
the "ld-elf32.so.1"-in-chroot problem.
2008-01-05 08:35:56 +00:00
jeff
ebfbe60329 - Restore timeslicing code for all bit SCHED_FIFO priority classes.
Reported by:	Peter Jeremy <peterjeremy@optushome.com.au>
2008-01-05 04:47:31 +00:00
bz
69daf9ac7f Add missing sb_sndptr* fields to db_print_sockbuf().
While here change %d to %u for u_ints.

Discussed with:	rwatson, kmacy
2008-01-03 15:19:31 +00:00
jeff
811f1d35ef - In sysctl_kern_file skip fdps with negative lastfiles. This can
happen if there are no files open.  Accounting for these can
   eventually return a negative value for olenp causing sysctl to
   crash with a bad malloc.

Reported by:	Pawel Worach <pawel.worach@gmail.com>
2008-01-03 01:26:59 +00:00
obrien
97f31c31f6 Note what is too {short,long}. 2008-01-02 18:48:27 +00:00
jhb
311f35fbe0 A few whitespace fixes. 2008-01-02 17:09:15 +00:00
jeff
d473542286 - Place the fhold() in unp_internalize_fp to be more consistent with refs.
- Clear all of the gc flags before doing a run.  Stale flags were causing
   us to skip some descriptors.
 - If a unp socket has been marked REF in a gc pass it can't be dead.

Found by:	rwatson's test tool.
2008-01-01 01:46:42 +00:00
rodrigc
9cea85864d In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR:		118531
Submitted by:	Jaakko Heinonen <jh saunalahti fi>
MFC after:	3 days
2007-12-31 23:44:53 +00:00
jhb
f9f7dedb21 Actually declare the kern.features sysctl node.
Pointy hat to:	jhb
2007-12-31 22:03:57 +00:00
jeff
c8db393cdc - Pause a while after disabling lock profiling and before resetting it
to be sure that all participating CPUs have stopped updating it.
 - Restore the behavior of printing the name of the lock type in the output.
2007-12-31 03:45:51 +00:00
jeff
319a3672d9 - Check the correct variable against NULL in two places.
- If the unp_file is NULL that means it has never been internalized and it
   must be reachable.
2007-12-31 03:44:54 +00:00
imp
95c2febd5a Rather than not redirting the bp when we get ENXIO, only redirty it
when the error is EIO.  This catches a much larger class of errors
that are unlikely to succeed if retried.

Submitted by: bde
2007-12-30 05:53:45 +00:00
jeff
ce18638805 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
alc
4565fa1697 Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages.  (The earlier part was
the rewrite of the physical memory allocator.)  The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64.  (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
2007-12-29 19:53:04 +00:00
rwatson
166f16a0c6 In "show lockedvnods" DDB command, use db_printf() rather than printf()
so that the results end up in the DDB output stream rather than the
console output stream.

This should likely also be done for the vprint() function it calls.

MFC after:	3 months
2007-12-28 00:47:31 +00:00
attilio
c720b77ca9 Trimm out now unused option LK_EXCLUPGRADE from the lockmgr namespace.
This option just adds complexity and the new implementation no longer
will support it, so axing it now that it is unused is probabilly the
better idea.

FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.

In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on 2.x kernels serie, version bumping will
solve any problem.
2007-12-28 00:38:13 +00:00
attilio
bd42361fd1 In order to avoid a huge class of deadlocks (in particular in interactions
with the interlock), owner of the lock should be only curthread or at
least, for its limited usage, NULL which identifies LK_KERNPROC.

The thread "extra argument" for the lockmgr interface is going to be
removed in the near future, but for the moment, just let kernel run for
some days with this check on in order to find potential deadlocking
places around the kernel and fix them.
2007-12-27 22:56:57 +00:00
rwatson
010adfaab6 Return ESRCH when a kernel stack is queried on a process in execve() --
p_candebug() will return EAGAIN which, if the other process never
leaves execve(), will result in the sysctl spinning and never returning
to userspace.  Processes should always eventually leave execve(), but
spinning in kernel while we wait is bad for countless reasons, and
particularly harmful if execve() itself is deadlocked.

Possibly we should return another error, or return a marker indicating
the thread is in execve() so it can be reported that way in userspace.

Reported by:	kris
2007-12-27 22:44:01 +00:00
attilio
d9b244638e As LK_EXCLUPGRADE is used in conjuction with LK_NOWAIT, LK_UPGRADE becames
equivalent with this and so operate the switch.

That call is the only one remaining LK_EXCLUPGRADE consumer and removing
it will prepare the ground for LK_EXCLUPGRADE axing and further
lockmgr improvements.

Discussed with: jeff, ups
2007-12-27 20:52:05 +00:00
imp
430e46303c A partial solution to some of the 'pull the umass device with a
mounted FS' problems.  These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices.  We
now close the barn door after the horse has gotten lose and has been
hit by a truck, as it were.  The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO.  In that case, the buffer
is treated like any other clean buffer that's being retured.  ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system.  If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author.  The
other patches will be forth coming.  I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
2007-12-27 16:38:28 +00:00