10951 Commits

Author SHA1 Message Date
Ed Schouten
91c3cbfe1f Remove redundant code in printf() and vprintf().
printf() and vprintf() are exactly the same, except for the way arguments
are passed. Just like in other pieces of code (e.g. libc's
printf()), implement printf() using vprintf().
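As an illustration of the pattern (not the actual subr_prf.c code), here is a minimal userland sketch in which the variadic function only packages its arguments into a va_list and forwards them. The sink_printf()/sink_vprintf() names and the fixed-size outbuf sink are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output sink standing in for the console or message buffer. */
static char outbuf[256];
static size_t outlen;

/* The va_list variant does all of the real formatting work. */
static int
sink_vprintf(const char *fmt, va_list ap)
{
	int n;

	n = vsnprintf(outbuf + outlen, sizeof(outbuf) - outlen, fmt, ap);
	if (n > 0) {
		/* Advance past what fit, never beyond the final NUL slot. */
		size_t room = sizeof(outbuf) - outlen - 1;

		outlen += ((size_t)n < room) ? (size_t)n : room;
	}
	assert(outlen < sizeof(outbuf));
	return (n);
}

/* The variadic wrapper only packages its arguments into a va_list. */
static int
sink_printf(const char *fmt, ...)
{
	va_list ap;
	int retval;

	va_start(ap, fmt);
	retval = sink_vprintf(fmt, ap);
	va_end(ap);
	return (retval);
}
```

Keeping the formatting logic in one place means any later fix to the va_list variant automatically applies to the variadic one.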

Submitted by:	Christoph Mallon <christoph mallon gmx de>
2009-02-27 13:28:54 +00:00
Ed Schouten
ff7b7d9039 Revert previous commit to subr_prf.c and make it more tidy.
As mentioned by bz and bde, the change I made wasn't the proper fix.
Inspired by bde's patch, perform some small cleanups to uprintf().

Reviewed by:	bz
2009-02-27 12:50:25 +00:00
Ed Schouten
69c9eff894 Remove unneeded pointer `ndp'.
Inside do_execve(), we have a pointer `ndp', which always points to
`&nd'. I can imagine a primitive (non-optimizing) compiler actually
reserving space for such a pointer, so just remove the variable and use
`&nd' directly.
2009-02-26 16:32:48 +00:00
Ed Schouten
c90c9021e9 Remove even more unneeded variable assignments.
kern_time.c:
- Unused variable `p'.

kern_thr.c:
- Variable `error' is always checked immediately, so there is no reason
  to initialize it. There is no way that error != 0 at the end of
  create_thread().

kern_sig.c:
- Unused variable `code'.

kern_synch.c:
- `rval' is assigned in every case.

kern_rwlock.c:
- `v' is always overwritten with RW_UNLOCKED further on.

kern_malloc.c:
- `size' is always initialized with the proper value before being used.

kern_exit.c:
- `error' is always checked and returned immediately. abort2() never
  returns a non-zero value.

kern_exec.c:
- `len' is always assigned inside the if-statement right below it.

tty_info.c:
- `td' is always overwritten by FOREACH_THREAD_IN_PROC().

Found by:	LLVM's scan-build
2009-02-26 15:51:54 +00:00
Ed Schouten
318b1c3fd0 Remove unneeded variable `ocn_mute'.
Found by:	LLVM's scan-build
2009-02-26 13:01:45 +00:00
Ed Schouten
5225593633 Remove unused variables `p' and unneeded assignments of `rval'.
Found by:	LLVM's scan-build
2009-02-26 13:00:13 +00:00
Ed Schouten
2bbada90c8 Remove redundant assignment of `p'.
`p' is already initialized with `td->td_proc'. Because td is always
curthread, it is safe to initialize it without any locks.

Found by:	LLVM's scan-build
2009-02-26 12:12:34 +00:00
Robert Watson
6efcc2f26a Add static tracing for privilege checking:
  priv:kernel:priv_check:priv_ok fires for granted privileges
  priv:kernel:priv_check:priv_err fires for denied privileges

The first argument is the requested privilege number.  The naming
convention is a little different from the OpenSolaris equivalent
because we can't have '-' in probefunc names, and our privilege
namespace is different.

MFC after:	1 week
2009-02-26 10:56:13 +00:00
Ed Schouten
9e5775857d Silence compiler warning inside our ^T handler.
It turns out we're casting fixpt_t* to int*.

Spotted by:	clang
2009-02-26 10:38:19 +00:00
Ed Schouten
1d952ed28c Use unsigned longs for the TTY's sysctl stats.
Spotted by:	clang
2009-02-26 10:28:32 +00:00
Ed Schouten
1e737f33a0 Don't use PTY name as format string, even though it isn't insecure here.
It's guaranteed that the `name' variable always contains a string of the
form pty[l-sL-S][0-9a-v], but I'd rather keep the compiler happy (LLVM).
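A minimal sketch of the safer call, assuming a hypothetical format_pty_name() helper: the name is passed as an argument to an explicit "%s" format, so a stray '%' in it is copied literally instead of being interpreted. The pty_name_valid() check for the pty[l-sL-S][0-9a-v] form is likewise illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check that name has the form pty[l-sL-S][0-9a-v]. */
static bool
pty_name_valid(const char *name)
{
	char c1, c2;

	if (strlen(name) != 5 || strncmp(name, "pty", 3) != 0)
		return (false);
	c1 = name[3];
	c2 = name[4];
	if (!((c1 >= 'l' && c1 <= 's') || (c1 >= 'L' && c1 <= 'S')))
		return (false);
	return ((c2 >= '0' && c2 <= '9') || (c2 >= 'a' && c2 <= 'v'));
}

/* Copy the name through an explicit "%s" so it is never parsed as a format. */
static int
format_pty_name(char *buf, size_t len, const char *name)
{
	assert(buf != NULL && name != NULL);
	return (snprintf(buf, len, "%s", name));
}
```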
2009-02-26 10:14:10 +00:00
Jamie Gritton
613042491b Add support for methods to the OSD subsystem. Each object type has a
predefined set of methods, which are set in osd_register() and called
via osd_call().  Currently, no methods are defined, though prison
objects will have some in the future.

Expand the locking from a single per-type mutex to three different kinds
of locks (four if you include the requirement that the container
(e.g. prison) be locked when getting/setting data).  This clears up one
existing issue, as well as others added by the method support.

Approved by:	bz (mentor)
2009-02-21 11:15:38 +00:00
Ed Schouten
0eee862a54 Don't make Linux stat() open character devices to resolve its name.
The existing code calls kern_open() to resolve the vnode of a pathname
right after a stat(). This is not correct, because it causes random
character devices to be opened in /dev. This means ls'ing a tape
streamer will cause it to rewind, for example. Changes I have made:

- Add kern_statat_vnhook() to allow binary emulators to `post-process'
  struct stat, using the proper vnode.

- Remove unneeded printf's from stat() and statfs().

- Make the Linuxolator use kern_statat_vnhook(), replacing
  translate_path_major_minor_at().

- Let translate_fd_major_minor() use vp->v_rdev instead of
  vp->v_un.vu_cdev.

Result:

	crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
	crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
	crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
	crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2

Before this commit, ptmx also had a major number of 136, because it
silently allocated and deallocated a pseudo-terminal. Device nodes that
cannot be opened now have proper major/minor-numbers.

Reviewed by:	kib, netchild, rdivacky (thanks!)
2009-02-20 13:05:29 +00:00
John Baldwin
03964c8e09 Enable caching of negative pathname lookups in the NFS client. To avoid
stale entries, we save in the directory's NFS node a copy of the directory's
modification time at the moment the first negative cache entry is added.
When a negative cache entry is hit during a pathname lookup, the parent
directory's modification time is checked.  If it has changed, all of the
negative cache entries for that parent are purged and the lookup falls
back to using the RPC.  This required adding a new cache_purge_negative()
method to the name cache to purge only negative cache entries for a given
directory.
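The invalidation rule can be sketched in userland C as follows. struct dirnode, neg_add(), neg_lookup() and neg_purge() are hypothetical stand-ins for the NFS node and the name cache; a false return from neg_lookup() means the caller must fall back to the RPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define NEGMAX 8

/* Hypothetical per-directory state, loosely modeled on the NFS node. */
struct dirnode {
	time_t neg_mtime;          /* dir mtime when the first negative entry was added */
	int nneg;                  /* number of negative entries */
	char neg[NEGMAX][32];      /* names known NOT to exist */
};

/* Record that `name' does not exist (silently dropped if the table is full). */
static void
neg_add(struct dirnode *dp, const char *name, time_t cur_mtime)
{
	assert(dp != NULL && name != NULL);
	if (dp->nneg == 0)
		dp->neg_mtime = cur_mtime;
	if (dp->nneg < NEGMAX) {
		strncpy(dp->neg[dp->nneg], name, sizeof(dp->neg[0]) - 1);
		dp->neg[dp->nneg][sizeof(dp->neg[0]) - 1] = '\0';
		dp->nneg++;
	}
}

/* Drop only the negative entries, in the spirit of cache_purge_negative(). */
static void
neg_purge(struct dirnode *dp)
{
	dp->nneg = 0;
}

/*
 * True if the cache proves `name' is absent.  On a hit, the directory's
 * current mtime is compared with the saved one; if it changed, every
 * negative entry is purged and the caller must issue the RPC instead.
 */
static bool
neg_lookup(struct dirnode *dp, const char *name, time_t cur_mtime)
{
	for (int i = 0; i < dp->nneg; i++) {
		if (strcmp(dp->neg[i], name) != 0)
			continue;
		if (cur_mtime != dp->neg_mtime) {
			neg_purge(dp);
			return (false);
		}
		return (true);
	}
	return (false);
}
```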

Submitted by:	mohans, Rick Macklem, Ricardo Labiaga @ NetApp
Reviewed by:	mohans
2009-02-19 22:28:48 +00:00
Ed Schouten
40d05103d8 Squash some small bugs in pts(4).
- Don't return a negative errno when using an unknown ioctl() on a
  pseudo-terminal master device. Be sure to convert ENOIOCTL to ENOTTY,
  just like the TTY layer does.

- Even though we should return st_rdev of the master device node when
  emulating pty(4) devices, FIODGNAME should still return the name of
  the slave device. Otherwise ptsname(3) and ttyname(3) return an
  invalid device name.
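A sketch of the ENOIOCTL-to-ENOTTY conversion, assuming two hypothetical handlers. ENOIOCTL is a kernel-internal value in FreeBSD and is not visible to userland programs, so the sketch defines a negative stand-in with the same intent.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the kernel-internal, negative "not handled here" value. */
#ifndef ENOIOCTL
#define ENOIOCTL (-3)
#endif

/* Hypothetical driver handler: understands command 1 only. */
static int
driver_ioctl_sketch(unsigned long cmd)
{
	return (cmd == 1 ? 0 : ENOIOCTL);
}

/* Hypothetical master-side entry point: never leak ENOIOCTL to userland. */
static int
master_ioctl_sketch(unsigned long cmd)
{
	int error;

	error = driver_ioctl_sketch(cmd);
	if (error == ENOIOCTL)
		error = ENOTTY;
	assert(error >= 0);
	return (error);
}
```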
2009-02-19 17:54:42 +00:00
Attilio Rao
f8d9048018 - Add a function (fill_kinfo_aggregate()) which aggregates relevant
  members for a kinfo entry on a process-wide basis.
- Use the newly introduced function in order to fix cases like
  KERN_PROC_PROC where aggregate stats are broken because they only
  consider the first thread in the pool for each process.
  (Note, additionally, that KERN_PROC_PROC is rather inaccurate for
  per-thread information like the 'state' of the process.  Perhaps such
  information should be invalidated and forcibly discarded by the
  consumers?)
- Simplify the logic of sysctl_out_proc() and adjust
  fill_kinfo_thread() accordingly.
- Remove checks for FIRST_THREAD_IN_PROC() being NULL, but add
  assertions instead.

This patch should fix aggregate statistics for KERN_PROC_PROC.
Broken aggregates are one of the reasons why top doesn't use this
option; now it can use it safely.
ps, when launched to display only processes, should now report
correct CPU utilization percentages and times (unlike the old
code).
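The difference between consulting only the first thread and aggregating over all threads can be sketched as follows. The structures and field names are hypothetical simplifications, not the real struct kinfo_proc members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified per-thread statistics. */
struct thread_stats {
	unsigned long runtime_us;
	unsigned int pctcpu;
};

/* Hypothetical process-wide view, as a kinfo-like consumer would see it. */
struct proc_stats {
	unsigned long runtime_us;
	unsigned int pctcpu;
};

/* Old behaviour: only the first thread is consulted. */
static void
fill_first_thread(struct proc_stats *ps, const struct thread_stats *td, size_t n)
{
	assert(n > 0);
	ps->runtime_us = td[0].runtime_us;
	ps->pctcpu = td[0].pctcpu;
}

/* Aggregating behaviour: sum every thread's contribution. */
static void
fill_aggregate(struct proc_stats *ps, const struct thread_stats *td, size_t n)
{
	ps->runtime_us = 0;
	ps->pctcpu = 0;
	for (size_t i = 0; i < n; i++) {
		ps->runtime_us += td[i].runtime_us;
		ps->pctcpu += td[i].pctcpu;
	}
}
```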

Reviewed by:	jhb, emaste
Sponsored by:	Sandvine Incorporated
2009-02-18 21:52:13 +00:00
Joe Marcus Clarke
0618630015 Remove the printf's when the vnode to be exported for procstat is not a VDIR.
If the file system backing a process' cwd is removed, and procstat -f PID
is called, then these messages would have been printed.  The extra verbosity is
not required in this situation.

Requested by:	kib
Approved by:	kib
2009-02-14 21:55:09 +00:00
Joe Marcus Clarke
03fd9c2092 Change two KASSERTs to printfs and simple returns. Stress testing has
revealed that a process' current working directory can be VBAD if the
directory is removed.  This can trigger a panic when procstat -f PID is
run.

Tested by:	pho
Discovered by:	phobot
Reviewed by:	kib
Approved by:	kib
2009-02-14 21:12:24 +00:00
Andrew Thompson
a1797ef6c8 Remove semicolon left in the last commit
Spotted by:	csjp
2009-02-13 18:51:39 +00:00
John Baldwin
ea77ff0a15 Use shared vnode locks when invoking VOP_READDIR().
MFC after:	1 month
2009-02-13 18:18:14 +00:00
Luigi Rizzo
d4619572b4 Clarify and reimplement the bioq API so that bioq_disksort() has
the correct behaviour (sorting by distance from the current head position
in the scan direction) and bioq_insert_head() and bioq_insert_tail()
have a well defined (and useful) behaviour, especially when intermixed
with calls to bioq_disksort().

In particular:
- fix a bug in the existing bioq_disksort() that did not use the
  current head position correctly;
- redefine semantics of bioq_insert_head() and bioq_insert_tail().
  bioq_insert_tail() can now be used as a barrier
  between previous and subsequent calls to bioq_disksort().

The code is heavily documented in the source code so please refer
to that for the details.

Much of this code comes from Fabio Checconi. Also thanks to Kirk
for feedback on the (re)definition of bioq_insert_tail().

NOTE: in the current tree there is only a handful of files which
intermix calls to bioq_disksort() with bioq_insert_head() and
bioq_insert_tail(). The ordering of the queue in these situations
was not specified (nor easy to figure out) before, so I doubt any
of that code could be affected by the specification of the API.

Also note that the current implementation is significantly simpler
than the previous one (also used in ata_sort_queue()).
It would be useful to reimplement ata_sort_queue() using
the same code used in bioq_disksort().
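A minimal, array-based sketch of the ordering rules described above, under stated assumptions: offsets are unsigned 64-bit values, the head moves toward higher offsets, and the barrier is simply an index. The sort key is the request offset minus the current head position, computed with unsigned wrap-around, so requests behind the head sort after all requests ahead of it. struct bioq_sketch and the *_sketch functions are hypothetical; the real code operates on lists of struct bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QMAX 16

/* Hypothetical array-based stand-in for a bio queue. */
struct bioq_sketch {
	uint64_t off[QMAX];    /* request offsets, in service order */
	size_t n;              /* number of queued requests */
	size_t barrier;        /* disksort never inserts before this index */
	uint64_t last_offset;  /* current head position */
};

/* Distance from the head in the scan direction (wraps for offsets behind it). */
static uint64_t
scan_key(const struct bioq_sketch *q, uint64_t off)
{
	return (off - q->last_offset);
}

static void
insert_at(struct bioq_sketch *q, size_t i, uint64_t off)
{
	assert(q->n < QMAX && i <= q->n);
	for (size_t j = q->n; j > i; j--)
		q->off[j] = q->off[j - 1];
	q->off[i] = off;
	q->n++;
}

/* Append, and make everything queued so far a barrier for later sorts. */
static void
bioq_insert_tail_sketch(struct bioq_sketch *q, uint64_t off)
{
	insert_at(q, q->n, off);
	q->barrier = q->n;
}

/* Sorted insert after the barrier, ordered by scan_key(). */
static void
bioq_disksort_sketch(struct bioq_sketch *q, uint64_t off)
{
	size_t i = q->barrier;

	while (i < q->n && scan_key(q, q->off[i]) <= scan_key(q, off))
		i++;
	insert_at(q, i, off);
}
```

With the head at offset 50, sorting 70, 10 and 60 yields the service order 60, 70, 10: everything ahead of the head first, then the wrap-around.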

MFC after:	1 week
2009-02-13 11:36:32 +00:00
Andrew Thompson
24ef070126 Check the exit flag at the start of the taskqueue loop rather than the end. It
is possible to tear down the taskqueue before the thread has run and the
taskqueue loop would sleep forever.
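A userland sketch of the fix using POSIX threads, assuming a hypothetical tq_sketch structure: the worker tests the exit flag at the top of its loop, with the lock held, before it ever sleeps, so a teardown that happened before the thread first ran is still noticed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task queue reduced to the shutdown handshake. */
struct tq_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool exiting;    /* set by teardown */
	int pending;     /* queued tasks */
	int ran;         /* tasks executed so far */
};

static void
tq_init(struct tq_sketch *tq, int pending)
{
	pthread_mutex_init(&tq->lock, NULL);
	pthread_cond_init(&tq->cv, NULL);
	tq->exiting = false;
	tq->pending = pending;
	tq->ran = 0;
}

/* Worker loop: the exit flag is checked before every sleep. */
static void *
tq_loop(void *arg)
{
	struct tq_sketch *tq = arg;

	pthread_mutex_lock(&tq->lock);
	while (!tq->exiting) {
		if (tq->pending > 0) {
			tq->pending--;
			assert(tq->pending >= 0);
			tq->ran++;      /* "run" one task */
			continue;
		}
		pthread_cond_wait(&tq->cv, &tq->lock);
	}
	pthread_mutex_unlock(&tq->lock);
	return (NULL);
}

/* Ask the worker to exit and wait for it. */
static void
tq_teardown(struct tq_sketch *tq, pthread_t worker)
{
	pthread_mutex_lock(&tq->lock);
	tq->exiting = true;
	pthread_cond_broadcast(&tq->cv);
	pthread_mutex_unlock(&tq->lock);
	pthread_join(worker, NULL);
}
```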

Reviewed by:	sam
MFC after:	1 week
2009-02-13 01:16:51 +00:00
Ed Schouten
c0086bf202 Serialize write() calls on TTYs.
Just like the old TTY layer, the current MPSAFE TTY layer does not make
any attempt to serialize calls of write(). Data is copied into the
kernel in 256 (TTY_STACKBUF) byte chunks. If a write() call occurs at
the same time, the data may interleave. This is especially likely when
the TTY starts blocking, because the output queue reaches the high
watermark.

I've implemented this by adding a new flag, TTY_BUSY_OUT, which is used
to mark a TTY as having a thread stuck in write(). Because I don't want
non-blocking processes to be blocked by a sleeping thread, I'm still
allowing them to bypass the protection. According to this message,
the Linux kernel returns EAGAIN in such cases, but I think that's a
little too restrictive:

	http://kerneltrap.org/index.php?q=mailarchive/linux-kernel/2007/5/2/85418/thread
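The flag-based serialization can be sketched with POSIX threads as below. struct tty_sketch, tty_write_enter() and tty_write_leave() are hypothetical; the sketch only models which writer owns the output path, not the copying of data itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TTY state: only what the busy-output handshake needs. */
struct tty_sketch {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool busy_out;          /* a blocking writer owns the output path */
};

static void
tty_sketch_init(struct tty_sketch *tp)
{
	pthread_mutex_init(&tp->mtx, NULL);
	pthread_cond_init(&tp->cv, NULL);
	tp->busy_out = false;
}

/*
 * Enter write().  Blocking writers wait for the current owner and then take
 * ownership; non-blocking writers bypass the serialization rather than
 * failing with EAGAIN.  Returns true if the caller now owns the output path.
 */
static bool
tty_write_enter(struct tty_sketch *tp, bool nonblock)
{
	bool owned = false;

	pthread_mutex_lock(&tp->mtx);
	if (!nonblock) {
		while (tp->busy_out)
			pthread_cond_wait(&tp->cv, &tp->mtx);
		tp->busy_out = true;
		owned = true;
	}
	pthread_mutex_unlock(&tp->mtx);
	return (owned);
}

/* Leave write(), waking one blocked writer if we were the owner. */
static void
tty_write_leave(struct tty_sketch *tp, bool owned)
{
	if (!owned)
		return;
	pthread_mutex_lock(&tp->mtx);
	assert(tp->busy_out);
	tp->busy_out = false;
	pthread_cond_signal(&tp->cv);
	pthread_mutex_unlock(&tp->mtx);
}
```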

PR:		kern/118287
2009-02-11 16:28:49 +00:00
Robert Watson
54fffe2d67 Modify fdcopy() so that, during fork(2), it won't copy file descriptors
from the parent to the child process if they have an operation vector
of &badfileops.  This narrows a set of races involving system calls that
allocate a new file descriptor, potentially block for some extended
period, and then return the file descriptor, when invoked by a threaded
program that concurrently invokes fork(2).  Similar approaches are used
in both Solaris and Linux, and this race became much wider in FreeBSD
when we moved to a more optimistic implementation of accept(2) in order
to simplify locking.

A small race necessarily remains because the fork(2) might occur after
the finit() in accept(2) but before the system call has returned, but
that appears unavoidable using current APIs.  However, this race is
vastly narrower.
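A simplified sketch of the copy loop, assuming hypothetical file and descriptor-table structures; badfileops here is a local stand-in for the kernel object of that name, marking a descriptor that is not fully set up yet, and socketops is an equally hypothetical "ready" vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation vector and file object. */
struct fileops_sketch {
	const char *name;
};

struct file_sketch {
	const struct fileops_sketch *f_ops;
};

/* Local stand-ins: a descriptor still under construction, and a ready one. */
static const struct fileops_sketch badfileops = { "bad" };
static const struct fileops_sketch socketops = { "socket" };

#define NFD 8

struct filedesc_sketch {
	struct file_sketch *fd_ofiles[NFD];
};

/*
 * Copy the parent's descriptor table into the child, skipping descriptors
 * whose operation vector is still &badfileops.  Returns the number copied.
 */
static int
fdcopy_sketch(const struct filedesc_sketch *parent, struct filedesc_sketch *child)
{
	int copied = 0;

	assert(parent != NULL && child != NULL);
	for (int i = 0; i < NFD; i++) {
		struct file_sketch *fp = parent->fd_ofiles[i];

		if (fp == NULL || fp->f_ops == &badfileops) {
			child->fd_ofiles[i] = NULL;
			continue;
		}
		child->fd_ofiles[i] = fp;
		copied++;
	}
	return (copied);
}
```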

The fix can be validated using the newfileops_on_fork regression test.

PR:		kern/130348
Reported by:	Ivan Shcheklein <shcheklein at gmail dot com>
Reviewed by:	jhb, kib
MFC after:	1 week
2009-02-11 15:22:01 +00:00
Warner Losh
c9584ebe61 o Use NULL in preference to 0 in pointer contexts.
o Use newly minted KOBJMETHOD_END as appropriate
o fix prototype for root_setup_intr.
2009-02-11 04:54:02 +00:00
Alexander Motin
e05e00bcae Check for device_set_devclass() errors and skip driver probe/attach if any.
Calling attach without a devclass set crashes the system.

On resume, the AHCI driver sometimes tries to create a duplicate adX device.
That is surely its own problem, but IMHO it is not a reason to crash here.
Other causes are also possible.
2009-02-10 23:22:29 +00:00
Attilio Rao
a1d7ce03ea When loading a module, the linker tries each registered binary format
in turn, so loading can fail for one format and then succeed when another
format correctly recognizes the file.

File format loading functions should therefore avoid printing any
additional information and just return an appropriate error code,
distinct for each failure condition.
In addition, the linker should handle the different format loading
errors appropriately.

While a general mechanism is desirable, fix a simple and common case on
amd64: the file type is not recognized by link_elf, which confuses the
linker. Print an error if none of the registered linker classes can
recognize and load the module.
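A sketch of the intended behaviour with two hypothetical loaders that return an error code instead of printing; the caller reports a failure only after every registered class has declined. Matching on the file name suffix is purely illustrative, since real loaders inspect the file's ELF header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-format loader: 0 on success, else an errno, no printing. */
typedef int (*load_fn)(const char *path);

static bool
has_suffix(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return (ls >= lx && strcmp(s + ls - lx, suffix) == 0);
}

static int
load_as_object(const char *path)
{
	return (has_suffix(path, ".o") ? 0 : ENOEXEC);
}

static int
load_as_module(const char *path)
{
	return (has_suffix(path, ".ko") ? 0 : ENOEXEC);
}

static const load_fn loaders[] = { load_as_object, load_as_module };

/* Try every registered class quietly; report only if all of them failed. */
static int
linker_load_sketch(const char *path)
{
	int error = ENOEXEC;

	assert(path != NULL);
	for (size_t i = 0; i < sizeof(loaders) / sizeof(loaders[0]); i++) {
		error = loaders[i](path);
		if (error == 0)
			return (0);
	}
	fprintf(stderr, "%s: unrecognized file format (error %d)\n", path, error);
	return (error);
}
```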

Reviewed by:	jhb
Sponsored by:	Sandvine Incorporated
2009-02-10 15:50:19 +00:00
Robert Watson
e2757609ec Remove extra 'comma = 0' in socket state printing code, which otherwise
could lead to an extra comma in output.

Submitted by:   Christoph Mallon <christoph dot mallon at gmx dot de>
2009-02-09 18:19:58 +00:00
Martin Blapp
37e399b26e s/SS_FDREF/SS_NOFDREF/
2009-02-09 13:29:01 +00:00
Ed Schouten
89d647cb30 Remove a stale comment from the clists code.
We don't support quote bits.
2009-02-09 11:27:56 +00:00
John Baldwin
8941aad19b Tweak the output of VOP_PRINT/vn_printf() some.
- Align the fifo output in fifo_print() with other vn_printf() output.
- Remove the leading space from lockmgr_printinfo() so its output lines up
  in vn_printf().
- lockmgr_printinfo() now ends with a newline, so remove an extra newline
  from vn_printf().
2009-02-06 20:06:48 +00:00
Edward Tomasz Napierala
ec48c16f14 Add KASSERTs to make it easier to debug problems like the one fixed
in r188141.

Reviewed by:	kib,attilio
Approved by:	rwatson (mentor)
Tested by:	pho
Sponsored by:	FreeBSD Foundation
2009-02-06 18:16:01 +00:00
John Baldwin
875b66a05b Expand the scope of the sysctllock sx lock to protect the sysctl tree itself.
Back in revision 1.1 of kern_sysctl.c the sysctl() routine wired the "old" userland
buffer for most sysctls (everything except kern.vnode.*).  I think that, to prevent
issues with wiring too much memory, it used a 'memlock' to serialize all
sysctl(2) invocations, meaning that only one user buffer could be wired at
a time.  In 5.0 the 'memlock' was converted to an sx lock and renamed to
'sysctl lock'.  However, it still only served the purpose of serializing
sysctls to avoid wiring too much memory and didn't actually protect the
sysctl tree as its name suggested.  These changes expand the lock to actually
protect the tree.

Later on in 5.0, sysctl was changed to not wire buffers for requests by
default (sysctl_handle_opaque() will still wire buffers larger than a single
page, however).  As a result, user buffers are no longer wired as often.
However, many sysctl handlers still wire user buffers, so it is still
desirable to serialize userland sysctl requests.  Kernel sysctl requests
are allowed to run in parallel, however.

- Expose sysctl_lock()/sysctl_unlock() routines to exclusively lock the
  sysctl tree for a few places outside of kern_sysctl.c that manipulate
  the sysctl tree directly including the kernel linker and vfs_register().
- sysctl_register() and sysctl_unregister() require the caller to lock
  the sysctl lock using sysctl_lock() and sysctl_unlock().  The rest of
  the public sysctl API manages the locking internally.
- Add a locked variant of sysctl_remove_oid() for internal use so that
  external uses of the API do not need to be aware of locking requirements.
- The kernel linker no longer needs Giant when manipulating the sysctl
  tree.
- Add a missing break to the loop in vfs_register() so that we stop looking
  at the sysctl MIB once we have changed it.

MFC after:	1 month
2009-02-06 14:51:32 +00:00
John Baldwin
e4d9b9eb18 Drop the kernel linker lock while running SYSUNINIT routines and removing
sysctls during a linker file unload.  We drop the lock when doing similar
operations during a linker file load.  To close races, clear the LINKED
flag before dropping the lock so that the linker file is no longer visible
to userland.

MFC after:	1 week
2009-02-05 23:01:36 +00:00
Attilio Rao
feabc903d9 Add more KTR_VFS logging points in order to make tracing more effective.
Reviewed by:	brueffer, kib
Tested by:	Gianni Trematerra <giovanni D trematerra A gmail D com>
2009-02-05 15:03:35 +00:00
Ed Schouten
c3328b2ab8 Don't leave the console TTY constantly open.
When we leave the console TTY constantly open, we never reset the
termios attributes. This causes output processing, echoing, etc. not to
be reset to the proper values when going into single user mode after the
system has booted. It also causes nl-to-crnl-conversion not to take
place during shutdown, which causes a `staircase effect'.

This patch adds a new TTY flag, TF_OPENED_CONS, which is set when the
TTY is opened through /dev/console. Because the flags are only used by
the kernel and the pstat(8) utility, I've decided to renumber the TTY
flags. This shouldn't be an issue, because the TTY layer is not yet part
of a stable release.

Reported by:	Mark Atkinson <atkin901 yahoo com>
Tested by:	sepotvin
2009-02-05 14:21:09 +00:00
Jamie Gritton
ca04ba6430 Don't allow creating a socket with a protocol family that the current
jail doesn't support.  This involves a new function prison_check_af,
like prison_check_ip[46] but that checks only the family.

With this change, most of the errors generated by jailed sockets
shouldn't ever occur, at least until jails are changeable.

Approved by:	bz (mentor)
2009-02-05 14:15:18 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.
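A sketch of the standardized return convention for IPv4, with hypothetical types in place of struct ucred and struct prison. Only AF_INET is modeled, and the family-only check rejects every other family for jailed credentials, which is a simplification.

```c
#include <sys/socket.h>
#include <netinet/in.h>

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified prison and credential structures. */
struct prison_sketch {
	const struct in_addr *ip4;   /* IPv4 addresses assigned to the jail */
	size_t ip4_count;
};

struct cred_sketch {
	const struct prison_sketch *pr;   /* NULL when not jailed */
};

static bool
jailed_sketch(const struct cred_sketch *cred)
{
	return (cred->pr != NULL);
}

/*
 * 0 on success (always for non-jailed creds), EAFNOSUPPORT if the jail has
 * no IPv4 addresses, EADDRNOTAVAIL if the address is not one of them.
 */
static int
prison_check_ip4_sketch(const struct cred_sketch *cred, const struct in_addr *ia)
{
	const struct prison_sketch *pr;

	assert(cred != NULL && ia != NULL);
	if (!jailed_sketch(cred))
		return (0);
	pr = cred->pr;
	if (pr->ip4_count == 0)
		return (EAFNOSUPPORT);
	for (size_t i = 0; i < pr->ip4_count; i++)
		if (pr->ip4[i].s_addr == ia->s_addr)
			return (0);
	return (EADDRNOTAVAIL);
}

/* Family-only check; only AF_INET is modeled here. */
static int
prison_check_af_sketch(const struct cred_sketch *cred, int af)
{
	if (!jailed_sketch(cred))
		return (0);
	if (af == AF_INET && cred->pr->ip4_count > 0)
		return (0);
	return (EAFNOSUPPORT);
}
```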

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
Edward Tomasz Napierala
27dd8057d3 In some situations, mnt_lockref could go negative due to vfs_unbusy() being
called without calling vfs_busy() first.  This made umount(8) hang waiting
for mnt_lockref to become zero, which would never happen.

Reviewed by:	kib
Approved by:	rwatson (mentor)
Reported by:	pho
Found with:	stress2
Sponsored by:	FreeBSD Foundation
2009-02-05 08:46:18 +00:00
Robert Watson
fd4f1ebdfe Remove written-to but never read local variable 'offset' from
soreceive_dgram().

Submitted by:	Christoph Mallon <christoph dot mallon at gmx dot de>
MFC after:	1 week
2009-02-04 20:00:17 +00:00
Ed Schouten
f98f752202 Remove slush space from clists.
Right now only a very small number of drivers use clists, but we still
allocate 50 cblocks as slush space, which allows drivers to temporarily
overcommit their storage. Most of the drivers don't allow this anyway.

I've performed the following changes:

- We don't allocate any cblocks on startup.

- I've removed the DDB command, because it has nothing useful to print
  now. You can obtain the amount of allocated blocks by running `vmstat
  -m | grep clist'.

- I've removed cfreecount, which is now unused.

- The old code first tries to allocate using M_NOWAIT, followed by
  M_WAITOK. This doesn't make any sense, so just remove this logic. It
  seems the drivers allow us to sleep anyway.

We could even remove ccmax from clist_alloc_cblocks and c_cbmax from
struct clist, but this would break binary compatibility.

This reduces the number of allocated cblocks on my system from 54 to 4.
2009-02-04 17:10:01 +00:00
Ed Schouten
41ba7e9b13 Slightly improve the design of the TTY buffer.
The TTY buffers used the standard <sys/queue.h> lists. Unfortunately
they have a big shortcoming: if you want a doubly linked list but no
tail pointer, it's still not possible to obtain the previous element in
the list. Inside the buffers we don't need tail pointers. This is why
I switched to custom linked list macros. The macros also keep track of
the number of items in the list. Because they don't use a sentinel,
we can just initialize the queues with zero.

In its simplest form (the output queue), we will only keep two
references to blocks in the queue, namely the head of the list and the
last block in use. All free blocks are stored behind the last block in
use.

I noticed there was a very subtle bug in the previous code: in a very
uncommon corner case, it would uma_zfree() a block in the queue before
calling memcpy() to extract the data from the block.
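A sketch of a sentinel-free output queue in which an all-zero structure is a valid empty queue, the queue keeps references only to the head and to the last block in use, and free blocks sit behind the last block in use. struct outq, struct blk and the helper functions are hypothetical, and for brevity the blocks are only singly linked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block; the real TTY buffer blocks are larger. */
struct blk {
	struct blk *next;
	char data[16];
};

/*
 * No sentinel node, so a zero-filled struct is a valid, empty queue.
 * Blocks from head through last are in use; blocks after last are free.
 */
struct outq {
	struct blk *head;       /* first block of the list */
	struct blk *last;       /* last block in use, or NULL if none */
	unsigned int nblocks;   /* total number of blocks (used + free) */
};

/* Hand a fresh block to the queue; it joins the free blocks behind last. */
static void
outq_add_free(struct outq *q, struct blk *b)
{
	struct blk **link = (q->last != NULL) ? &q->last->next : &q->head;

	assert(b != NULL);
	b->next = *link;
	*link = b;
	q->nblocks++;
}

/* Claim the next free block for new data, or NULL if none is left. */
static struct blk *
outq_claim(struct outq *q)
{
	struct blk *b = (q->last != NULL) ? q->last->next : q->head;

	if (b != NULL)
		q->last = b;
	return (b);
}
```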
2009-02-03 19:58:28 +00:00
Warner Losh
2c204a1631 Use NULL in preference to 0 in pointer contexts.
2009-02-03 07:54:42 +00:00
Warner Losh
13b4c4c3a3 Make bioq_disksort have an ANSI C definition rather than a K&R definition.
2009-02-03 07:53:51 +00:00
Warner Losh
8ed4d9c970 rman_debug should be static, so make it static.
2009-02-03 07:53:08 +00:00
Warner Losh
bada728732 Use ANSI function definition for profil.
2009-02-03 07:52:36 +00:00
Warner Losh
04d17b6283 Prefer ANSI function definitions to K&R ones.
2009-02-03 07:52:07 +00:00
Warner Losh
d710cae75a Use NULL in preference to 0 for pointers.
2009-02-03 07:51:41 +00:00
Warner Losh
4592c621f3 Use NULL in preference to 0 for pointers.
2009-02-03 07:51:11 +00:00
Warner Losh
8260e3a4c0 o Use unsigned for bit fields.
o Use NULL for pointers in preference to 0.
2009-02-03 07:50:41 +00:00