Commit Graph

10806 Commits

Author SHA1 Message Date
Warner Losh
53366abd02 Close, but not eliminate, a race condition. It is one that properly
designed drivers would never hit, but was exposed in diving into
another problem...

When expanding the devclass array, free the old memory after updating
the pointer to the new memory.  For the following single race case,
this helps:

	allocate new memory
	copy to new memory
	free old memory
<interrupt>				read pointer to freed memory
	update pointer to new memory

Now we do
	allocate new memory
	copy to new memory
	update pointer to new memory
	free old memory

Which closes this problem, but doesn't even begin to address the
multicpu races, which all should be covered by Giant at the moment,
but likely aren't completely.

Note: reviewers were ok with this fix, but suggested the use case
wasn't one we wanted to encourage.

Reviewed by:	jhb, scottl.
2008-10-10 17:49:47 +00:00
Konstantin Belousov
387ad99800 If the ABI-overriden interpreter was not loaded, do not set
have_interp to TRUE. This allows the code in image activator to try
/libexec/ld-elf.so.1 as interpreter when newinterp is not found to
execute.

Reviewed by:	peter
MFC after:	2 weeks (together with r175105)
2008-10-08 11:11:36 +00:00
Robert Watson
e298cf5902 Remove stale comment (and XXX saying so) about why we zero the file
descriptor pointer in unp_freerights: we can no longer recurse into
unp_gc due to unp_gc being invoked in a deferred way, but it's still
a good idea.

MFC after:	3 days
2008-10-08 06:26:51 +00:00
Robert Watson
fa9402f28a Differentiate pr_usrreqs for stream and datagram UNIX domain sockets, and
employ soreceive_dgram for the datagram case.

MFC after:	3 months
2008-10-08 06:19:49 +00:00
Robert Watson
ff601c3645 In soreceive_dgram, when a 0-length buffer is passed into recv(2) and
no data is ready, return 0 rather than blocking or returning EAGAIN.
This is consistent with the behavior of soreceive_generic (soreceive)
in earlier versions of FreeBSD, and restores this behavior for UDP.

Discussed with:	jhb, sam
MFC after:	3 days
2008-10-07 20:57:55 +00:00
Robert Watson
ffe72750d9 Remove temporary debugging KASSERT's introduced to detect protocols
improperly invoking sosend(), soreceive(), and sopoll() instead of
attach either specialized or _generic() versions of those functions
to their pru_sosend, pru_soreceive, and pru_sopoll protosw methods.

MFC after:	3 days
2008-10-07 09:57:03 +00:00
Robert Watson
7978014d3a Rewrite sbreserve_locked()'s comment on NULL thread pointers, eliminating
an XXXRW about the comment being stale.

MFC after:	3 days
2008-10-07 09:51:39 +00:00
Robert Watson
58f7ce962c Lock receive socket buffer in soo_stat() rather than commenting that we
should lock it, which may marginally improve the consistency of the
results.  Remove comment.

MFC after:	3 days
2008-10-07 07:10:28 +00:00
Robert Watson
2c8995842c Now that portalfs doesn't directly invoke uipc_connect2(), make it a
static symbol.

MFC after:	3 days
2008-10-06 18:43:11 +00:00
Sam Leffler
73254c9ee7 dynamically allocate the task structure in firmware_mountroot: when
booting from an MFS root (e.g. from an install CD) firmware_mountroot
can be called twice with the second call happening before the task
callback occurs; this results in the task structure contents being
corrupted because it was declared static.

Submitted by:	marius (original version)
2008-10-04 23:58:02 +00:00
John Baldwin
4ec9dc4de2 Oops, missed updating a place with with 's/lock1/plock/' when adding
interlock support to WITNESS.  Specifically, the printf listing the
first location when duplicate locks of the same type are acquired.

Reported by:	pho
2008-10-03 18:13:05 +00:00
Robert Watson
0b36cd25fc Further minor cleanups to UNIX domain sockets:
- Staticize and locally prototype functions uipc_ctloutput(), unp_dispose(),
  unp_init(), and unp_externalize(), none of which have been required
  outside of uipc_usrreq.c since uipc_proto.c was removed.
- Remove stale prototype for uipc_usrreq(), which has not existed in the
  code since 1997
- Forward declare and staticize uipc_usrreqs structure in uipc_usrreq.c and
  not un.h.
- Comment on why uipc_connect2() is still non-static -- it is used directly
  by fifofs.
- Remove stale comments, tidy up whitespace.

MFC after:	3 days (where applicable)
2008-10-03 13:01:56 +00:00
Robert Watson
60a5ef26a1 Remove or update several stale comments.
A bit of whitespace/style cleanup.

Update copyright.

MFC after:	3 days (applicable changes)
2008-10-03 09:01:55 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Peter Wemm
e6592ee55c Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment.  Textdump magic can be passed in.
2008-10-01 22:08:53 +00:00
John Baldwin
b957a8225a Enable shared locks for path name lookups on supported filesystems (NFS
client, UFS, and ZFS) by default.
2008-10-01 19:25:37 +00:00
John Baldwin
d59701d07d Remove the LOOKUP_SHARED kernel option. Instead, make vfs.lookup_shared
a loader tunable (it was already a sysctl).
2008-10-01 19:24:16 +00:00
John Baldwin
1af1c6cd8a Wait until after dropping the receive socket buffer lock to allocate space
to store the socket address stored in the first mbuf in a packet chain.
This reduces contention on the lock and CPU system time in certain UDP
workloads.

Tested by:	ps
Reviewed by:	rwatson
MFC after:	1 week
2008-10-01 19:14:05 +00:00
Robert Watson
25edc6dd20 Various cleanups for soreceive_dgram():
- Update or remove comments that were left over from the original
  soreceive_generic() implementation.  Quite a few were misleading in the
  context of the new code.
- Since soreceive_dgram() has a simpler structure, replace several gotos
  with a while loop making the invariants more clear.
- In the blocking while loop, don't try to handle cases incompatible with
  the loop invariant (since m is always NULL, don't check for and handle
  non-NULL).
- Don't drop and re-acquire the socket buffer lock unnecessarily after
  sbwait() returns, which may help reduce lock contention (etc).
- Assume PR_ATOMIC since we assert it at the top of the function.

MFC after:	3 days
2008-10-01 13:26:52 +00:00
John Baldwin
c4688866d3 Update the function name in several assertions in soreceive_dgram().
Approved by:	rwatson
MFC after:	3 days
2008-09-30 18:44:26 +00:00
Konstantin Belousov
41a4e90e6f If the panic thread is preempted after setting panicstr but before
setting TDF_INPANIC then it will never be rescheduled again. Wrap
setting the panic condition with the critical section.

Noted and reviewed by:	tegge
MFC after:	1 week
2008-09-27 15:45:54 +00:00
Ed Schouten
c6ec8c53c5 Move uminor() and umajor() to the same place as userspace minor() and major().
The uminor() and umajor() functions have the same use in kernel space as
the minor() and major() functions in userspace. If we ever get rid of
the minor() function in kernel space, we could decide to just expose
minor() and major() to kernel space, making uminor() and umajor()
redundant.

There are two reasons why we want to have uminor() and umajor() in
<sys/types.h>:

- Having them close together prevents them from diverting. Even though
  it's unlikely the definitions will change, it's a good habit to have
  them at the same place.

- They don't really belong in kern_conf.c. kern_conf.c has been
  liberated from dealing with device major and minor number handling.

The device_ids(9) manpage now lists the wrong #include's, because it
should only list <sys/types.h> now. I'm leaving it as it is now, because
I wonder if we should document them anyway. We're probably better off
documenting minor(3) and major(3).
2008-09-27 13:19:09 +00:00
Ed Schouten
6bfa9a2d66 Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
Ed Schouten
dacf7de1a6 Don't forget to initialize `int error' in ttydev_open().
I've had some reports in the past that opening an already opened TTY
through, for example, /dev/tty can fail with random error codes. Looking
at ttydev_open(), I can see there is a way `error' is returned without
initialising it. Even though I haven't had any confirmation this fixes
the bug, I'll fix it anyway.

Reported by:	Andrzej Tobola <ato iem pw edu pl>
2008-09-26 18:17:04 +00:00
Ed Schouten
edde874555 Rename the minor' argument of make_dev(9) to unit'.
To prevent any further confusion about device minor and unit numbers,
we'd better just refer to device unit numbers. Many people still think
the numbers we show inside devfs have any relation to the numbers passed
to make_dev(9), which is not the case.

Discussed with:	kib
2008-09-26 14:31:24 +00:00
Ed Schouten
d3ce832719 Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
John Baldwin
74d9b5a551 Regen. 2008-09-25 20:08:36 +00:00
John Baldwin
48a43ae819 Tidy up a few things with syscall generation:
- Instead of using a syscall slot (370) just to get a function prototype
  for lkmressys(), add an explicit function prototype to <sys/sysent.h>.
  This also removes unused special case checks for 'lkmressys' from
  makesyscalls.sh.
- Instead of having magic logic in makesyscalls.sh to only generate a
  function prototype the first time 'lkmnosys' is seen, make 'NODEF'
  always not generate a function prototype and include an explicit
  prototype for 'lkmnosys' in <sys/sysent.h>.
- As a result of the fix in (2), update the LKM syscall entries in
  the freebsd32 syscall table to use 'lkmnosys' rather than 'nosys'.
- Use NOPROTO for the __syscall() entry (198) in the native ABI.  This
  avoids the need for magic logic in makesyscalls.h to only generate
  a function prototype the first time 'nosys' is encountered.
2008-09-25 20:07:42 +00:00
John Baldwin
7d43ca696e - Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition
variable wait routines.  DROP_GIANT() already manages that state in the
  Giant interlock case.
- Assert that Giant is held when it is passed as a sleep interlock.
2008-09-25 13:42:19 +00:00
John Baldwin
d2722d704c Part 1 of making shared lookups more resilient with respect to forced
unmounts.  When we upgrade a vnode lock from shared to exclusive during
a name cache lookup, fail the lookup with EBADF if the vnode is invalidated
while we are waiting for the exclusive lock.

Also, for correctness (though I'm not sure it can occur in practice),
downgrade an exclusively locked vnode if it should be share locked.

Tested by:	pho
2008-09-24 18:51:33 +00:00
John Baldwin
c1fa2e4200 Update description of witness_watch. 2008-09-24 18:47:24 +00:00
Ed Schouten
4c7428e1ff Fix a crash when calling tty_rel_free() while draining during closure.
Yesterday I got two reports of potential crashes, related to TTY
deallocation during device closure. When a thread is in TF_OPENCLOSE,
draining its output upon closure, we should not allow calls to
tty_rel_free() to happen at the same time. This could cause the TTY to
be torn down twice.

PR:		kern/127561
Reported by:	KOIE Hidetaka <koie suri co jp>
Discussed with:	thompsa
2008-09-24 11:16:09 +00:00
Konstantin Belousov
a8d403e102 Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from:	jhb
MFC after:	1 month
2008-09-24 10:14:37 +00:00
Ed Schouten
b61637107c Track state to determine if the associated TTY device node has been used.
It turns out our old TTY layer (and other implementations) block when
you read() on a PTY master device of which the slave device node has not
been opened yet. Our new implementation just returned 0. This caused
applications like telnetd to die in a very subtle way (when child
processes would open the TTY later than the first call to select()).

Introduce a new flag called PTS_FINISHED, which indicates whether we
should block or bail out of a read() or write() occurs.

Reported by:	Claude Buisson <clbuisson orange fr>
2008-09-23 17:12:25 +00:00
David E. O'Brien
b386eb9139 style(9) 2008-09-23 14:25:56 +00:00
David E. O'Brien
715457f6f6 Reverse if() logic to improve readability.
Reviewed by:	ru
2008-09-23 14:25:38 +00:00
Ed Schouten
a1215e37a4 Introduce a hooks layer for the MPSAFE TTY layer.
One of the features that prevented us from fixing some of the TTY
consumers to work once again, was an interface that allowed consumers to
do the following:

- `Sniff' incoming data, which is used by the snp(4) driver.

- Take direct control of the input and output paths of a TTY, which is
  used by ng_tty(4), ppp(4), sl(4), etc.

There's no practical advantage in committing a hooks layer without
having any consumers. In P4 there is a preliminary port of snp(4) and
thompsa@ is busy porting ng_tty(4) to this interface. I already want to
have it in the tree, because this may stimulate others to work on the
remaining modules.

Discussed with:	thompsa
Obtained from:	//depot/projects/mpsafetty/...
2008-09-22 19:25:14 +00:00
Ed Schouten
d344ffe549 Fix style(9) issue in TTY header files: document function argument names.
According to style(9), function argument names should only be omitted
for prototypes that are exported to userspace. This means we should
document the function arguments in the TTY header files, because they
are only used in userspace.

While there, change the type of the buffer argument of
ttydisc_rint_bypass() to `const void *' instead of `char *'.

Requested by:	attilio
Obtained from:	//depot/projects/mpsafetty/...
2008-09-22 18:44:09 +00:00
Joseph Koshy
122ccdc1ca Support sparsely numbered CPUs.
Requested by:	obrien, alfred (long ago)
2008-09-22 10:37:02 +00:00
Ed Schouten
37ddf38e38 Make fstat() on a pseudo-terminal master return sane timestamps.
Because pseudo-terminal master file descriptors no longer have a vnode
underneath, we have to fill in fstat() values ourselves. Make our
implementation somewhat sane by returning the timestamps of the TTY
device node that corresponds with our file descriptor.

Obtained from:	//depot/projects/mpsafettty/...
2008-09-21 19:24:15 +00:00
Ed Schouten
3111c5c922 Now that the number of clist consumers have dropped massively, trim down
the code to prevent useless waste of space.

- Remove support for quote bits. There is not a single driver that needs
  these bits anymore. This means putc() now accepts a char instead of an
  int.

- Remove the unneeded catq() and nextc() routines. They were only used
  by the old TTY layer.

- Convert the clist code to use ANSI C prototypes.
2008-09-21 18:12:18 +00:00
Konstantin Belousov
caf8aec886 fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:50:52 +00:00
Konstantin Belousov
4c5a20e3da Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Suggested by:   bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:49:15 +00:00
Konstantin Belousov
0fbbf2ea56 Initialize va_rdev to NODEV and va_fsid to VNOVAL before the
VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't
initialize those fields in VOP_GETATTR() they will have a sane default
value.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:48:24 +00:00
Konstantin Belousov
86dacdfe2b Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:46:45 +00:00
Konstantin Belousov
ea60a5f526 Initialize birthtime fields in vn_stat() to prevent stat(2) from
returning uninitialized birthtime. Most file systems don't initialize
birthtime properly in their VOP_GETTATTR().

Submitted by:   Jaakko Heinonen <jh saunalahti fi>
Reviewed by:	bde
Discussed on:   freebsd-fs
MFC after:	1 month
2008-09-20 19:43:22 +00:00
David E. O'Brien
6e6049e9df Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)
2008-09-19 15:17:32 +00:00
John Baldwin
da672ec2ca Various style fixes. 7 space indent is just odd. 2008-09-18 20:10:11 +00:00
John Baldwin
cbb598af66 Sort includes. 2008-09-18 20:04:22 +00:00
Attilio Rao
cecd8edba5 Remove the suser(9) interface from the kernel. It has been replaced from
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.

This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.

This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.

Tested by:      Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by:    rwatson
2008-09-17 15:49:44 +00:00
Ed Schouten
42ff2756c7 Fix minor TTY API inconsistency.
Unlike tty_rel_gone() and tty_rel_sess(), the tty_rel_pgrp() routine
does not unlock the TTY. I once had the idea to make the code call
tty_rel_pgrp() and tty_rel_sess(), picking up the TTY lock once. This
turned out a little harder than I expected, so this is how it works now.

It's a lot easier if we just let tty_rel_pgrp() unlock the TTY, because
the other routines do this anyway.
2008-09-16 14:57:23 +00:00
Konstantin Belousov
2814d5ba5f When attempt is made to suspend a filesystem that is already syspended,
wait until the current suspension is lifted instead of silently returning
success immediately. The consequences of calling vfs_write() resume when
not owning the suspension are not well-defined at best.

Add the vfs_susp_clean() mount method to be called from
vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and
stop calling it manually.

Add the thread flag TDP_IGNSUSP that allows to bypass the suspension
point in the vn_start_write. It is intended for use by VFS in the
situations where the suspender want to do some i/o requiring calls to
vn_start_write(), and this i/o cannot be done later.

Reviewed by:	tegge
In collaboration with:	pho
MFC after:	 1 month
2008-09-16 11:51:06 +00:00
Konstantin Belousov
52dfc8d7da Add the ffs structures introspection functions for ddb.
Show the b_dep value for the buffer in the show buffer command.
Add a comand to dump the dirty/clean buffer list for vnode.

Reviewed by:	tegge
Tested and used by:   pho
MFC after:   1 month
2008-09-16 11:19:38 +00:00
Konstantin Belousov
bdb8094763 Garbage-collect vn_write_suspend_wait().
Suggested and reviewed by:	tegge
Tested by:	pho
MFC after:	1 month
2008-09-16 11:09:26 +00:00
Sam Leffler
39297ba455 Make ddb command registration dynamic so modules can extend
the command set (only so long as the module is present):
o add db_command_register and db_command_unregister to add and remove
  commands, respectively
o replace linker sets with SYSINIT's (and SYSUINIT's) that register
  commands
o expose 3 list heads: db_cmd_table, db_show_table, and db_show_all_table
  for registering top-level commands, show operands, and show all operands,
  respectively

While here also:
o sort command lists
o add DB_ALIAS, DB_SHOW_ALIAS, and DB_SHOW_ALL_ALIAS to add aliases
  for existing commands
o add "show all trace" as an alias for "show alltrace"
o add "show all locks" as an alias for "show alllocks"

Submitted by:	Guillaume Ballet <gballet@gmail.com> (original version)
Reviewed by:	jhb
MFC after:	1 month
2008-09-15 22:45:14 +00:00
John Baldwin
37e9511fcb Expose a new public routine intr_event_execute_handlers() which executes
all the non-filter handlers attached to an interrupt event.  This can be
used by device drivers which multiplex their interrupt onto the interrupt
handlers for child devices.
2008-09-15 22:19:44 +00:00
Attilio Rao
d56bc17bce - For any lock list we hold the head in order to reduce allocation from
the free list and in this way avoid contention on the w_mtx.
  In order to make the code simple, we rely on the rule that when the head
  has not a child it also doesn't have other subsequent entries.
  Actually this assertion is broken because we can free all the head
  children and quit witness_unlock() with the head still allocated, with no
  children and subsequent entries present.
  Fix this by shifting the head if other entries are present and still
  freeing the object, but leaving always an head.
- Fix witness_thread_has_locks() in order to report, correctly, if the
  lock list linked to a specific thread has children or not based on the
  above explained rule.
- Fix a printout into DDB's "show alllocks" command in order to show,
  correctly, the process name that is really what we want.
- Fix style(9) for a comment.

Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reported by:	Marko Kiiskila <marko dot kiiskila at nokia dot com>
Sponsored by:	Nokia
2008-09-12 21:44:01 +00:00
Christian S.J. Peron
856ebf8530 Make sure the TTY has not disappeared out from under us before calling
ttydevsw_outwakeup().  This should fix panics which occur after remote
login sessions timeout during moderate TTY activity.  An example of
where this might occur is where a pending write to the terminal is
occurring while sshd(8) is shutting down the TTY after a TCP timeout.

Submitted by:	ed
2008-09-10 20:12:10 +00:00
John Baldwin
413134305e Teach WITNESS about the interlocks used with lockmgr. This removes a bunch
of spurious witness warnings since lockmgr grew witness support.  Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.

Reviewed by:	attilio
2008-09-10 19:13:30 +00:00
John Baldwin
bf9c6c31e7 Various whitespace fixes. 2008-09-10 17:59:21 +00:00
Edward Tomasz Napierala
dfa7fd1d70 Remove VSVTX, VSGID and VSUID. This should be a no-op,
as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.

Approved by:	rwatson (mentor)
2008-09-10 13:16:41 +00:00
John Baldwin
8c68f75a7c - Reduce scope of #ifdef's in uma_zcreate() call in init_turnstile0().
- Set UMA_ZONE_NOFREE so that the per-turnstile spin locks are type stable
  to avoid a race where one thread might dereference a lock in a free'd
  turnstile that was previously used by another thread.

Theorized by:	tegge (2)
MFC after:	1 week
2008-09-08 21:40:15 +00:00
John Baldwin
aa4c44b58b Close a race in sleepq_broadcast() where the sleepq could be reused after
it had been assigned to the last sleeping thread.  That thread might have
started running on another CPU and have reused that sleep queue.  Fix it
by just walking the thread queue using TAILQ_FOREACH_SAFE() rather than
a while loop.

PR:		amd64/124200
Discovered by:	tegge
Tested by:	benjsc
MFC after:	1 week
2008-09-08 19:44:57 +00:00
Bjoern A. Zeeb
6f4745d575 Catch a possible NULL pointer deref in case the offsets got mangled
somehow.
As a consequence we may now get an unexpected result(*).
Catch that error cases with a well defined panic giving appropriate
pointers to ease debugging.

(*) While the concensus was that the case should never happen unless
    there was a bug, noone was definitively sure.

Discussed with:		kmacy (about 8 months back)
Reviewed by:		silby (as part of a larger patch in March)
MFC after:		2 months
2008-09-07 13:09:04 +00:00
Ed Schouten
3c8574bc8a Make TIOCCONS use priv_check() instead of checking /dev/console permissions.
As discussed with Robert on IRC, checking the permissions on
/dev/console to see if we can call TIOCCONS could be unreliable. When we
run a chroot() without a devfs instance mounted inside, it won't
actually check the permissions on the device node inside the devfs
instance.

Using the already existing PRIV_TTY_CONSOLE for this seems like a better
idea.

Approved by:	rwatson
2008-09-06 14:43:32 +00:00
Ed Schouten
c27991e819 Fix a small typo in a comment in calcru1().
The word "happene" should read "happened".

Submitted by:	Jille Timmermans <jille quis cx>
2008-09-05 15:55:06 +00:00
David Xu
fbc48e974e Fix LOR between vnode lock and internal mqueue locks. 2008-09-05 07:32:57 +00:00
Andrew Thompson
9128ec21f3 Remove the alignment of the align parameter. This is up to the caller to pass
in and it breaks tap(4) on strict alignment machines as m_uiotombuf is called
with ETHER_ALIGN.

Found by:	Jared Go
Reviewed by:	emax
MFC after:	3 days
2008-09-05 04:05:31 +00:00
David Xu
b042e9760c Fix lock name conflict.
PR:	kern/127040
2008-09-05 02:07:25 +00:00
Ed Schouten
64308260f6 Implement pts(4) packet mode.
As reported by several users on the mailing lists, applications like
screen(1) fail to properly handle ^S and ^Q characters. This was because
MPSAFE TTY didn't implement packet mode (TIOCPKT) yet. Add basic packet
mode support to make these applications work again.

Obtained from:	//depot/projects/mpsafetty/...
2008-09-04 16:39:02 +00:00
Ed Schouten
2bda9238e5 Fix an awful bug inside our COMPAT_43TTY code.
When I migrated tty_compat.c to MPSAFE TTY, I just hooked it up to the
build and fixed it until it compiled and somewhat worked. It turns out
this was not the smartest thing, because the old TTY layer also had a
field called t_flags, which contained a set of sgtty flags.

This means our current COMPAT_43TTY code overwrites the TTY flags,
causing all strange problems to occur. Fix this code to use a new struct
member called t_compatflags. This commit may cause kern/127054 to be
fixed, but this still has to be tested/confirmed by the originator. It
has to be fixed anyway.

PR:		kern/127054
2008-09-04 16:30:53 +00:00
Kevin Lo
f308bddd3f If the process id specified is invalid, the system call returns ESRCH 2008-09-04 10:44:33 +00:00
Simon L. B. Nielsen
59ca51adba - Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by:	kib [08:07], csjp [08:08], bz [08:09]
Reviewed by:	peter [08:07], jhb [08:07]
Reviewed by:	jinmei [08:09], rwatson [08:09]
Approved by:	re (SA blanket)
Approved by:	so (simon)
Security:	FreeBSD-SA-08:07.amd64
Security:	FreeBSD-SA-08:08.nmount
Security:	FreeBSD-SA-08:09.icmp6
2008-09-03 19:09:47 +00:00
Ed Schouten
ffffa83b60 Use size_t to store the return value of ttydisc_getc().
The ttydisc_getc() routine obtains a read length from ttyoutq_read().
For no valid reason, the current code stores this value in an int, and
returns a size_t. There is no need to perform this useless conversion.

Obtained from:	//depot/projects/mpsafetty/...
2008-09-02 17:13:11 +00:00
Robert Watson
26ec197d15 Remove XXXRW in soreceive_dgram that proves unnecessary.
Remove unused orig_resid variable in soreceive_dgram.

Submitted by:	alfred
X-MFC with:	soreceive_dgram (r180198, r180211)
2008-09-02 16:55:21 +00:00
Pawel Jakub Dawidek
2765482b7f When setting error to EINVAL in 'fvp == tdvp' case, jump to out label,
because if not, the error will be later overwritten by
mac_vnode_check_rename_to() call.

Reviewed by:	rwatson
2008-09-01 10:11:39 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Attilio Rao
988d28340a - Improve some witness_watch operability in code which does perform both
lock tracking and checks, doing just the former ones.
- Fix a bug where sysctl utility was printing crazy values when setting a
  new value for debug.witness.watch [0]

[0] Reported by:	yongari
2008-08-30 13:20:35 +00:00
Ed Schouten
74bb9e3ad5 Fix some edge cases in the TTY queues:
- In the current design, when a TTY decreases its baud rate, it tries to
  shrink the queues. This may not always be possible, because it will
  not free any blocks that are still filled with data.

  Change the TTY queues to store a `quota' value as well, which means it
  will not free any blocks when changing the baud rate, but when placing
  blocks back into the queue. When the amount of blocks exceeds the
  quota, they get freed.

  It also fixes some edge cases, where TIOCSETA during read()/
  write()-calls could actually make the queue a tiny bit bigger than in
  normal cases.

- Don't leak blocks of memory when calling TIOCSETA when the device
  driver abandons the TTY while allocating memory.

- Create ttyoutq_init() and ttyinq_init() to initialize the queues,
  instead of initializing them by hand. The new TTY snoop driver also
  creates an outq, so it's good to have a proper interface to do this.

Obtained from:	//depot/projects/mpsafetty/...
2008-08-30 09:18:27 +00:00
Attilio Rao
df3310e04a - Make witness_watch a 3 state value.
1 means that witness is up and running.
  0 means that witness is disabled but that it can be established later
    again in effective way.
  -1 means that witness is disabled permanently
- Fix a bug causing kernel to panic on witness disabling through
  witness_watch.  lock lists queues were still full of entries and this was
  causing throubles with debugging stubs (like witness_thread_exit()).

Reported by:	kris, yongari
Sponsored by:	Nokia
2008-08-29 15:47:53 +00:00
Ed Schouten
a15ec0a5e4 Backport two small fixes from the MPSAFE TTY branch in Perforce:
- Implement IMAXBEL. It turned out the IMAXBEL termios switch was marked
  as supported, while it had not been implemented.

- Don't go into the high watermark when in canonical mode, no data has
  been canonicalized and the input buffer is full. This caused the
  terminal to lock up. This prevented users from pressing
  backspace/^U/etc in such cases.

  This could easily be simulated by pasting a very big amount of data in
  a shell with sh(1) in canonical mode.

Obtained from:	//depot/projects/mpsafetty/...
2008-08-29 15:02:50 +00:00
David Xu
3eb8b8bbeb Don't remove queued SIGCHLD if options contain WNOWAIT, so other
threads still can be notified by the signal.
2008-08-29 01:34:05 +00:00
Tom Rhodes
1e018d99f2 Fix a typo in r180291
"NAme of the current YP/NIS domain" -> "Name of the current YP/NIS domain"
2008-08-28 23:52:34 +00:00
Ed Schouten
a05cae5186 Make ureadc() warn when holding any locks, just like uiomove().
A couple of months ago I was quite impressed, because when I was writing
code, I discovered that uiomove() would not allow any locks to be held,
while ureadc() did, mainly because ureadc() is implemented using the
same building blocks as uiomove().

Let's see if this triggers any aditional witness warnings on our source
tree.

Reviewed by:	atillio
2008-08-28 19:34:58 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Konstantin Belousov
a888d54d39 Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmnque() function
to ignore the unmounting and forces insertion of the vnode into the mount
vnode list.

Change insmntque() to fail when forced unmount is in progress and
VV_FORCEINSMQ is not specified.

Add an assertion to the insmntque(), requiring the vnode to be
exclusively locked for mp-safe filesystems.

Use the VV_FORCEINSMQ for the creation of the syncvnode.

Tested by:	pho
Reviewed by:	tegge
MFC after:	1 month
2008-08-28 09:08:15 +00:00
Ed Schouten
ceef66c0e3 Properly unlock the init/lock-state devices when invoking TIOCSETA.
For some reason a return-statement crept into this code, where it
shouldn't belong. This means we didn't properly unlock the TTY before
returning to userspace.

Submitted by:	Tor Egge <tor egge cvsup no freebsd org>
2008-08-27 19:37:21 +00:00
John Baldwin
9c2bf0cce2 - Only count the number of CPUs in the rendezvous map once rather than
doing it on every CPU.
- Use CPU_ABSENT() rather than pcpu_find() to determine if a CPU is not
  present.
- Count up to mp_maxid rather than MAXCPU when iterating over CPUs to
  match the rest of the code in the kernel.

MFC after:	1 week
2008-08-27 18:23:55 +00:00
Konstantin Belousov
cbc158449b Implement WNOWAIT flag for wait4(2). It specifies that process whose status
is returned shall be kept in the waitable state.
Add WSTOPPED as an alias for WUNTRACED.

Submitted by:	Jukka Ukkonen <jau at iki fi>
PR:	standards/116221
MFC after:	2 weeks
2008-08-26 12:37:16 +00:00
Konstantin Belousov
eaad109973 When calculating arguments to the interpreter for the shebang script
executed by fexecve(2), imgp->args->fname is NULL. Moreover, there is
no way to recover the path to the script being executed.
Do what some other U*ixes do unconditionally, namely supply /dev/fd/n
as the script path when called from fexecve(). Document requirement of
having fdescfs mounted as caveat.
2008-08-26 10:53:32 +00:00
John Baldwin
cf22c63dd5 Resort a few accessor routines so that they are consistently grouped
with 'set_foo/get_foo' adjacent to each other.
2008-08-25 16:16:57 +00:00
Robert Watson
3f3978840e More fully audit fexecve(2) and its arguments.
Obtained from:	TrustedBSD Project
Sponsored by:	Google, Inc.
2008-08-25 13:50:01 +00:00
Robert Watson
5ae504055a Regenerate following r182123. 2008-08-24 21:23:08 +00:00
Robert Watson
e484af13ed When MPSAFE ttys were merged, a new BSM audit event identifier was
allocated for posix_openpt(2).  Unfortunately, that identifier
conflicts with other events already allocated to other systems in
OpenBSM.  Assign a new globally unique identifier and conform
better to the AUE_ event naming scheme.

This is a stopgap until a new OpenBSM import is done with the
correct identifier, so we'll maintain this as a local diff in svn
until then.

Discussed with:	ed
Obtained from:	TrustedBSD Project
2008-08-24 21:20:35 +00:00
Christian S.J. Peron
e451733718 Remove worrying printf warning on bootup when processing vnodes which
have NULL mount-points.  This is the case for special vnodes, such as the
one used in nameiinit() which is used for crossing mount points in lookup()
to avoid  lock ordering issues.

MFC after:	2 weeks
Discussed with:	rwatson, kib
2008-08-24 20:16:44 +00:00
Ed Schouten
1a643b0f02 Allow the user to suppress the rate-limited pty(4) warning.
The pty(4) driver raises up to warnings when an old BSD-style PTY is
created. The reason why I added this warning, was to make it easier to
spot applications that allocate BSD-style PTY's, while they should just
use openpty() or posix_openpt().

Add a sysctl, which allows you to override the number of remaining
messages, making it possible to suppress the warnings.

Requested by:	kib
Reviewed by:	kib
2008-08-23 16:03:00 +00:00
Robert Watson
6356dba0b4 Introduce two related changes to the TrustedBSD MAC Framework:
(1) Abstract interpreter vnode labeling in execve(2) and mac_execve(2)
    so that the general exec code isn't aware of the details of
    allocating, copying, and freeing labels, rather, simply passes in
    a void pointer to start and stop functions that will be used by
    the framework.  This change will be MFC'd.

(2) Introduce a new flags field to the MAC_POLICY_SET(9) interface
    allowing policies to declare which types of objects require label
    allocation, initialization, and destruction, and define a set of
    flags covering various supported object types (MPC_OBJECT_PROC,
    MPC_OBJECT_VNODE, MPC_OBJECT_INPCB, ...).  This change reduces the
    overhead of compiling the MAC Framework into the kernel if policies
    aren't loaded, or if policies require labels on only a small number
    or even no object types.  Each time a policy is loaded or unloaded,
    we recalculate a mask of labeled object types across all policies
    present in the system.  Eliminate MAC_ALWAYS_LABEL_MBUF option as it
    is no longer required.

MFC after:	1 week ((1) only)
Reviewed by:	csjp
Obtained from:	TrustedBSD Project
Sponsored by:	Apple, Inc.
2008-08-23 15:26:36 +00:00
John Baldwin
969bf150df Fix a race condition with concurrent LOOKUP namecache operations for a vnode
not in the namecache when shared lookups are enabled (vfs.lookup_shared=1,
it is currently off by default) and the filesystem supports shared lookups
(e.g. NFS client).  Specifically, if multiple concurrent LOOKUPs both miss
in the name cache in parallel, each of the lookups may each end up adding an
entry to the namecache resulting in duplicate entries in the namecache
for the same pathname.  A subsequent removal of the mapping of that
pathname to that vnode (via remove or rename) would only evict one of the
entries from the name cache.  As a result, subseqent lookups for that
pathname would still return the old vnode.

This race was observed with shared lookups over NFS where a file was updated
by writing a new file out to a temporary file name and then renaming that
temporary file to the "real" file to effect atomic updates of a file.  Other
processes on the same client that were periodically reading the file would
occasionally receive an ESTALE error from open(2) because the VOP_GETATTR()
in nfs_open() would receive that error when given the stale vnode.

The fix here is to check for duplicates in cache_enter() and just return
if an entry for this same directory and leaf file name for this vnode is
already in the cache.  The check for duplicates is done by walking the
per-vnode list of name cache entries.  It is expected that this list should
be very small in the common case (usually 0 or 1 entries during a
cache_enter() since most files only have 1 "leaf" name).

Reviewed by:	ups, scottl
MFC after:	2 months
2008-08-23 15:13:39 +00:00
Ed Schouten
ce570f82cc Remove unused tty_gone() checks inside ttyoutq_read_uio().
When my earlier MPSAFE TTY prototypes still implemented line
disciplines, we needed a mechanism to abort read()'s on PTY master
devices when inside the line discipline. Because this is no longer the
case, these checks have become unneeded.
2008-08-23 13:32:21 +00:00
Craig Rodrigues
d5bdb2f68d In nmount(), when we see the "force" option,
set the MNT_FORCE flag, but do not persist "force"
in the options list, since it is a command, not a persistent property
of a mount.

Similarly, when we see "reload", set MNT_RELOAD,
but delete "reload" from the options list.

MFC after:	1 week
2008-08-23 01:16:09 +00:00
Kip Macy
6205924afd Submit a band-aid for interrupt set up race.
MFC after:	1 month
2008-08-22 23:24:53 +00:00
Ed Schouten
0f0a7c27c5 Fix two small bugs in tcsetattr().
- According to POSIX, tcsetattr() must not fail when any of the bits in
  the structure are unsupported, but it must leave the unsupported flags
  alone.

- The CIGNORE flag (set by TCSASOFT, extension) was not cleared from
  c_cflag, which means using it would cause it to be applied during its
  entire lifespan. Eventually make sure we clear the flag.

I don't really like CIGNORE, but I think we must keep it alive right
now. With our new TTY layer, we don't actually need this mechanism,
because if you leave c_cflag, c_ispeed and c_ospeed alone, we won't make
a call into the device driver anyway.

Reported by:	naddy
Tested by:	naddy
2008-08-22 21:27:37 +00:00
John Baldwin
7847a9daec A suspended thread can, in fact, be swapped out. Thus,
thread_unsuspend_one() needs to optionally wakeup the swapper.  Since we
hold the thread lock for that entire function, however, we have to push
that requirement up into the caller.

Found by:	rwatson
2008-08-22 16:15:58 +00:00
John Baldwin
814f26da8a Use |= rather than += when aggregrating requests to wakeup the swapper.
What we really want is an inclusive or of all the requests, and += can
in theory roll over to 0.
2008-08-22 16:14:23 +00:00
Ed Schouten
6137be4386 Fix pts(4) error codes when slave device is closed.
Unlike pre-MPSAFE TTY, the pts(4) driver always returned ENXIO when a
read() or write() was performed on a pseudo-terminal master device when
the slave device was not opened. The old implementation had different
semantics:

- When the slave device had not been opened yet, read() and write() just
  blocked.
- When the slave device had been closed, a read() call would return 0
  bytes length.
- When the slave device had been closed, a write() call would return
  EIO.

Change the new implementation to return 0 and EIO as well. We don't
implement the first rule, but I suspect this is not needed, because
routines like openpty() also open the slave device node. posix_openpt()
users also do similar things.

Reported by:	rink
Tested by:	rink
2008-08-22 10:40:21 +00:00
Ed Schouten
7dc843ca92 Prevent VSTART flooding when turning on software flow control.
It turned out we transmitted VSTART after each successful read on a TTY
when software flow control was turned on. This was because of a very
evil bug where we tested the TF_HIWAT_IN flag the other way around.

Reported by:	Christian Weisgerber <naddy mips inka de>
2008-08-22 05:15:52 +00:00
David E. O'Brien
35c316caaf Add comments on NOARGS, NODEF, and NOPROTO. 2008-08-21 22:57:31 +00:00
Ed Schouten
40572ab385 Properly lock proctree_lock before locking the process while accounting.
During the import of the MPSAFE TTY layer (r181905), I changed
acct_process() to lock proctree_lock instead of SESS_LOCK, because
s_ttyp is now locked using proctree_lock. One of the things I forgot,
was to lock it before we PROC_LOCK.

Commit this patch, written by kib@. To ensure we hold proctree_lock as
short as possible, obtaining `ac_tty' has now been made the first step
of filling `acct'.

Reported by:	Kevin <kevinxlinuz 163 com>
Solved by:	kib
2008-08-21 15:02:17 +00:00
Ed Schouten
040b1db930 Remove the now unused `lbolt' variable from the kernel.
We used to have a single wait channel inside the kernel which could be
used by threads that just wanted to sleep for some time (the next
second). The old TTY layer was the only piece of code that still used
lbolt, because I already removed the use of lbolt from the NFS clients
and the VFS syncer.

Approved by:	philip
2008-08-20 12:20:22 +00:00
Kip Macy
fc3a86f6e9 remove scheduler_running as xenbus no longer needs it
MFC after:	1 month
2008-08-20 09:21:24 +00:00
Ed Schouten
18cf135421 Update system call tables.
The previous commit also included changes to all the system call lists,
but it is a tradition to update these lists in a second commit, so rerun
make sysent to update the $FreeBSD$ tags inside these files to refer to
the latest version of syscalls.master.

Requested by:	rwatson
2008-08-20 08:39:10 +00:00
Ed Schouten
bc093719ca Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

  The old TTY layer has a driver model that is not abstract enough to
  make it friendly to use. A good example is the output path, where the
  device drivers directly access the output buffers. This means that an
  in-kernel PPP implementation must always convert network buffers into
  TTY buffers.

  If a PPP implementation would be built on top of the new TTY layer
  (still needs a hooks layer, though), it would allow the PPP
  implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

  With the old TTY layer, it isn't entirely safe to destroy TTY's from
  the system. This implementation has a two-step destructing design,
  where the driver first abandons the TTY. After all threads have left
  the TTY, the TTY layer calls a routine in the driver, which can be
  used to free resources (unit numbers, etc).

  The pts(4) driver also implements this feature, which means
  posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

  One of the major improvements is the per-TTY mutex, which is expected
  to improve scalability when compared to the old Giant locking.
  Another change is the unbuffered copying to userspace, which is both
  used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from:		//depot/projects/mpsafetty/...
Approved by:		philip (ex-mentor)
Discussed:		on the lists, at BSDCan, at the DevSummit
Sponsored by:		Snow B.V., the Netherlands
dcons(4) fixed by:	kan
2008-08-20 08:31:58 +00:00
Konstantin Belousov
2bb4c6f922 In brelse, put the B_NEEDSGIANT buffer on the QUEUE_DIRTY_GIANT queue,
instead of QUEUE_DIRTY.

Tested by:	pho
Reviewed by:	attilio
MFC after:	3 days
2008-08-19 11:31:49 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Alfred Perlstein
cbd3ba3edf Prevent crashes due to unlocked access to hash buckets in two sysctls.
Use CACHE_LOCK to prevent crashes.

Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash.

Obtained from: Juniper Networks
MFC After: 1 week
2008-08-16 21:48:10 +00:00
Kip Macy
e77d7f7143 Add flag to indicate to xen support code that threads are running (and thus we can block).
MFC after:	1 month
2008-08-15 21:03:13 +00:00
Attilio Rao
3d06b4b330 Introduce some WITNESS improvements:
- Speedup the lock orderings lookup modifying the witness graph from a
  linked tree to a matrix. A table lookup caches the lock orderings in
  order to make a O(1) access for them. Any witness object has an unique
  index withing this lookup cache table.
- Reduce the lock contention on w_mtx acquiring it only when the LOR
  actually happens and not in a sane case. In order to do this don't totally
  flush lock lists (per-CPU spinlocks list and per-thread sleeplocks list)
  but check for ll_count anytime we need to have to verify allocations sanity.
- Introduce the function witness_thread_exit() in the witness namespace which
  should verify a thread doesn't hold any witness occurrence why exiting.
- Rename the sysctl debug.witness.graphs into debug.witness.fullgraph and
  add debug.witness.badstacks which prints out stacks for LOR revealed.
  This is implemented using the stack(9) support, which makes WITNESS to be
  dependent by the STACK option or by the DDB (including STACK) option.
- Fix style(9) for src/sys/kern/subr_witness.c

The hash table approach has been developed by Ilya Maykov on the behalf of
Isilon Systems which kindly released the patch.
Jeff Roberson, ported the patch to -CURRENT and fixed w_mtx contention, on the
behalf of Nokia.

Submitted by:	Ilya Maykov <ivmaykov at gmail dot com> (Isilon Systems), jeff
Sponsored by:	Nokia
2008-08-13 18:24:22 +00:00
Christian S.J. Peron
ded7d39cb9 Reduce the scope of the vnode lock such that it does not cover
the various copyouts associated with initializing the process's
argv/env data in userspace.  It is possible that these copyout
operations can fault under memory pressure, possibly resulting
in dead locks.  This is believed to be safe since none of the
copyout_strings() operations need to interact with the vnode here.

Submitted by:	Zhouyi Zhou
PR:		kern/111260
Discussed with:	kib
MFC after:	3 weeks
2008-08-12 21:27:48 +00:00
Konstantin Belousov
e792b09be2 Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with:	rodrigc
MFC after:	3 days
2008-08-10 12:15:36 +00:00
Ed Schouten
79da190c16 Remove unneeded D_NEEDGIANT from /dev/fd/{0,1,2}.
There is no reason the fdopen() routine needs Giant. It only sets
curthread->td_dupfd, based on the device unit number of the cdev.

I guess we won't get massive performance improvements here, but still, I
assume we eventually want to get rid of Giant.
2008-08-09 12:42:12 +00:00
Dag-Erling Smørgrav
2616144e43 Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
Dag-Erling Smørgrav
546d78908b Switch to simplified BSD license (with phk's approval), plus whitespace
and style(9) cleanup.
2008-08-09 10:26:21 +00:00
John Baldwin
414e7679cb Permit Giant to be passed as the explicit interlock either to
msleep/mtx_sleep or the various cv_*wait*() routines.  Currently, the
"unlock" behavior of PDROP and cv_wait_unlock() with Giant is not
permitted as it is will be confusing since Giant is fully unrecursed and
unlocked during a thread sleep.

This is handy for subsystems which wish to allow unlocked drivers to
continue to use Giant such as CAM, the new TTY layer, and the new USB
stack.  CAM currently uses a hack that I told Scott to use because I
really didn't want to permit this behavior, and the TTY and USB patches
both have various patches to permit this.

MFC after:	2 weeks
2008-08-07 21:00:13 +00:00
John Baldwin
da7bbd2c08 If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup().  When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup.  However, this required
grabbing proc0's thread lock to perform the wakeup.  If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked.  The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up.  The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with:	jeff
Glanced at by:	sam
Tested by:	Jurgen Weber  jurgen - ish com au
MFC after:	2 weeks
2008-08-05 20:02:31 +00:00
John Baldwin
0f3dd6ff0d Close two different races with concurrent opens of pty master devices
that could result in leaked ttys or a leaked pty + tty pair.

MFC after:	1 week
2008-08-04 19:51:23 +00:00
John Baldwin
0bc7bc0ec8 - Close a race with concurrent open's of a pts master device which could
result in leaked tty structures.
- When constructing a new pty, allocate it's tty structure before adding
  it to the list.

MFC after:	1 week
2008-08-04 19:49:05 +00:00
Antoine Brodin
f8062a0b0f Kill a dead variable
PR:		126223
Submitted by:	Mateusz Guzik
2008-08-03 21:07:19 +00:00
Robert Watson
1d986c5ff1 Remove broken code to replace st_mode value with ACCESSPERMS when
lstat(2) is called on symlinks -- this code appears never to have
worked.  The PR this addresses suggests that the intended
original behavior is the right one, but as bde points out in the
PR comments, we do actually support storing a mode on symlinks,
so returning it seems reasonable.

This is consistent with Mac OS X, which despite documentation to
the contrary does return the mode set on a symlink, but not some
other platforms.  The Single Unix Spec requires only that the
returned bits be "meaningful", which seems at best unhelpful as
advice goes.

PR:		25018
MFC after:	3 days
2008-08-03 15:44:56 +00:00
Konstantin Belousov
4f7afc20e0 Calling linker_load_dependencies() while holding the module'
vnode lock may cause a LOR between kld_sx lock and vnode lock.
linker_load_dependencies() drops kld_sx, and another thread may attempt
to load the same kld.

Reported and tested by:	pjd
MFC after:	1 week
2008-08-03 13:33:45 +00:00
Sam Leffler
6e0186d5ee add callout_schedule; besides being useful it also improves
compatibility with other systems

Reviewed by:	ed, battlez
2008-08-02 17:42:38 +00:00
Christian S.J. Peron
dfc714fba1 Currently, BSM audit pathname token generation for chrooted or jailed
processes are not producing absolute pathname tokens.  It is required
that audited pathnames are generated relative to the global root mount
point.  This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache.  Much like vn_fullpath,
vn_fullpath_global is a wrapper function which called vn_fullpath1.

Further, the string parsing routines have been converted to use the
sbuf(9) framework.  This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.

The vnode locking was modified to use vhold()/vdrop() instead the vref()
and vrele().  This will modify the hold count instead of modifying the
user count.  This makes more sense since it's the kernel that requires
the reference to the vnode.  This also makes sure that the vnode does not
get recycled we hold the reference to it. [1]

Discussed with:	rwatson
Reviewed by:	kib [1]
MFC after:	2 weeks
2008-07-31 16:57:41 +00:00
Ed Schouten
e7ea30e404 Remove the use of lbolt from the VFS syncer.
It seems we only use `lbolt' inside the VFS syncer and the TTY layer
now.  Because I'm planning to replace the TTY layer next month, there's
no reason to keep `lbolt' if it's only used in a single thread inside
the kernel.

Because the syncer code wanted to wake up the syncer thread before the
timeout, it called sleepq_remove(). Because we now just use a condvar(9)
with a timeout value of `hz', we can wake it up using cv_broadcast()
without waking up any unrelated threads.

Reviewed by:	phk
2008-07-30 12:39:18 +00:00
Ed Schouten
911d490140 Don't make subr_clist.c depend on the TTY layer.
After the import of the new TTY layer, the TTY_QUOTE definition will not
be present anymore. To make sure clists will still work as expected,
introduce an internal definition called QUOTEMASK.

Maybe we can decide to remove the quote bits entirely, but we still have
to look into this. There may be drivers that still use the quote bits.

Obtained from:	//depot/projects/mpsafetty
2008-07-30 12:32:42 +00:00
John Baldwin
c3ea337801 When choosing a CPU for a thread in a cpuset, prefer the last CPU that the
thread ran on if there are no other CPUs in the set with a shorter per-CPU
runqueue.
2008-07-28 20:39:21 +00:00
John Baldwin
f7f1cc1518 Really fix this. 2008-07-28 18:33:43 +00:00
Pawel Jakub Dawidek
7224dd4dad Properly check if td_name is empty and if it is, print process name,
instead of empty thread name.

Reviewed by:	jhb
2008-07-28 18:10:26 +00:00
John Baldwin
f200843b72 Implement support for cpusets in the 4BSD scheduler.
- When a cpuset is applied to a thread, walk the cpuset to see if it is a
  "full" cpuset (includes all available CPUs).  If not, set a new
  TDS_AFFINITY flag to indicate that this thread can't run on all CPUs.
  When inheriting a cpuset from another thread during thread creation, the
  new thread also inherits this flag.  It is in a new ts_flags field in
  td_sched rather than using one of the TDF_SCHEDx flags because fork()
  clears td_flags after invoking sched_fork().
- When placing a thread on a runqueue via sched_add(), if the thread is not
  pinned or bound but has the TDS_AFFINITY flag set, then invoke a new
  routine (sched_pickcpu()) to pick a CPU for the thread to run on next.
  sched_pickcpu() walks the cpuset and picks the CPU with the shortest
  per-CPU runqueue length.  Note that the reason for the TDS_AFFINITY flag
  is to avoid having to walk the cpuset and examine runq lengths in the
  common case.
- To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array
  of counters to hold the length of the per-CPU runqueues and update them
  when adding and removing threads to per-CPU runqueues.

MFC after:	2 weeks
2008-07-28 17:25:24 +00:00
John Baldwin
8aa3d7ffc0 Various and sundry style and whitespace fixes. 2008-07-28 15:52:02 +00:00
Kip Macy
947265b6bd - track maximum wait time
- resize columns based on actual observed numerical values

MFC after:	3 days
2008-07-27 21:45:20 +00:00
Pawel Jakub Dawidek
5573021d78 Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel()
functions.

Reviewed by:	kib
2008-07-27 11:48:15 +00:00
Pawel Jakub Dawidek
610507ae00 - Move vp test for beeing NULL under IGNORE_LOCK().
- Check if panicstr isn't set, if it is ignore the lock. This helps to avoid
  confusion, because lockmgr is a no-op when panicstr isn't NULL, so
  asserting anything at this point doesn't make sense and can just race with
  other panic.

Discussed with:	kib
2008-07-27 11:46:42 +00:00
Tom Rhodes
be6b130476 Fill in a few sysctl descriptions.
Approved by:	rwatson
2008-07-26 00:55:35 +00:00
Ed Schouten
bea45cdda3 Move ttyinfo() into its own C file.
The ttyinfo() routine generates the fancy output when pressing ^T. Right
now it is stored in tty.c. In the MPSAFE TTY code it is already stored
in tty_info.c. To make integration of the MPSAFE TTY code a little
easier, take the same approach.

This makes the TTY code a little bit more readable, because having the
proc_*/thread_* routines in tty.c is very distractful.

Approved by:	philip (mentor)
2008-07-25 14:31:00 +00:00
Konstantin Belousov
58e8af1bf5 Call pargs_drop() unconditionally in do_execve(), the function correctly
handles the NULL argument.
Make pargs_free() static.

MFC after:	1 week
2008-07-25 11:55:32 +00:00
Konstantin Belousov
96f1567fa7 s/alredy/already/ in the comments and the log message. 2008-07-25 11:22:25 +00:00
Konstantin Belousov
8b4a2800de Do the pargs_hold() on the copy of the pointer to the p_args of the
child process immediately after bulk bcopy() without dropping the
process lock.

Since process is not single-threaded when forking, dropping and
reacquiring the lock allows an other thread to change the process title
of the parent in between, and results in hold being done on the invalid
pointer. The problem manifested itself as the double free of the old
p_args.

Reported by:	kris
Reviewed by:	jhb
MFC after:	1 week
2008-07-23 08:45:25 +00:00
Attilio Rao
09400d5abe - Disallow XFS mounting in write mode. The write support never worked really
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
  request.  vget(9) returns the vnode locked in order to prevent recycling,
  but in this case internal XFS locks alredy prevent it from happening, so
  it is safe to drop the vnode lock before to return by vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.

Discussed with:	kan, kib
Tested by:	Lothar Braun <lothar at lobraun dot de>
2008-07-21 23:01:09 +00:00
Robert Watson
828e07694c If run_interrupt_driven_config_hooks() waits 360 seconds and INVARIANTS
is compiled into the kernel, then panic.

MFC after:	3 days
Discussed with:	scottl
2008-07-21 20:50:49 +00:00
Pawel Jakub Dawidek
7f41115ef6 Implement the following macros for completeness:
SYSCTL_QUAD()
	SYSCTL_ADD_QUAD()
	TUNABLE_QUAD()
	TUNABLE_QUAD_FETCH()

Now we can use 64bit tunables on 32bit systems.
2008-07-21 15:05:25 +00:00
Kip Macy
dd0e6c383a Add accessor functions for socket fields.
MFC after:	1 week
2008-07-21 00:49:34 +00:00