Commit Graph

8710 Commits

Author SHA1 Message Date
David Xu
a861574011 Temporarily disable nice threshold detection code, as it can starve
a thread holding critical resource, e.g mutex or other implicit
synchronous flags. Give thread which exceeds nice threshold a minimum
time slice.

PR: kern/86087
2005-09-22 01:19:37 +00:00
John Baldwin
e12560dd4b Use correct VFS locking rather than unconditionally grabbing Giant around
namei() calls in kern_alternate_path().

Reviewed by:	csjp
MFC after:	1 week
2005-09-21 19:49:42 +00:00
Robert Watson
87328e07e0 Pass 'curthread' into VFS_STATFS() from acctwatch(), rather than passing
NULL.  The NFS client expects that a thread will always be present for a
VOP so that it can check for signal conditions, and will dereference a
NULL pointer if one isn't present.

MFC after:	3 days
2005-09-21 15:28:07 +00:00
Robert Watson
5580b0b157 Correct an incorrect comment from the dawn of time: neither tprintf()
nor uprintf() is believed to perform tsleep() or msleep() as written,
as ttycheckoutq() is called with '0' as its sleep argument.

Remove recently added WITNESS warnings for sleep as the comment was
incorrect.  This should silence a warning from the nfs_timer() code.

Discussed with:	bde
2005-09-20 09:55:36 +00:00
Andre Oppermann
e452573df7 Start time_uptime with 1 instead of 0.
Discussed with:		phk
2005-09-19 22:16:31 +00:00
Poul-Henning Kamp
e606a3c63e Rewamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv.  It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex.  Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables,  the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liason
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr.  Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them.  A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now know about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there.  We still spelunk into other mountpoints
and fondle their data without 100% good locking.  It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints.  It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by:		Kris
Much confusion caused by (races in):	md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.
2005-09-19 19:56:48 +00:00
Robert Watson
84d2b7df26 Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions.  In most cases, this means adding an
acquisition of Giant immediately around the function.  In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset.  This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after:	1 week
2005-09-19 16:51:43 +00:00
Robert Watson
223aaaecb0 Remove mac_create_root_mount() and mpo_create_root_mount(), which
provided access to the root file system before the start of the
init process.  This was used briefly by SEBSD before it knew about
preloading data in the loader, and using that method to gain
access to data earlier results in fewer inconsistencies in the
approach.  Policy modules still have access to the root file system
creation event through the mac_create_mount() entry point.

Removed now, and will be removed from RELENG_6, in order to gain
third party policy dependencies on the entry point for the lifetime
of the 6.x branch.

MFC after:	3 days
Submitted by:	Chris Vance <Christopher dot Vance at SPARTA dot com>
Sponsored by:	SPARTA
2005-09-19 13:59:57 +00:00
Marcel Moolenaar
73130b2224 Move the UUID generator into its own function, called kern_uuidgen(),
so that UUIDs can be generated from within the kernel. The uuidgen(2)
syscall now allocates kernel memory, calls the generator, and does a
copyout() for the whole UUID store. This change is in support of GPT.
2005-09-18 21:40:15 +00:00
Robert Watson
8434c29b28 Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT    Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN      Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN   Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them
consistent.

Discussed with:	andre
MFC after:	1 week
2005-09-18 21:08:03 +00:00
Robert Watson
bc6b8b5d64 Fix spelling in a comment.
MFC after:	3 days
2005-09-18 10:46:34 +00:00
Robert Watson
7da7362b95 Re-comment sbcompress() to explain what it is it does; it took me
quite a bit of reading to figure it out, and I want to avoid figuring
it out again.

Convert an if (foo) else printf("this is almost a panic") into a
KASSERT.

MFC after:	3 days
2005-09-18 10:30:10 +00:00
Warner Losh
fe0519b171 MFp4: Expose device_probe_child() 2005-09-18 01:32:09 +00:00
Christian S.J. Peron
42e7197fba Implement new world order in VFS locking for ACLs. This will remove the
unconditional acquisition of Giant for ACL related operations. If the file
system is set as being MP safe and debug.mpsafevfs is 1, do not pickup
giant.

For any operations which require namei(9) lookups:

__acl_get_file
__acl_get_link
__acl_set_file
__acl_set_link
__acl_delete_file
__acl_delete_link
__acl_aclcheck_file
__acl_aclcheck_link

-Set the MPSAFE flag in NDINIT
-Initialize vfslocked variable using the NDHASGIANT macro

For functions which operate on fds, make sure the operations are locked:

__acl_get_fd
__acl_set_fd
__acl_delete_fd
__acl_aclcheck_fd

-Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode

Discussed with:	jeff
2005-09-17 22:01:14 +00:00
Tor Egge
61ac14dab6 Break out of loop if next buffer pointer has become invalid while flushing
current buffer.

Reviewed by:	kan
2005-09-16 18:28:12 +00:00
Stephan Uphoff
19b2dff7b0 Fix race condition that caused activation of an event to
be ignored immediately after it was deactivated.

Found by: 	Yahoo!
MFC after:	3 days
2005-09-15 21:10:12 +00:00
John Baldwin
21f9e816cd Oops, missed adding the required include.
Pointy hat to:	jhb
2005-09-15 20:20:36 +00:00
John Baldwin
53c0e1ff7d Replace the dont_sleep_in_callout mutex hack (similar to g_x{up,down})
with the disallow sleeping facility.
2005-09-15 20:09:08 +00:00
John Baldwin
10f508d9a3 Don't disallow sleeping for handlers on swi's since some swi handlers
(like CAM) do sleep in their handlers.

Requested by:	scottl
2005-09-15 20:08:21 +00:00
John Baldwin
b27dbfbf4a - Enforce an implicit lock order that Giant cannot be locked while holding
any other non-sleepable lock.  In plain English: Giant comes before all
  other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
  when a reversal is triggered by a hard-coded implicit rule.

Requested by:	truckman (2)
MFC after:	1 week
2005-09-15 19:07:14 +00:00
John Baldwin
51460da87f - Add a new simple facility for marking the current thread as being in a
state where sleeping on a sleep queue is not allowed.  The facility
  doesn't support recursion but uses a simple private per-thread flag
  (TDP_NOSLEEPING).  The sleepq_add() function will panic if the flag is
  set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
  (ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.

MFC after:	1 week
Reviewed by:	phk
2005-09-15 19:05:37 +00:00
Christian S.J. Peron
68ff2a4397 Improve the MP safeness associated with the creation of symbolic
links and the execution of ELF binaries. Two problems were found:

1) The link path wasn't tagged as being MP safe and thus was not properly
   protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
   insufficiently protected.

This commit makes the following changes:

-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
 will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
 picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
 flag is NOT set, that we have picked up giant. If not panic (if WITNESS
 compiled into the kernel). This should help us find conditions where vnode
 operations are in-sufficiently protected.

This is a RELENG_6 candidate.

Discussed with:	jeff
MFC after:	4 days
2005-09-15 15:03:48 +00:00
Maxim Konovalov
aada5cccd8 Backout rev. 1.246, it breaks code uses shutdown(2) on non-connected
sockets.

Pointed out by:	rwatson
2005-09-15 13:18:05 +00:00
Ralf S. Engelschall
724447ac41 Fix system shutdown timeout handling by again supporting longer running
shutdown procedures (which have a duration of more than 120 seconds).

We have two user-space affecting shutdown timeouts: a "soft" one in
/etc/rc.shutdown and a "hard" one in init(8). The first one can be
configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults
to 30 seconds. The second one was originally (in 1998) intended to be
configured via sysctl(8) variable "kern.shutdown_timeout" and defaults
to 120 seconds.

Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999
(as it obviously is actually not used within the kernel itself) and
hence was intentionally but misleadingly removed in revision 1.107 from
init_main.c. Kernel sysctl(8) variables are certainly a wrong way to
control user-space processes in general, but in this particular case the
sysctl(8) variable should have remained as it supports init(8), which
isn't passed command line flags (which in turn could have been set via
/etc/rc.conf), etc.

As there is already a similar "kern.init_path" sysctl(8) variable which
directly affects init(8), resurrect the init(8) shutdown timeout under
sysctl(8) variable "kern.init_shutdown_timeout". But this time document
it as being intentionally unused within the kernel and used by init(8).
Also document it in the manpages init(8) and rc.conf(5).

Reviewed by: phk
MFC after: 2 weeks
2005-09-15 13:16:07 +00:00
Maxim Konovalov
c5cff17017 o Return ENOTCONN when shutdown(2) on non-connected socket.
PR:		kern/84761
Submitted by:	James Juran
R-test:		tools/regression/sockets/shutdown
MFC after:	1 month
2005-09-15 11:45:36 +00:00
Poul-Henning Kamp
74f46f19aa Retire unused dev_named() function. 2005-09-15 08:01:57 +00:00
Robert Watson
fd1a469ba5 In vfs_kqfilter(), return EINVAL instead of 1 (EPERM) when an unsupported
kqueue filter type is requested on a vnode.

MFC after:	3 days
2005-09-12 19:22:37 +00:00
Jung-uk Kim
9ed448b20c use monotonic time_uptime' instead of time_second'
Approved by:	anholt (mentor)
Discussed on:	arch
2005-09-12 15:31:28 +00:00
Poul-Henning Kamp
2883ba6668 Introduce vfs_read_dirent() which can help VOP_READDIR() implementations
by handling all the cookie stuff.
2005-09-12 08:46:07 +00:00
Tor Egge
6ff5e2db45 Don't retry when vget() returns ENOENT in the nonblocking case due to the
vnode being doomed.  It causes a livelock.
2005-09-12 01:48:57 +00:00
Don Lewis
908b3deb2b Relocate witness_levelall(), witness_leveldescendents(), and
witness_displaydescendants() so that they are protected by
"#ifdef DDB/#endif" to unbreak kernels not using "option DDB".

MFC after:	3 weeks
2005-09-11 07:57:06 +00:00
Gleb Smirnoff
d04304d155 Make callout_reset() return a non-zero value if a pending callout
was rescheduled. If there was no pending callout, then return 0.

Reviewed by:	iedowse, cperciva
2005-09-08 14:20:39 +00:00
Don Lewis
d07f87a218 Add a new struct buf flag bit, B_PERSISTENT, and use it to tag
struct bufs that are persistently held by ext2fs.  Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.

This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.

Move the two separate copies of the "busy" buffer test in boot()
to a separate function.

Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.

Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.

PR:		kern/56675, kern/85163
Tested by:	"Matthias Andree" matthias.andree at gmx.de
Reviewed by:	bde
MFC after:	3 days
2005-09-08 06:30:05 +00:00
David E. O'Brien
5b1c0294e4 Forward declaring static variables as extern is invalid ISO-C. Now that
GCC can properly handle forward static declarations, do this properly.
2005-09-07 10:06:14 +00:00
Gleb Smirnoff
016e62123a In soreceive(), when a first mbuf is removed from socket buffer use
sockbuf_pushsync(). Previous manipulation could lead to an inconsistent
mbuf.

Reviewed by:	rwatson
2005-09-06 17:05:11 +00:00
Gleb Smirnoff
f46ab10c02 Document flags of a pollrec. 2005-09-06 11:09:18 +00:00
Christian S.J. Peron
d1dfd92177 Convert the primary ACL allocator from malloc(9) to using a UMA zone instead.
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.

MFC after:	1 month
Discussed with:	rwatson
2005-09-06 00:06:30 +00:00
Gleb Smirnoff
16901c0186 Remove Giant mutex from polling(4) and use a separate poll_mtx(4)
instead. Detailed changelist:

o Add flags field to struct pollrec, to indicate that
  are particular entry is being worked on.
o Define a macro PR_VALID() to check that a pollrec
  is valid and pollable.
o Mark ISRs as mpsafe.

o ether_poll()
  - Acquire poll_mtx while traversing pollrec array.
  - Skip pollrecs, that are being worked on.
  - Conditionally acquire Giant when entering handler.

o netisr_pollmore()
  - Conditionally assert Giant.
  - Acquire poll_mtx while working with statistics.

o netisr_poll()
  - Conditionally assert Giant.
  - Acquire poll_mtx while working with statistics
    and traversing pollrec array.

o ether_poll_register(), ether_poll_deregister()
  - Conditionally assert Giant.
  - Acquire poll_mtx while working with pollrec array.

o poll_idle()
  - Remove all strange manipulations with Giant.

In collaboration with:	ru, pjd
In collaboration with:	Oleg Bulyzhin <oleg rinet.ru>
In collaboration with:	dima <_pppp mail.ru>
2005-09-05 16:02:11 +00:00
Xin LI
5248ef8a3c When padding with zero, do pad after prefixes rather than padding
before prefixes.

Use cases:
	printf("%05d", -42);   -->   "00-42"   (should be "-0042")
	printf("%#05x", 12);   -->   "000xc"   (should be "0x00c")

Submitted by:	Oliver Fromme
PR:		kern/85520
MFC After:	1 week
2005-09-04 18:03:45 +00:00
Poul-Henning Kamp
1e7d2c4763 If we ignore an unknown % sequence, we must stop interpreting the
remaining % arguments because the varargs are now out of sync and
there is a risk that we might for instance dereference an integer
in a %s argument.

Sponsored by: Napatech.com
2005-09-03 10:28:08 +00:00
John Baldwin
acc0265cc2 - Add some comments to some of the static lock orders. Don't explicitly
link proctree and allproc to Giant since that order is already implicitly
  enforced.
- Use a goto to handle the case where we want to enforce a reversal before
  calling isitmydescendant() in witness_checkorder() so that the logic is
  easier to follow and so that it is easier to add more forced-reversal
  cases in the future.

MFC after:	 3 days
2005-09-02 20:23:49 +00:00
John Baldwin
83cece6fa1 - Add an assertion to panic if one tries to call mtx_trylock() on a spin
mutex.
- Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
  panicstr is set (meaning that we are already in a panic).  Just keep
  spinning forever instead.
2005-09-02 20:21:49 +00:00
John Baldwin
83de502d59 Add witness warnings to panic if a thread tries to exit while holding any
locks.

Requested by:	jeff
MFC after:	3 days
2005-09-02 20:20:01 +00:00
Nate Lawson
9000b91eb9 Break out the checks for duplicates and absolute settings being too high
instead of trying to do them all at once.  This should fix the level sorting
problems from the previous revision.

Testing help:	ume
2005-09-02 16:32:43 +00:00
Suleiman Souhlal
1f71de49e1 Print out a warning and a backtrace if we try to unlock a lockmgr that
we do not hold.

Glanced at by:	phk
MFC after:	3 days
2005-09-02 15:56:01 +00:00
Suleiman Souhlal
2611e5a6a9 Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed
and unbusied in devfs_fixup(), which assumes that the devfs mount is
still locked.

Granced at by:	phk
MFC after:	3 days
2005-09-02 13:37:54 +00:00
Pawel Jakub Dawidek
d8b464e51e In case of mac_check_vnode_rename_from() or vn_start_write() failure,
vn_finished_write() should not be called.

Reviewed by:	ssouhlal
MFC after:	3 days
2005-09-01 21:46:33 +00:00
Andre Oppermann
fdcc028d11 Changes and cleanups to m_sanity():
o for() instead of while() looping  over mbuf chain
o paren's around all flag checks
o more verbose function and purpose description
o some more style changes

Based on feedback from:	sam
2005-08-30 21:31:42 +00:00
Andre Oppermann
e0068c3a69 Unbreak m_demote() and put back the 'all' flag. Without it we cannot
correctly test for m_nextpkt in an mbuf chain.
2005-08-30 21:14:30 +00:00
Andre Oppermann
fbe816384a o Remove the 'all' flag from m_demote(). Users can simply call it with
m_demote(m->m_next) if they wish to start at the second mbuf in chain.
o Test m_type with == instead of &.
o Check m_nextpkt against NULL instead of implicit 0.

Based on feedback from:	sam
2005-08-30 20:07:49 +00:00