Commit Graph

11744 Commits

Author SHA1 Message Date
Zachary Loafman
7fd32ea923 Add VOP_ADVLOCKPURGE so that the file system is called when purging
locks (in the case where the VFS impl isn't using lf_*)

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, dfr
2010-05-12 21:24:46 +00:00
Pawel Jakub Dawidek
408a7c5093 When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.

No objections from:	kib
Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>
MFC after:		1 week
2010-05-12 16:42:28 +00:00
Pawel Jakub Dawidek
c60c36a745 I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
Attilio Rao
98332c8c71 Right now, WITNESS just blindly pipes all the output to the
(TOCONS | TOLOG) mask even when called from DDB points.
That breaks several output, where the most notable is textdump output.
Fix this by having configurable callbacks passed to witness_list_locks()
and witness_display_spinlock() for printing out datas.

Reported by:	several broken textdump outputs
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC after:	7 days
X-MFC:		r207922
2010-05-11 18:24:22 +00:00
Attilio Rao
3caaaae046 There is not a good reason to have a different prototype for db_printf()
when compared to printf().
Unify it by returning the number of characters displayed for db_printf()
as well.

MFC after:	7 days
2010-05-11 17:01:14 +00:00
Attilio Rao
de6648745c Fix a hang introduced in r206878 for kernel compiled with SMP support but
being not actual SMP and similar situations by always initializing the
smp ipi mutex.

Reported by:	marius
MFC after:	3 days
X-MFC:		r206878
2010-05-11 15:36:16 +00:00
Alan Cox
7d1d2ef60a Update a comment: It no longer makes sense to talk about the page queues
lock here.
2010-05-08 23:01:47 +00:00
Alan Cox
3c4a24406b Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Konstantin Belousov
d2ba618a63 Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node
in a no-sleep context. If resource allocation cannot be done without
sleep, make_dev_credf() fails and returns NULL.

Reviewed by:	jh
MFC after:	2 weeks
2010-05-06 19:22:50 +00:00
Alan Cox
eb00b276ab Eliminate page queues locking around most calls to vm_page_free(). 2010-05-06 18:58:32 +00:00
Edward Tomasz Napierala
77dda2b96f Avoid overflow.
Submitted by:	bde@
2010-05-06 18:52:41 +00:00
Edward Tomasz Napierala
307d88b787 Style fixes and removal of unneeded variable.
Submitted by:	bde@
2010-05-06 18:43:19 +00:00
Alan Cox
f0c0d3998d Remove page queues locking from all sf_buf_mext()-like functions. The page
lock now suffices.

Fix a couple nearby style violations.
2010-05-06 17:43:41 +00:00
Alan Cox
52683078a2 Eliminate a small bit of unneeded code from kern_sendfile(): While
kern_sendfile() is running, the file's vm object can't be destroyed
because kern_sendfile() increments the vm object's reference count.  (Once
kern_sendfile() decrements the reference count and returns, the vm object
can, however, be destroyed.  So, sf_buf_mext() must handle the case where
the vm object is destroyed.)

Reviewed by:	kib
2010-05-06 15:52:08 +00:00
Joel Dahl
8e0ad55abb Switch to our preferred 2-clause BSD license.
Approved by:	kmacy
2010-05-05 20:39:02 +00:00
Alan Cox
5ac59343be Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held.  (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements.  It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib
2010-05-05 18:16:06 +00:00
Edward Tomasz Napierala
b5f770bd86 Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by:	kib
2010-05-05 16:44:25 +00:00
Konstantin Belousov
213c077f62 Fix a mistake in r207603. td_rux.rux_runtime still needs conversion.
Reported and tested by:	nwhitehorn
Pointy hat to:	kib
MFC after:	6 days
2010-05-05 16:05:51 +00:00
Alan Cox
e3ef0d2fcf Push down the acquisition of the page queues lock into vm_page_unwire().
Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by:	kib
2010-05-05 03:45:46 +00:00
Alan Cox
a7283d3213 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
Konstantin Belousov
9182554ae9 Fix typo in comment.
MFC after:	3 days
2010-05-04 06:06:01 +00:00
Konstantin Belousov
603a4d7f41 Remove a comment that merely repeats code.
Submitted by:	bde
MFC after:	1 week
2010-05-04 06:04:33 +00:00
Konstantin Belousov
03d13670c2 Use td_rux.rux_runtime for ki_runtime instead of redoing calculation.
Submitted by:	bde
MFC after:	1 week
2010-05-04 06:00:39 +00:00
Konstantin Belousov
bed4c52416 Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks
information for thread to allow calcru1() (re)use.

Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1].
The ruxagg_locked() function no longer clears thread ticks nor
td_incruntime.

Requested by:	attilio [1]
Discussed with:	attilio, bde
Reviewed by:	bde
Based on submission by:	Alexander Krizhanovsky <ak natsys-lab com>
MFC after:	1 week
X-MFC-Note:	td_rux shall be moved to the end of struct thread
2010-05-04 05:55:37 +00:00
Alan Cox
c5a648516e Acquire the page lock around vm_page_unwire() and vm_page_wire().
Reviewed by:	kib
2010-05-03 16:41:11 +00:00
Alan Cox
913814935a This is the first step in transitioning responsibility for synchronizing
access to the page's wire_count from the page queues lock to the page lock.

Submitted by:	kmacy
2010-05-03 05:41:50 +00:00
Konstantin Belousov
a0b8e597e5 Lock the page around hold_count access.
Reviewed by:	alc
2010-05-02 19:25:22 +00:00
Alan Cox
139a0de7f1 Properly synchronize access to the page's hold_count in vfs_vmio_release().
Reviewed by:	kib
2010-05-02 19:10:27 +00:00
Alan Cox
b88b6c9d80 It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
to unconditionally set PG_REFERENCED on a page before sleeping.  In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable.  Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.
2010-05-02 17:33:46 +00:00
Marko Zec
a83baab6e4 Remove a redundant variable assignment.
Reviewed by:	bz, rwatson
MFC after:	3 days
2010-05-01 18:34:50 +00:00
Konstantin Belousov
3087dc40a9 Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility
function ruxagg_tlock().
Convert the definition of kern_getrusage() to ANSI C.

Submitted by:	Alexander Krizhanovsky <ak natsys-lab com>
MFC after:	1 week
2010-05-01 14:46:17 +00:00
Zachary Loafman
1dac222419 Handle taskqueue_drain(9) correctly on a threaded taskqueue:
taskqueue_drain(9) will not correctly detect whether a task is
currently running.  The check is against a field in the taskqueue
struct, but for a threaded queue with more than one thread, multiple
threads can simultaneously be running a task, thus stomping over the
tq_running field.

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        jhb
Approved by:        dfr (mentor)
2010-04-30 16:29:05 +00:00
Alfred Perlstein
b7402d8269 Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(),
use the heap instead.

Obtained from: Juniper Networks

Reviewed by:	jhb
2010-04-30 03:15:00 +00:00
Alfred Perlstein
fba6b1af2e Don't leak core_buf or gzfile if doing a compressed core file and we
hit an error condition.

Obtained from: Juniper Networks
2010-04-30 03:13:24 +00:00
Alfred Perlstein
feb112c552 Do not set IO_NODELOCKED while writing to vnodes as our consumers
do not lock the vnodes.

Obtained from: Juniper Networks

Reviewed by:	jhb
2010-04-30 03:10:53 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Konstantin Belousov
e68d26fd5d Remove caddr_t casts.
Requested by:	bde
MFC after:	10 days
2010-04-29 09:55:51 +00:00
Andriy Gapon
4f27c5edfe kern_ntptime: drop a comment that became stale after r207359
MFC after:	1 week
X-MFC after:	r207359
2010-04-29 09:18:36 +00:00
Andriy Gapon
5c7e270fcd periodically save system time to hardware time-of-day clock
This is done in kern_ntptime, perhaps not the best place.
This is done using resettodr().
Some features:
- make save period configurable via tunable and sysctl
- period of zero disables saving, setting a non-zero period re-enables
  it or reschedules it
- do saving only if system clock is ntp-synchronized
- save on shutdown

Discussed with:	des, Peter Jeremy <peterjeremy@acm.org>
X-Maybe:		save time near seconds boundary for better precision
MFC after:		2 weeks
2010-04-29 09:02:46 +00:00
Andriy Gapon
9a9ae42a43 kern_ntptime: abstract time error check into a function
... to avoid code duplication

MFC after:	1 week
2010-04-29 09:02:21 +00:00
Lawrence Stewart
7d11e744c1 - Rework the underlying ALQ storage to be a circular buffer, which amongst other
things allows variable length messages to be easily supported.

- Extend KPI with alq_writen() and alq_getn() to support variable length
  messages, which is enabled at ALQ creation time depending on the
  arguments passed to alq_open(). Also add variants of alq_open() and
  alq_post() that accept a flags argument. The KPI is still fully
  backwards compatible and shouldn't require any change in ALQ consumers
  unless they wish to utilise the new features.

- Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers
  to have more control over IO scheduling and resource acquisition
  respectively.

- Strengthen invariants checking.

- Document ALQ changes in ALQ(9) man page.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jeff, rpaulo, rwatson
MFC after:	1 month
2010-04-26 13:48:22 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Jeff Roberson
113db2dddb - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00
Bjoern A. Zeeb
2f826fdf53 Remove one zero from the double-0.
This code doesn't have a license to kill.

MFC after:	3 days
2010-04-23 14:32:58 +00:00
Konstantin Belousov
b4bf2ac1ac Fix typo.
Submitted by:	emaste
Pointy hat to:	kib (who needs much bigger wardrobe)
MFC after:	1 week
2010-04-21 20:04:42 +00:00
Konstantin Belousov
05e06d1157 Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by:	jhb
MFC after:	1 week
2010-04-21 19:32:00 +00:00
Warner Losh
e8b93e4257 Make sure that we free the passed in data message if we don't actually
insert it onto the queue.  Also, fix a mtx leak if someone turns off
devctl while we're processing a messages.

MFC after:	5 days
2010-04-20 20:39:42 +00:00
Attilio Rao
0a2d5fea59 Fix compilation in the !SMP case.
Keep the interrupts disabled in order to avoid preemption problems.

Reported by:	tinderbox, b.f. <bf1783 at googlemail dot com>
MFC:		2 weeks
X-MFC:		r206878
2010-04-20 12:22:06 +00:00
Konstantin Belousov
5673e3cb08 The cache_enter(9) function shall not be called for doomed dvp.
Assert this.

In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.

Verify that dvp is not reclaimed before calling cache_enter().

Reported and tested by:	pho
Reviewed by:	kan
MFC after:	2 weeks
2010-04-20 10:19:27 +00:00
Attilio Rao
95335fd844 getblk lockmgr is mostly used as a msleep() and may lead too easilly to
false positives.
Whitelist it.

Reported by:	Erik Cederstrand <erik at cederstrand dot dk>
2010-04-19 23:40:46 +00:00
Attilio Rao
248bb9379f Fix a deadlock in the shutdown code:
When performing a smp_rendezvous() or more likely, on amd64 and i386,
a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx
spinlock held, busy-waiting for other CPUs to acknowledge the operation.
As long as CPUs are suspended (via cpu_reset()) between the active mask
read and IPI sending there can be a deadlock where the caller will wait
forever for a dead CPU to acknowledge the operation.
Please note that on CPU0 that is going to be someway heavier because of
the spinlocks being disabled earlier than quitting the machine.

Fix this bug by calling cpu_reset() with the smp_ipi_mtx held.
Note that it is very likely that a saner offline/online CPUs mechanism
will help heavilly in fixing similar cases as it is likely more bugs
of this type may arise in the future.

Reported by:	rwatson
Discussed with:	jhb
Tested by:	rnoland, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC:		2 weeks

Special deciation to:	anyone who made possible to have 16-ways machines
			in Netperf
2010-04-19 23:27:54 +00:00
Konstantin Belousov
3e22320c43 Fix typo.
MFC after:	3 days
2010-04-15 17:17:02 +00:00
Julian Elischer
94a04272dd Change the semantics of the debug.ktr.alq_enable control so that when you
disable alq, it acts as if alq had not been enabled in the build.
in other words, the rest of ktr is still available for use.
If you really don't want that as well, set the mask to 0.

MFC after:3 weeks
2010-04-14 21:42:29 +00:00
Konstantin Belousov
d5ff273563 Handle a case in kern_openat() when vn_open() change file type from
DTYPE_VNODE.

Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode,
since we allow for fcntl(2) to process with advisory locks for
DTYPE_VNODE only. Another reason is that all fo_close() routines need to
check and release locks otherwise.

For O_TRUNC, call fo_truncate() instead of truncating the vnode.

Discussed with:	rwatson
MFC after:	2 week
2010-04-13 08:52:20 +00:00
Konstantin Belousov
5478ba7358 Remove XXX comment. Add another comment, describing why f_vnode assignment
is useful.

MFC after:	3 days
2010-04-13 08:45:55 +00:00
Alan Cox
ac45ee97c9 Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place.  Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures.  (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by:	bde (an earlier version)
Reviewed by:	kib
2010-04-11 16:26:07 +00:00
Attilio Rao
36e51f655d - Introduce a blessed list for sxlocks that prevents the deadlkres to
panic on those ones. [0]
- Fix ticks counter wrap-up

Sponsored by:		Sandvine Incorporated
[0] Reported by:	jilles
[0] Tested by:		jilles
MFC:			1 week
2010-04-11 16:06:09 +00:00
Konstantin Belousov
6d51747362 Do not leak master pty or ptmx vnode.
Report and test case by:	Petr Salinger <Petr.Salinger seznam cz>
Reviewed by:	ed
MFC after:	1 week
2010-04-08 08:58:18 +00:00
Konstantin Belousov
3f1c4c4f31 When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
Jaakko Heinonen
0e9bd4171f Add missing MNT_NFS4ACLS. 2010-04-04 14:48:43 +00:00
Alan Cox
92351f162e Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by:	kib
2010-04-03 19:07:05 +00:00
Pawel Jakub Dawidek
b9d8d69108 Fix some whitespace nits. 2010-04-03 11:19:20 +00:00
Pawel Jakub Dawidek
000026c809 Add missing mnt_kern_flag flags in 'show mount' output. 2010-04-03 11:15:55 +00:00
Andriy Gapon
364b8a7b33 vn_stat: take into account va_blocksize when setting st_blksize
As currently st_blksize is always PAGE_SIZE, it is playing safe to not
use any smaller value.  For some cases this might not be optimal, but
at least nothing should get broken.

Generally I don't expect this commit to change much for the following
reasons (in case of VREG, VDIR):
- application I/O and physical I/O are sufficiently decoupled by
  filesystem code, buffer cache code, cluster and read-ahead logic
- not all applications use st_blksize as a hint, some use f_iosize, some
  use fixed block sizes

I expect writes to the middle of files on ZFS to benefit the most from
this change.

Silence from:	fs@
MFC after:	2 weeks
2010-04-03 08:39:00 +00:00
Andriy Gapon
1b4bc5f851 bo_bsize: revert r205860 and take an alternative approch in getblk
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assigment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk.  This should coinside with
vp being a devvp.

Reported by:	Mykola Dzham <i@levsha.me>
Tested by:	Mykola Dzham <i@levsha.me>
Pointyhat to:	avg
MFC after:	2 weeks
X-ToDo:		convert bread(devvp) in all fs to use bo_bsize-d blocks
2010-04-02 15:12:31 +00:00
Konstantin Belousov
0bc7bd67ca Supply default implementation of VOP_RENAME() that does neccessary
unlocks and unreferences for argument vnodes, as expected by
kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and
reference leaks when rename is attempted on fs that does not
implement rename.

PR:	kern/107439
Based on submission by:	Mikolaj Golub <to.my.trociny gmail com>
Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:43 +00:00
Konstantin Belousov
ea01588095 Add function vop_rename_fail(9) that performs needed cleanup for locks
and references of the VOP_RENAME(9) arguments. Use vop_rename_fail()
in deadfs_rename().

Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:01 +00:00
Lawrence Stewart
97c11ef282 The ALQ should not be considered drained until it has been made inactive.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:27:10 +00:00
Lawrence Stewart
9ffad7a94d According to SLEEP(9), msleep() is deprecated in favour of mtx_sleep().
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:23:36 +00:00
Lawrence Stewart
c0ea37a885 - Factor code to destroy an ALQ out of alq_close() into a private alq_destroy().
- Use the new alq_destroy() to properly handle a failure case in alq_open().

Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:16:00 +00:00
Lawrence Stewart
d28f42f957 Add support for ALQ(9) to be compiled and loaded as a kernel module.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-03-31 03:58:57 +00:00
John Baldwin
7e3d78ae6d Defer freeing a kevent list until after dropping kqueue locks.
LOR:		185
Submitted by:	Matthew Fleming @ Isilon
MFC after:	1 week
2010-03-30 18:31:55 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Jaakko Heinonen
246b651054 Support only LOOKUP operation for "/" in relookup() because lookup()
can't succeed for CREATE, DELETE and RENAME.

Discussed with:	bde
2010-03-26 11:33:12 +00:00
Nathan Whitehorn
a0ea661f5e Add the ELF relocation base to struct image_params. This will be
required to correctly relocate the executable entry point's function
descriptor on powerpc64.
2010-03-25 14:31:26 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Nathan Whitehorn
920acedb80 Change the way text_addr and data_addr are computed to use the
executable status of segments instead of detecting the main text segment
by which segment contains the program entry point. This affects
obreak() and is required for correct operation of that function
on 64-bit PowerPC systems. The previous behavior was apparently
required only for the Alpha, which is no longer supported.

Reviewed by:	jhb
Tested on:	amd64, sparc64, powerpc
2010-03-25 14:21:22 +00:00
Bjoern A. Zeeb
2430ab4629 Print the pointer to the lock with the panic message. The previous
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
	show lock <ptr>
form ddb to learn more about the lock.

MFC after:	3 days
2010-03-24 19:21:26 +00:00
Nathan Whitehorn
f4e26adefc The nargvstr and nenvstr properties of arginfo are ints, not longs,
so should be copied to userspace with suword32() instead of suword().
This alleviates problems on 64-bit big-endian architectures, and is a
no-op on all 32-bit architectures.

Tested on:	amd64, sparc64, powerpc64
2010-03-24 03:13:24 +00:00
Ed Schouten
0fef797f4a Actually make O_DIRECTORY work.
According to POSIX open() must return ENOTDIR when the path name does
not refer to a path name. Change vn_open() to respect this flag. This
also simplifies the Linuxolator a bit.
2010-03-21 20:43:23 +00:00
Bjoern A. Zeeb
42eedeac00 Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by:	ISPsystem
Discussed with:	jhb
Reviewed by:	zec (earlier version), jhb
MFC after:	1 month
2010-03-19 19:51:03 +00:00
Konstantin Belousov
723d37c0ac Convert aio syscall registration to SYSCALL_INIT_HELPER.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:11:34 +00:00
Konstantin Belousov
afde2b6593 Implement compat32 shims for mqueuefs.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:10:24 +00:00
Konstantin Belousov
0e5d5bc279 Implement compat32 shims for ksem syscalls.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:08:43 +00:00
Konstantin Belousov
75d633cbf6 Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:04:42 +00:00
Konstantin Belousov
4cfc39cfa6 Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to
sysv_ipc.c.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:01:51 +00:00
Konstantin Belousov
0687ba3e90 Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and
neccessary support functions to allow registering dynamically loaded
syscalls from the MOD_LOAD handlers. Helpers handle registration
failures semi-automatically.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 10:56:30 +00:00
Konstantin Belousov
5322f02ec0 Properly handle compat32 calls to sctp generic sendmsd/recvmsg functions that
take iov.

Reviewed by:	tuexen
MFC after:	2 weeks
2010-03-19 10:46:54 +00:00
Konstantin Belousov
fd9d1e7627 Remove dead statement.
Reviewed by:	tuexen
MFC after:	2 weeks
2010-03-19 10:44:02 +00:00
Konstantin Belousov
0a977ede48 Fix two style issues.
MFC after:	2 weeks
2010-03-19 10:41:32 +00:00
John Baldwin
1b25979b06 Style fixes.
Submitted by:	bde
2010-03-11 15:13:55 +00:00
Nathan Whitehorn
841c0c7ec7 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
John Baldwin
9e0cda0391 Fix a comment nit.
Submitted by:	Alexander Best
2010-03-11 13:16:06 +00:00
John Baldwin
dff8e0b7cf Add descriptions for debug.ktr sysctl nodes. 2010-03-10 21:35:42 +00:00
Warner Losh
91ee765659 Bump up the firmware_table from 30 to 50. bwn needs more than 30, it
seems.
2010-03-07 22:37:35 +00:00
Alfred Perlstein
8b325009a3 put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent
undefined symbols on kernels without this option.

Reported by: Alexander Best
2010-03-04 21:53:45 +00:00
Randall Stewart
bec67fd3bb sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.

MFC after:	3 weeks
2010-03-03 21:46:51 +00:00
John Baldwin
ab99e589dd Allow lseek(SEEK_END) to work on disk devices by using the DIOCGMEDIASIZE
to determine the media size.

Submitted by:	nox
MFC after:	1 week
2010-03-03 16:18:04 +00:00
Ivan Voras
07ce8d9b43 Document the VM detection type and sysctl a bit better. 2010-03-02 23:57:42 +00:00
Alfred Perlstein
e722820434 Merge projects/enhanced_coredumps (r204346) into HEAD:
Enhanced process coredump routines.

  This brings in the following features:
  1) Limit number of cores per process via the %I coredump formatter.
  Example:
    if corefilename is set to %N.%I.core AND num_cores = 3, then
    if a process "rpd" cores, then the corefile will be named
    "rpd.0.core", however if it cores again, then the kernel will
    generate "rpd.1.core" until we hit the limit of "num_cores".

    this is useful to get several corefiles, but also prevent filling
    the machine with corefiles.

  2) Encode machine hostname in core dump name via %H.

  3) Compress coredumps, useful for embedded platforms with limited space.
    A sysctl kern.compress_user_cores is made available if turned on.

    To enable compressed coredumps, the following config options need to be set:
    options COMPRESS_USER_CORES
    device zlib   # brings in the zlib requirements.
    device gzio   # brings in the kernel vnode gzip output module.

  4) Eventhandlers are fired to indicate coredumps in progress.

  5) The imgact sv_coredump routine has grown a flag to pass in more
  state, currently this is used only for passing a flag down to compress
  the coredump or not.

  Note that the gzio facility can be used for generic output of gzip'd
  streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan
2010-03-02 06:58:58 +00:00
Bruno Ducrot
5f73a7eb08 Deliver siginfo when signal is generated by thr_kill(2) (SI_USER with properly
filled si_uid and si_pid).

Reported by:	Joel Bertrand <joel.bertrand systella fr>
PR:		141956
Reviewed by:	kib
MFC after:	2 weeks
2010-03-01 14:27:16 +00:00
Robert Watson
6c77113c9f Remove stale comment about socket buffer accounting from access(2) code.
It is the case, however, that the uidinfo of the temporary credential
set up for access(2) is not properly updated when its effective uid is
changed.

MFC after:	3 days
2010-02-27 19:57:40 +00:00
Alan Cox
0b993ee5fd When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs.  Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by:	jhb, kib
MFC after:	3 days
2010-02-27 18:00:57 +00:00
Konstantin Belousov
00305546f3 For kinfo_proc in kp->ki_siglist, return the set of the signals pending
in the process queue when gathering information for the process, and set
of signals pending for the thread, when gathering information for the
thread. Previously, the sysctl returned a union of the process and some
arbitrary thread pending set for the process, and union of the process
and the thread pending set for the thread.

MFC after:	1 week
2010-02-27 15:32:49 +00:00
Konstantin Belousov
cf14146715 Fix several style issues.
Define make_dev_credv() as static to match declaration.

MFC after:	3 days
2010-02-27 15:26:36 +00:00
Jilles Tjoelker
bbb17d169c Include terminated threads in ps's process cpu time field.
MFC after:	2 weeks
2010-02-27 12:15:59 +00:00
Brooks Davis
2e5514defa Don't inforce an upper bound on kern.ngroups. The INT_MAX-1 limit was
too high due to several overflows.  The actual limit is somewhere in the
neighborhood of INT_MAX/4 on 64-bit machines, but most systems could not
support such a limit due to a lack of memory and the cost of duplicate
credentials.

Reported by:	bde
2010-02-24 15:52:18 +00:00
Ed Schouten
28993443b4 Decompose the most lousy named file in sys/kern; kern_subr.c.
Although this file has historically been used as a dumping ground for
random functions, nowadays it only contains functions related to copying
bits {from,to} userspace and hash table utility functions.

Behold, subr_uio.c and subr_hash.c.
2010-02-21 19:53:33 +00:00
Bjoern A. Zeeb
0a68a45914 Set curvnet earlier so that it also covers calls to sodisconnect(), which
before were possibly panicing the system in ULP code in the VIMAGE case.

Submitted by:	Igor (igor ispsystem.com)
MFC after:	5 days
2010-02-20 22:29:28 +00:00
Attilio Rao
fb7c88e695 Use the cached value within comparison.
Submitted by:	jhb
2010-02-19 15:10:05 +00:00
Attilio Rao
3aadd65dbb Fix the grammar.
Submitted by:	Brandon Gooch <bgooch at se dot edu>
2010-02-19 15:03:55 +00:00
Attilio Rao
dd55eec791 Fix a race in regard of p_numthreads.
Submitted by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
2010-02-19 14:59:41 +00:00
Pawel Jakub Dawidek
d779d44313 - Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be
locked.
- Remove code duplication.
2010-02-18 22:22:45 +00:00
Pawel Jakub Dawidek
e15d7a0b4c Use vput() instead of VOP_UNLOCK()+vrele(). The comment here is out-dated,
we no longer pass thread pointer to VOP_UNLOCK().
2010-02-18 22:14:44 +00:00
Pawel Jakub Dawidek
0454fe84e4 Use NULL instead of 0 when setting up pointer. 2010-02-18 22:12:40 +00:00
Neel Natu
6f3c632700 Kernel module support for mips.
Reviewed by: gonzo

Tested by: Alexandr Rybalko (ray@dlink.ua)
2010-02-18 05:49:52 +00:00
Konstantin Belousov
d07dc8e3eb Do not leak process lock when current thread is not allowed to see target.
Bumped into by:	ed
MFC after:	3 days
2010-02-14 13:59:01 +00:00
Marcel Moolenaar
af002ab819 Initialize pve_fsid and pve_fileid to VNOVAL. 2010-02-11 21:10:56 +00:00
Marcel Moolenaar
162110273c o Add support for COMPAT_IA32.
o  Incorporate review comments:
   -  Properly reference and lock the map
   -  Take into account that the VM map can change inbetween requests
   -  Add the fileid and fsid attributes

Credits: kib@
Reviewed by: kib@
2010-02-11 18:00:53 +00:00
David Xu
8251549f27 In function umtxq_insert_queue, use parameter q (shared/exclusive queue)
instead of hard coded constant. This does not affect RELENG_8 and previous,
because the code only exists in the HEAD.
2010-02-10 05:47:34 +00:00
Marcel Moolenaar
8a25c0c741 Unbreak building kernels with COMPAT_32 enabled. The actual support
for the PT_VM_ENTRY request from 32-bit processes will follow.

Pointy hat: marcel
2010-02-09 17:20:00 +00:00
Marcel Moolenaar
90b4621a5f Add PT_VM_TIMESTAMP and PT_VM_ENTRY so that the tracing process can
obtain the memory map of the traced process. PT_VM_TIMESTAMP can be
used to check if the memory map changed since the last time to avoid
iterating over all the VM entries unnecesarily.

MFC after:	1 month
2010-02-09 05:52:35 +00:00
Ed Schouten
790f66db55 Remove unused LIBCOMPAT keyword from syscalls.master. 2010-02-08 10:02:01 +00:00
David Xu
93f0162799 Set waiters flag before checking semaphore's counter,
otherwise we might lose a wakeup. Tested on postgresql database server.
2010-02-08 07:31:05 +00:00
Gavin Atkinson
03be779506 Spelling nit 2010-02-07 18:00:13 +00:00
Ed Schouten
f004528903 Remove statistics from the TTY queues.
I added counters to see how often fast copying to userspace was actually
performed, which was only useful during development. Remove these
statistics now we know it to be effective.
2010-02-07 15:42:15 +00:00
Alexander Motin
6f15a274a8 MFp4:
Make CAM to stop all attached devices on system shutdown.
It allows devices to park heads, reducing stress on power loss.
Add `kern.cam.power_down` tunable and sysctl to controll it.
2010-02-03 08:42:08 +00:00
David Xu
43271eacad Fix comments in do_sem_wait(). 2010-02-03 07:21:20 +00:00
David Xu
676e6574a1 After busied the lock, re-read state word before checking waiters flag,
otherwise, the waiters bit may not be set and a wakeup is lost.

Submitted by:	justin.teller at gmail dot com
MFC after:	3 days
2010-02-03 03:56:32 +00:00
Robert Watson
b10c6cf467 Only audit pathnames in namei(9) if copying the directory string completes
successfully.  Continue to do this before the empty path check so that the
ENOENT returned in that case gets an empty string token in the BSM record.

MFC after:	3 days
2010-02-02 23:10:27 +00:00
Andriy Gapon
89fc20cc5e KASSERT that return value of interrupt filter complies with contract
For example a return value of zero could lead to a stuck level-triggered
interrupt line.

Reviewed by:	jhb (for INTR_FILTER case)
MFC after:	3 weeks
2010-01-27 09:59:08 +00:00
Xin LI
215940b3fa Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2010-01-27 00:30:07 +00:00
Attilio Rao
58060789e6 Split out an invariant in order to better check that newtd, when
provided, must be on a runqueue.

Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC:		2 weeks
X-MFC:		r202889
2010-01-24 18:16:38 +00:00
Attilio Rao
a50e80dcdd - Fix the kthread_{suspend, resume, suspend_check}() locking.
In the current code, the locking is completely broken and may lead
  easilly to deadlocks. Fix it by using the proc_mtx, linked to the
  suspending thread, as lock for the operation.  Keep using the
  thread_lock for setting and reading the flag even if it is not entirely
  necessary (atomic ops may do it as well, but this way the code is more
  readable).
- Fix a deadlock within kthread_suspend().
  The suspender should not sleep on a different channel wrt the suspended
  thread, or, otherwise, the awaker should wakeup both. Uniform the
  interface to what the kproc_* counterparts do (sleeping on the same
  channel).
- Change the kthread_suspend_check() prototype.
  kthread_suspend_check() always assumes curthread and must only refer to
  it, so skip the thread pointer as it may be easilly mistaken.
  If curthread is not a kthread, the system will panic.

In collabouration with:	jhb
Tested by:		Giovanni Trematerra
			<giovanni dot trematerra at gmail dot com>
MFC:			2 weeks
2010-01-24 15:07:00 +00:00
Attilio Rao
b0b9dee5c9 - Fix a race in sched_switch() of sched_4bsd.
In the case of the thread being on a sleepqueue or a turnstile, the
  sched_lock was acquired (without the aid of the td_lock interface) and
  the td_lock was dropped. This was going to break locking rules on other
  threads willing to access to the thread (via the td_lock interface) and
  modify his flags (allowed as long as the container lock was different
  by the one used in sched_switch).
  In order to prevent this situation, while sched_lock is acquired there
  the td_lock gets blocked. [0]
- Merge the ULE's internal function thread_block_switch() into the global
  thread_lock_block() and make the former semantic as the default for
  thread_lock_block(). This means that thread_lock_block() will not
  disable interrupts when called (and consequently thread_unlock_block()
  will not re-enabled them when called). This should be done manually
  when necessary.
  Note, however, that ULE's thread_unblock_switch() is not reaped
  because it does reflect a difference in semantic due in ULE (the
  td_lock may not be necessarilly still blocked_lock when calling this).
  While asymmetric, it does describe a remarkable difference in semantic
  that is good to keep in mind.

[0] Reported by:	Kohji Okuno
			<okuno dot kohji at jp dot panasonic dot com>
Tested by:		Giovanni Trematerra
			<giovanni dot trematerra at gmail dot com>
MFC:			2 weeks
2010-01-23 15:54:21 +00:00
Konstantin Belousov
5b1162b964 For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment. Since procfs stopeven requires number of syscall
arguments in p_xstat, this cannot be solved by moving stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reported by:	Ali Polatel <alip exherbo org>
Briefly discussed with:	jhb
MFC after:	3 weeks
2010-01-23 11:45:35 +00:00
Konstantin Belousov
a5799a4f27 Staticise sigqueue manipulation functions used only in kern_sig.c.
MFC after:	1 week
2010-01-23 11:43:30 +00:00
Konstantin Belousov
6a671a6bde When traced process is about to receive the signal, the process is
stopped and debugger may modify or drop the signal. After the changes to
keep process-targeted signals on the process sigqueue, another thread
may note the old signal on the queue and act before the thread removes
changed or dropped signal from the process queue. Since process is
traced, it usually gets stopped. Or, if the same signal is delivered
while process was stopped, the thread may erronously remove it,
intending to remove the original signal.

Remove the signal from the queue before notifying the debugger. Restore
the siginfo to the head of sigqueue when signal is allowed to be
delivered to the debugee, using newly introduced KSI_HEAD ksiginfo_t
flag. This preserves required order of delivery. Always restore the
unchanged signal on the curthread sigqueue, not to the process queue,
since the thread is about to get it anyway, because sigmask cannot be
changed.

Handle failure of reinserting the siginfo into the queue by falling
back to sq_kill method, calling sigqueue_add with NULL ksi.

If debugger changed the signal to be delivered, use sigqueue_add()
with NULL ksi instead of only setting sq_signals bit.

Reported by:	Gardner Bell <gbell72 rogers com>
Analyzed and first version of fix by:	Tijl Coosemans <tijl coosemans org>
PR:	142757
Reviewed by:	davidxu
MFC after:	2 weeks
2010-01-20 11:58:04 +00:00
Ed Schouten
081a0db343 Remove a dead initialization.
Spotted by:	scan-build (uqs)
2010-01-18 18:58:03 +00:00
Konstantin Belousov
d2f334bfc9 Add new function vunref(9) that decrements vnode use count (and hold
count) while vnode is exclusively locked.

The code for vput(9), vrele(9) and vunref(9) is merged.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	3 weeks
2010-01-17 21:24:27 +00:00
Bjoern A. Zeeb
592bcae802 Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by:	jamie, hrs (ipv6 part)
Pointed out by:	hrs [1]
MFC After:	2 weeks
Asked for by:	Jase Thew (bazerka beardz.net)
2010-01-17 12:57:11 +00:00
Brooks Davis
9126964cdb Only allocate the space we need before calling kern_getgroups instead
of allocating what ever the user asks for up to "ngroups_max + 1".  On
systems with large values of kern.ngroups this will be more efficient.

The now redundant check that the array is large enough in
kern_getgroups() is deliberate to allow this change to be merged to
stable/8 without breaking potential third party consumers of the API.

Reported by:	bde
MFC after:	28 days
2010-01-15 07:18:46 +00:00
Ed Schouten
d3e4b91f9c Remove the 1000 pseudo terminal limit from pts(4).
Even with the old utmp format, we could in fact go to pts/9999, because
ut_line wasn't guaranteed to be null terminated there.
2010-01-13 21:22:23 +00:00
Brooks Davis
93833c1db6 Declare the kern.ngroups sysctl to be read-only, but tunable at boot for
better error reporting.

Submitted by:	Matthew Fleming <matthew dot fleming at isilon dot com>
MFC After:	1 month
2010-01-12 18:20:20 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Bjoern A. Zeeb
fe0518e9ee Change DDB show prison:
- name some columns more closely to the user space variables,
  as we do for host.* or allow.* (in the listing) already.
- print pr_childmax (children.max).
- prefix hex values with 0x.

MFC after:	3 weeks
2010-01-11 22:34:25 +00:00
Bjoern A. Zeeb
bef916f99c Adjust a comment to reflect reality, as we have proper source
address selection, even for IPv4, since r183571.

Pointed out by:	Jase Thew (bazerka beardz.net)
MFC after:	3 days
2010-01-11 21:21:30 +00:00
Kirk McKusick
e268f54cb4 Background:
When renaming a directory it passes through several intermediate
states. First its new name will be created causing it to have two
names (from possibly different parents). Next, if it has different
parents, its value of ".." will be changed from pointing to the old
parent to pointing to the new parent. Concurrently, its old name
will be removed bringing it back into a consistent state. When fsck
encounters an extra name for a directory, it offers to remove the
"extraneous hard link"; when it finds that the names have been
changed but the update to ".." has not happened, it offers to rewrite
".." to point at the correct parent. Both of these changes were
considered unexpected so would cause fsck in preen mode or fsck in
background mode to fail with the need to run fsck manually to fix
these problems. Fsck running in preen mode or background mode now
corrects these expected inconsistencies that arise during directory
rename. The functionality added with this update is used by fsck
running in background mode to make these fixes.

Solution:

This update adds three new fsck sysctl commands to support background
fsck in correcting expected inconsistencies that arise from incomplete
directory rename operations. They are:

setcwd(dirinode) - set the current directory to dirinode in the
    filesystem associated with the snapshot.
setdotdot(oldvalue, newvalue) - Verify that the inode number for ".."
    in the current directory is oldvalue then change it to newvalue.
unlink(nameptr, oldvalue) - Verify that the inode number associated
    with nameptr in the current directory is oldvalue then unlink it.

As with all other fsck sysctls, these new ones may only be used by
processes with appropriate priviledge.

Reported by:    	jeff
Security issues:	rwatson
2010-01-11 20:44:05 +00:00
Warner Losh
eae8e367c5 Merge change r198561 from projects/mips to head:
r198561 | thompsa | 2009-10-28 15:25:22 -0600 (Wed, 28 Oct 2009) | 4 lines
Allow a scratch buffer to be set in order to be able to use setenv() while
booting, before dynamic kenv is running. A few platforms implement their own
scratch+sprintf handling to save data from the boot environment.
2010-01-10 22:34:18 +00:00
David Xu
a4b0b4b062 Make a chain be a list of queues, and make threads waiting
for same key coalesce to same queue, this makes searching
path shorter and improves performance.
Also fix comments about shared PI-mutex.
2010-01-10 09:31:57 +00:00