Commit Graph

11606 Commits

Author SHA1 Message Date
Andriy Gapon
5c7e270fcd periodically save system time to hardware time-of-day clock
This is done in kern_ntptime, perhaps not the best place.
This is done using resettodr().
Some features:
- make save period configurable via tunable and sysctl
- period of zero disables saving, setting a non-zero period re-enables
  it or reschedules it
- do saving only if system clock is ntp-synchronized
- save on shutdown

Discussed with:	des, Peter Jeremy <peterjeremy@acm.org>
X-Maybe:		save time near seconds boundary for better precision
MFC after:		2 weeks
2010-04-29 09:02:46 +00:00
Andriy Gapon
9a9ae42a43 kern_ntptime: abstract time error check into a function
... to avoid code duplication

MFC after:	1 week
2010-04-29 09:02:21 +00:00
Lawrence Stewart
7d11e744c1 - Rework the underlying ALQ storage to be a circular buffer, which amongst other
things allows variable length messages to be easily supported.

- Extend KPI with alq_writen() and alq_getn() to support variable length
  messages, which is enabled at ALQ creation time depending on the
  arguments passed to alq_open(). Also add variants of alq_open() and
  alq_post() that accept a flags argument. The KPI is still fully
  backwards compatible and shouldn't require any change in ALQ consumers
  unless they wish to utilise the new features.

- Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers
  to have more control over IO scheduling and resource acquisition
  respectively.

- Strengthen invariants checking.

- Document ALQ changes in ALQ(9) man page.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jeff, rpaulo, rwatson
MFC after:	1 month
2010-04-26 13:48:22 +00:00
Konstantin Belousov
ed7806879b Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by:	pluknet
Reviewed by:	imp, jhb, nwhitehorn
MFC after:	2 weeks
2010-04-24 12:49:52 +00:00
Jeff Roberson
113db2dddb - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00
Bjoern A. Zeeb
2f826fdf53 Remove one zero from the double-0.
This code doesn't have a license to kill.

MFC after:	3 days
2010-04-23 14:32:58 +00:00
Konstantin Belousov
b4bf2ac1ac Fix typo.
Submitted by:	emaste
Pointy hat to:	kib (who needs much bigger wardrobe)
MFC after:	1 week
2010-04-21 20:04:42 +00:00
Konstantin Belousov
05e06d1157 Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to
mostly work on 64bit host.

The work is based on an original patch submitted by emaste, obtained
from Sandvine's source tree.

Reviewed by:	jhb
MFC after:	1 week
2010-04-21 19:32:00 +00:00
Warner Losh
e8b93e4257 Make sure that we free the passed in data message if we don't actually
insert it onto the queue.  Also, fix a mtx leak if someone turns off
devctl while we're processing a messages.

MFC after:	5 days
2010-04-20 20:39:42 +00:00
Attilio Rao
0a2d5fea59 Fix compilation in the !SMP case.
Keep the interrupts disabled in order to avoid preemption problems.

Reported by:	tinderbox, b.f. <bf1783 at googlemail dot com>
MFC:		2 weeks
X-MFC:		r206878
2010-04-20 12:22:06 +00:00
Konstantin Belousov
5673e3cb08 The cache_enter(9) function shall not be called for doomed dvp.
Assert this.

In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.

Verify that dvp is not reclaimed before calling cache_enter().

Reported and tested by:	pho
Reviewed by:	kan
MFC after:	2 weeks
2010-04-20 10:19:27 +00:00
Attilio Rao
95335fd844 getblk lockmgr is mostly used as a msleep() and may lead too easilly to
false positives.
Whitelist it.

Reported by:	Erik Cederstrand <erik at cederstrand dot dk>
2010-04-19 23:40:46 +00:00
Attilio Rao
248bb9379f Fix a deadlock in the shutdown code:
When performing a smp_rendezvous() or more likely, on amd64 and i386,
a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx
spinlock held, busy-waiting for other CPUs to acknowledge the operation.
As long as CPUs are suspended (via cpu_reset()) between the active mask
read and IPI sending there can be a deadlock where the caller will wait
forever for a dead CPU to acknowledge the operation.
Please note that on CPU0 that is going to be someway heavier because of
the spinlocks being disabled earlier than quitting the machine.

Fix this bug by calling cpu_reset() with the smp_ipi_mtx held.
Note that it is very likely that a saner offline/online CPUs mechanism
will help heavilly in fixing similar cases as it is likely more bugs
of this type may arise in the future.

Reported by:	rwatson
Discussed with:	jhb
Tested by:	rnoland, Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC:		2 weeks

Special deciation to:	anyone who made possible to have 16-ways machines
			in Netperf
2010-04-19 23:27:54 +00:00
Konstantin Belousov
3e22320c43 Fix typo.
MFC after:	3 days
2010-04-15 17:17:02 +00:00
Julian Elischer
94a04272dd Change the semantics of the debug.ktr.alq_enable control so that when you
disable alq, it acts as if alq had not been enabled in the build.
in other words, the rest of ktr is still available for use.
If you really don't want that as well, set the mask to 0.

MFC after:3 weeks
2010-04-14 21:42:29 +00:00
Konstantin Belousov
d5ff273563 Handle a case in kern_openat() when vn_open() change file type from
DTYPE_VNODE.

Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode,
since we allow for fcntl(2) to process with advisory locks for
DTYPE_VNODE only. Another reason is that all fo_close() routines need to
check and release locks otherwise.

For O_TRUNC, call fo_truncate() instead of truncating the vnode.

Discussed with:	rwatson
MFC after:	2 week
2010-04-13 08:52:20 +00:00
Konstantin Belousov
5478ba7358 Remove XXX comment. Add another comment, describing why f_vnode assignment
is useful.

MFC after:	3 days
2010-04-13 08:45:55 +00:00
Alan Cox
ac45ee97c9 Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place.  Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures.  (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by:	bde (an earlier version)
Reviewed by:	kib
2010-04-11 16:26:07 +00:00
Attilio Rao
36e51f655d - Introduce a blessed list for sxlocks that prevents the deadlkres to
panic on those ones. [0]
- Fix ticks counter wrap-up

Sponsored by:		Sandvine Incorporated
[0] Reported by:	jilles
[0] Tested by:		jilles
MFC:			1 week
2010-04-11 16:06:09 +00:00
Konstantin Belousov
6d51747362 Do not leak master pty or ptmx vnode.
Report and test case by:	Petr Salinger <Petr.Salinger seznam cz>
Reviewed by:	ed
MFC after:	1 week
2010-04-08 08:58:18 +00:00
Konstantin Belousov
3f1c4c4f31 When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
Jaakko Heinonen
0e9bd4171f Add missing MNT_NFS4ACLS. 2010-04-04 14:48:43 +00:00
Alan Cox
92351f162e Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by:	kib
2010-04-03 19:07:05 +00:00
Pawel Jakub Dawidek
b9d8d69108 Fix some whitespace nits. 2010-04-03 11:19:20 +00:00
Pawel Jakub Dawidek
000026c809 Add missing mnt_kern_flag flags in 'show mount' output. 2010-04-03 11:15:55 +00:00
Andriy Gapon
364b8a7b33 vn_stat: take into account va_blocksize when setting st_blksize
As currently st_blksize is always PAGE_SIZE, it is playing safe to not
use any smaller value.  For some cases this might not be optimal, but
at least nothing should get broken.

Generally I don't expect this commit to change much for the following
reasons (in case of VREG, VDIR):
- application I/O and physical I/O are sufficiently decoupled by
  filesystem code, buffer cache code, cluster and read-ahead logic
- not all applications use st_blksize as a hint, some use f_iosize, some
  use fixed block sizes

I expect writes to the middle of files on ZFS to benefit the most from
this change.

Silence from:	fs@
MFC after:	2 weeks
2010-04-03 08:39:00 +00:00
Andriy Gapon
1b4bc5f851 bo_bsize: revert r205860 and take an alternative approch in getblk
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assigment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk.  This should coinside with
vp being a devvp.

Reported by:	Mykola Dzham <i@levsha.me>
Tested by:	Mykola Dzham <i@levsha.me>
Pointyhat to:	avg
MFC after:	2 weeks
X-ToDo:		convert bread(devvp) in all fs to use bo_bsize-d blocks
2010-04-02 15:12:31 +00:00
Konstantin Belousov
0bc7bd67ca Supply default implementation of VOP_RENAME() that does neccessary
unlocks and unreferences for argument vnodes, as expected by
kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and
reference leaks when rename is attempted on fs that does not
implement rename.

PR:	kern/107439
Based on submission by:	Mikolaj Golub <to.my.trociny gmail com>
Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:43 +00:00
Konstantin Belousov
ea01588095 Add function vop_rename_fail(9) that performs needed cleanup for locks
and references of the VOP_RENAME(9) arguments. Use vop_rename_fail()
in deadfs_rename().

Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:01 +00:00
Lawrence Stewart
97c11ef282 The ALQ should not be considered drained until it has been made inactive.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:27:10 +00:00
Lawrence Stewart
9ffad7a94d According to SLEEP(9), msleep() is deprecated in favour of mtx_sleep().
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:23:36 +00:00
Lawrence Stewart
c0ea37a885 - Factor code to destroy an ALQ out of alq_close() into a private alq_destroy().
- Use the new alq_destroy() to properly handle a failure case in alq_open().

Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:16:00 +00:00
Lawrence Stewart
d28f42f957 Add support for ALQ(9) to be compiled and loaded as a kernel module.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-03-31 03:58:57 +00:00
John Baldwin
7e3d78ae6d Defer freeing a kevent list until after dropping kqueue locks.
LOR:		185
Submitted by:	Matthew Fleming @ Isilon
MFC after:	1 week
2010-03-30 18:31:55 +00:00
Ed Schouten
510ea843ba Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
Jaakko Heinonen
246b651054 Support only LOOKUP operation for "/" in relookup() because lookup()
can't succeed for CREATE, DELETE and RENAME.

Discussed with:	bde
2010-03-26 11:33:12 +00:00
Nathan Whitehorn
a0ea661f5e Add the ELF relocation base to struct image_params. This will be
required to correctly relocate the executable entry point's function
descriptor on powerpc64.
2010-03-25 14:31:26 +00:00
Nathan Whitehorn
a107d8aac9 Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
Nathan Whitehorn
920acedb80 Change the way text_addr and data_addr are computed to use the
executable status of segments instead of detecting the main text segment
by which segment contains the program entry point. This affects
obreak() and is required for correct operation of that function
on 64-bit PowerPC systems. The previous behavior was apparently
required only for the Alpha, which is no longer supported.

Reviewed by:	jhb
Tested on:	amd64, sparc64, powerpc
2010-03-25 14:21:22 +00:00
Bjoern A. Zeeb
2430ab4629 Print the pointer to the lock with the panic message. The previous
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
	show lock <ptr>
form ddb to learn more about the lock.

MFC after:	3 days
2010-03-24 19:21:26 +00:00
Nathan Whitehorn
f4e26adefc The nargvstr and nenvstr properties of arginfo are ints, not longs,
so should be copied to userspace with suword32() instead of suword().
This alleviates problems on 64-bit big-endian architectures, and is a
no-op on all 32-bit architectures.

Tested on:	amd64, sparc64, powerpc64
2010-03-24 03:13:24 +00:00
Ed Schouten
0fef797f4a Actually make O_DIRECTORY work.
According to POSIX open() must return ENOTDIR when the path name does
not refer to a path name. Change vn_open() to respect this flag. This
also simplifies the Linuxolator a bit.
2010-03-21 20:43:23 +00:00
Bjoern A. Zeeb
42eedeac00 Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by:	ISPsystem
Discussed with:	jhb
Reviewed by:	zec (earlier version), jhb
MFC after:	1 month
2010-03-19 19:51:03 +00:00
Konstantin Belousov
723d37c0ac Convert aio syscall registration to SYSCALL_INIT_HELPER.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:11:34 +00:00
Konstantin Belousov
afde2b6593 Implement compat32 shims for mqueuefs.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:10:24 +00:00
Konstantin Belousov
0e5d5bc279 Implement compat32 shims for ksem syscalls.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:08:43 +00:00
Konstantin Belousov
75d633cbf6 Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:04:42 +00:00
Konstantin Belousov
4cfc39cfa6 Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to
sysv_ipc.c.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:01:51 +00:00
Konstantin Belousov
0687ba3e90 Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and
neccessary support functions to allow registering dynamically loaded
syscalls from the MOD_LOAD handlers. Helpers handle registration
failures semi-automatically.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 10:56:30 +00:00
Konstantin Belousov
5322f02ec0 Properly handle compat32 calls to sctp generic sendmsd/recvmsg functions that
take iov.

Reviewed by:	tuexen
MFC after:	2 weeks
2010-03-19 10:46:54 +00:00