Commit Graph

11600 Commits

Author SHA1 Message Date
julian
247d9af67e Change the semantics of the debug.ktr.alq_enable control so that when you
disable alq, it acts as if alq had not been enabled in the build.
in other words, the rest of ktr is still available for use.
If you really don't want that as well, set the mask to 0.

MFC after:3 weeks
2010-04-14 21:42:29 +00:00
kib
bf84540647 Handle a case in kern_openat() when vn_open() change file type from
DTYPE_VNODE.

Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode,
since we allow for fcntl(2) to process with advisory locks for
DTYPE_VNODE only. Another reason is that all fo_close() routines need to
check and release locks otherwise.

For O_TRUNC, call fo_truncate() instead of truncating the vnode.

Discussed with:	rwatson
MFC after:	2 week
2010-04-13 08:52:20 +00:00
kib
d5f342f2da Remove XXX comment. Add another comment, describing why f_vnode assignment
is useful.

MFC after:	3 days
2010-04-13 08:45:55 +00:00
alc
89e5d72c2b Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place.  Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures.  (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by:	bde (an earlier version)
Reviewed by:	kib
2010-04-11 16:26:07 +00:00
attilio
02f2ab87a2 - Introduce a blessed list for sxlocks that prevents the deadlkres to
panic on those ones. [0]
- Fix ticks counter wrap-up

Sponsored by:		Sandvine Incorporated
[0] Reported by:	jilles
[0] Tested by:		jilles
MFC:			1 week
2010-04-11 16:06:09 +00:00
kib
231b3cddc2 Do not leak master pty or ptmx vnode.
Report and test case by:	Petr Salinger <Petr.Salinger seznam cz>
Reviewed by:	ed
MFC after:	1 week
2010-04-08 08:58:18 +00:00
kib
47feb6893a When OOM searches for a process to kill, ignore the processes already
killed by OOM. When killed process waits for a page allocation, try to
satisfy the request as fast as possible.

This removes the often encountered deadlock, where OOM continously
selects the same victim process, that sleeps uninterruptibly waiting
for a page. The killed process may still sleep if page cannot be
obtained immediately, but testing has shown that system has much
higher chance to survive in OOM situation with the patch.

In collaboration with:	pho
Reviewed by:	alc
MFC after:	4 weeks
2010-04-06 10:43:01 +00:00
jh
6b4bef1bca Add missing MNT_NFS4ACLS. 2010-04-04 14:48:43 +00:00
alc
7530e331f2 Make _vm_map_init() the one place where the vm map's pmap field is
initialized.

Reviewed by:	kib
2010-04-03 19:07:05 +00:00
pjd
bd6cec6aca Fix some whitespace nits. 2010-04-03 11:19:20 +00:00
pjd
f0663e1c41 Add missing mnt_kern_flag flags in 'show mount' output. 2010-04-03 11:15:55 +00:00
avg
91cd4478c2 vn_stat: take into account va_blocksize when setting st_blksize
As currently st_blksize is always PAGE_SIZE, it is playing safe to not
use any smaller value.  For some cases this might not be optimal, but
at least nothing should get broken.

Generally I don't expect this commit to change much for the following
reasons (in case of VREG, VDIR):
- application I/O and physical I/O are sufficiently decoupled by
  filesystem code, buffer cache code, cluster and read-ahead logic
- not all applications use st_blksize as a hint, some use f_iosize, some
  use fixed block sizes

I expect writes to the middle of files on ZFS to benefit the most from
this change.

Silence from:	fs@
MFC after:	2 weeks
2010-04-03 08:39:00 +00:00
avg
ad244906c7 bo_bsize: revert r205860 and take an alternative approch in getblk
In r205860 I missed the fact that there is code that strongly assumes
that devvp bo_bsize is equal to underlying provider's sectorsize.
In those places it is hard to obtain the sectorsize in an alternative
way if devvp bo_bsize is set to something else.
So, I am reverting bo_bsize assigment in g_vfs_open.
Instead, in getblk I use DEV_BSIZE block size for b_offset calculation
if vp is a disk vp as reported by vn_isdisk.  This should coinside with
vp being a devvp.

Reported by:	Mykola Dzham <i@levsha.me>
Tested by:	Mykola Dzham <i@levsha.me>
Pointyhat to:	avg
MFC after:	2 weeks
X-ToDo:		convert bread(devvp) in all fs to use bo_bsize-d blocks
2010-04-02 15:12:31 +00:00
kib
dbbce00a33 Supply default implementation of VOP_RENAME() that does neccessary
unlocks and unreferences for argument vnodes, as expected by
kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and
reference leaks when rename is attempted on fs that does not
implement rename.

PR:	kern/107439
Based on submission by:	Mikolaj Golub <to.my.trociny gmail com>
Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:43 +00:00
kib
86c35b90b7 Add function vop_rename_fail(9) that performs needed cleanup for locks
and references of the VOP_RENAME(9) arguments. Use vop_rename_fail()
in deadfs_rename().

Tested by:	Mikolaj Golub
MFC after:	1 week
2010-04-02 14:03:01 +00:00
lstewart
c1a8ef630e The ALQ should not be considered drained until it has been made inactive.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:27:10 +00:00
lstewart
f1881c310c According to SLEEP(9), msleep() is deprecated in favour of mtx_sleep().
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:23:36 +00:00
lstewart
4aab292692 - Factor code to destroy an ALQ out of alq_close() into a private alq_destroy().
- Use the new alq_destroy() to properly handle a failure case in alq_open().

Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson (as part of a larger patch)
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-04-01 01:16:00 +00:00
lstewart
a8e85cee7a Add support for ALQ(9) to be compiled and loaded as a kernel module.
Sponsored by:	FreeBSD Foundation
Reviewed by:	dwmalone, jeff, rpaulo, rwatson
Approved by:	kmacy (mentor)
MFC after:	1 month
2010-03-31 03:58:57 +00:00
jhb
5bba6cc028 Defer freeing a kevent list until after dropping kqueue locks.
LOR:		185
Submitted by:	Matthew Fleming @ Isilon
MFC after:	1 week
2010-03-30 18:31:55 +00:00
ed
4f08ecd7ed Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.
2010-03-28 13:13:22 +00:00
jh
7a3d363cfe Support only LOOKUP operation for "/" in relookup() because lookup()
can't succeed for CREATE, DELETE and RENAME.

Discussed with:	bde
2010-03-26 11:33:12 +00:00
nwhitehorn
ac2318460c Add the ELF relocation base to struct image_params. This will be
required to correctly relocate the executable entry point's function
descriptor on powerpc64.
2010-03-25 14:31:26 +00:00
nwhitehorn
d63c82a6ac Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by:	jhb
2010-03-25 14:24:00 +00:00
nwhitehorn
b60f1f5349 Change the way text_addr and data_addr are computed to use the
executable status of segments instead of detecting the main text segment
by which segment contains the program entry point. This affects
obreak() and is required for correct operation of that function
on 64-bit PowerPC systems. The previous behavior was apparently
required only for the Alpha, which is no longer supported.

Reviewed by:	jhb
Tested on:	amd64, sparc64, powerpc
2010-03-25 14:21:22 +00:00
bz
8fb79807f2 Print the pointer to the lock with the panic message. The previous
panic: rw lock not unlocked
was not really helpful for debugging. Now one can at least call
	show lock <ptr>
form ddb to learn more about the lock.

MFC after:	3 days
2010-03-24 19:21:26 +00:00
nwhitehorn
32325226c5 The nargvstr and nenvstr properties of arginfo are ints, not longs,
so should be copied to userspace with suword32() instead of suword().
This alleviates problems on 64-bit big-endian architectures, and is a
no-op on all 32-bit architectures.

Tested on:	amd64, sparc64, powerpc64
2010-03-24 03:13:24 +00:00
ed
6156503467 Actually make O_DIRECTORY work.
According to POSIX open() must return ENOTDIR when the path name does
not refer to a path name. Change vn_open() to respect this flag. This
also simplifies the Linuxolator a bit.
2010-03-21 20:43:23 +00:00
bz
102a1f8933 Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by:	ISPsystem
Discussed with:	jhb
Reviewed by:	zec (earlier version), jhb
MFC after:	1 month
2010-03-19 19:51:03 +00:00
kib
1e468766f7 Convert aio syscall registration to SYSCALL_INIT_HELPER.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:11:34 +00:00
kib
06319cba03 Implement compat32 shims for mqueuefs.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:10:24 +00:00
kib
34d2655cb1 Implement compat32 shims for ksem syscalls.
Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:08:43 +00:00
kib
b27fa06f97 Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding
sysv_{msg,sem,shm}.c files.

Mark SysV IPC freebsd32 syscalls as NOSTD and add required
SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto
register/unregister on module load.

This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded
as modules.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:04:42 +00:00
kib
610214ed4c Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to
sysv_ipc.c.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 11:01:51 +00:00
kib
d19a162142 Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and
neccessary support functions to allow registering dynamically loaded
syscalls from the MOD_LOAD handlers. Helpers handle registration
failures semi-automatically.

Reviewed by:	jhb
MFC after:	2 weeks
2010-03-19 10:56:30 +00:00
kib
b145781d49 Properly handle compat32 calls to sctp generic sendmsd/recvmsg functions that
take iov.

Reviewed by:	tuexen
MFC after:	2 weeks
2010-03-19 10:46:54 +00:00
kib
0c73b78495 Remove dead statement.
Reviewed by:	tuexen
MFC after:	2 weeks
2010-03-19 10:44:02 +00:00
kib
08e36289a4 Fix two style issues.
MFC after:	2 weeks
2010-03-19 10:41:32 +00:00
jhb
5b4b1c75c0 Style fixes.
Submitted by:	bde
2010-03-11 15:13:55 +00:00
nwhitehorn
142a4d2993 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
jhb
3a7e251600 Fix a comment nit.
Submitted by:	Alexander Best
2010-03-11 13:16:06 +00:00
jhb
9167691ca3 Add descriptions for debug.ktr sysctl nodes. 2010-03-10 21:35:42 +00:00
imp
3ebe1a7826 Bump up the firmware_table from 30 to 50. bwn needs more than 30, it
seems.
2010-03-07 22:37:35 +00:00
alfred
88b3bf6496 put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent
undefined symbols on kernels without this option.

Reported by: Alexander Best
2010-03-04 21:53:45 +00:00
rrs
17cf51b0bb sched_getparam was just plain broke for time-share
processes. It did not return an error but instead
just let garbage be passed back. This I fix so
it actually properly translates the priority the
process is at to a posix's high means more priority.
I also fix it so that if the ULE scheduler has bumped
it up to a realtime process you get back a sane value
i.e. the highest priority (63 for time-share).

sched_setscheduler() had the setting of the
timeshare class priority disabled. With some notes
about rejecting the posix high numbers is greater
priority and use nice instead. This fix also
adjusts that to work, with the cavet that a t-s
process may well get bumped up or down i.e. the
setscheduler() will NOT change the nice value only
the current priority. I think this is reasonable
considering if the user wants to play with nice then
he can. At least all the posix'ish interfaces now
respond sanely.

MFC after:	3 weeks
2010-03-03 21:46:51 +00:00
jhb
f9290c7f6d Allow lseek(SEEK_END) to work on disk devices by using the DIOCGMEDIASIZE
to determine the media size.

Submitted by:	nox
MFC after:	1 week
2010-03-03 16:18:04 +00:00
ivoras
14f9175723 Document the VM detection type and sysctl a bit better. 2010-03-02 23:57:42 +00:00
alfred
f34ce3dd38 Merge projects/enhanced_coredumps (r204346) into HEAD:
Enhanced process coredump routines.

  This brings in the following features:
  1) Limit number of cores per process via the %I coredump formatter.
  Example:
    if corefilename is set to %N.%I.core AND num_cores = 3, then
    if a process "rpd" cores, then the corefile will be named
    "rpd.0.core", however if it cores again, then the kernel will
    generate "rpd.1.core" until we hit the limit of "num_cores".

    this is useful to get several corefiles, but also prevent filling
    the machine with corefiles.

  2) Encode machine hostname in core dump name via %H.

  3) Compress coredumps, useful for embedded platforms with limited space.
    A sysctl kern.compress_user_cores is made available if turned on.

    To enable compressed coredumps, the following config options need to be set:
    options COMPRESS_USER_CORES
    device zlib   # brings in the zlib requirements.
    device gzio   # brings in the kernel vnode gzip output module.

  4) Eventhandlers are fired to indicate coredumps in progress.

  5) The imgact sv_coredump routine has grown a flag to pass in more
  state, currently this is used only for passing a flag down to compress
  the coredump or not.

  Note that the gzio facility can be used for generic output of gzip'd
  streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan
2010-03-02 06:58:58 +00:00
bruno
3bef33deb1 Deliver siginfo when signal is generated by thr_kill(2) (SI_USER with properly
filled si_uid and si_pid).

Reported by:	Joel Bertrand <joel.bertrand systella fr>
PR:		141956
Reviewed by:	kib
MFC after:	2 weeks
2010-03-01 14:27:16 +00:00
rwatson
fc045dee13 Remove stale comment about socket buffer accounting from access(2) code.
It is the case, however, that the uidinfo of the temporary credential
set up for access(2) is not properly updated when its effective uid is
changed.

MFC after:	3 days
2010-02-27 19:57:40 +00:00