12449 Commits

Author SHA1 Message Date
John Baldwin
cd06ae5c1b Regen. 2011-11-04 04:06:31 +00:00
John Baldwin
936c09ac0f Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00
John Baldwin
dccc45e4c0 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
Attilio Rao
2b10b1f872 Disable interrupt and preemption for smp_rendezvous() also in the
UP/!SMP case.
The callbacks may be relying on this feature and having 2 different
ways to deal with them is not correct.

Reported by:	rstone
Reviewed by:	jhb
MFC after:	2 weeks
2011-11-03 14:36:56 +00:00
Marcel Moolenaar
b2f1a8f2b3 Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.
2011-10-30 02:19:39 +00:00
Marcel Moolenaar
056f0ec755 Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.
2011-10-29 01:26:36 +00:00
Sergey Kandaurov
c241c5e49a Fix arguments list for proc:::signal-discard DTrace probe.
Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	1 week
2011-10-28 15:22:51 +00:00
John Baldwin
62238a6791 Whitespace fix. 2011-10-27 17:43:36 +00:00
Alan Cox
703dec68bf Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc().  While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.
2011-10-27 16:39:17 +00:00
Sergey Kandaurov
3bedc94069 Remove the long reprecated ``/stand/sysinstall'' from the init_path.
It can be put back using the INIT_PATH config option or init_path
loader variable, if still needed (which I doubt).

MFC after:	1 week
2011-10-27 10:25:11 +00:00
Alan Cox
f346986b76 contigmalloc(9) and contigfree(9) are now implemented in terms of other
more general VM system interfaces.  So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.
2011-10-27 02:52:24 +00:00
John Baldwin
c48fb4da4c - Fixup filenames in a few more places where they are used.
- Some whitespace fixes.
2011-10-26 15:17:42 +00:00
Pawel Jakub Dawidek
4c11f091df The v_data field is a pointer, so set it to NULL, not 0.
MFC after:	3 days
2011-10-25 14:01:17 +00:00
Marcel Moolenaar
421b7fe574 Don't terminate the interactive root mount prompt on mount failure.
This restores the previous behaviour. While here, match '?' and '.'
inputs exactly and improve the error message.

Requested by: avg@
Derived from a patch by: Arnaud Lacombe <lacombar@gmail.com>
2011-10-23 20:03:33 +00:00
Dag-Erling Smørgrav
e141be6f79 Revisit the capability failure trace points. The initial implementation
only logged instances where an operation on a file descriptor required
capabilities which the file descriptor did not have.  By adding a type enum
to struct ktr_cap_fail, we can catch other types of capability failures as
well, such as disallowed system calls or attempts to wrap a file descriptor
with more capabilities than it had to begin with.
2011-10-18 07:28:58 +00:00
Marcel Moolenaar
80f1c58b0a Fix double vision syndrome (read: double output) when in the
debugger without a panic.
2011-10-16 14:16:46 +00:00
Konstantin Belousov
126b36a21e Control the execution permission of the readable segments for
i386 binaries on the amd64 and ia64 with the sysctl, instead of
unconditionally enabling it.

Reviewed by:	marcel
2011-10-15 12:35:18 +00:00
Marcel Moolenaar
676eda08d0 In elf32_trans_prot() and when compiling for amd64 or ia64, add
PROT_EXECUTE when PROT_READ is needed. By default i386 allows
execution when reading is allowed and JDK 1.4.x depends on that.
2011-10-13 16:16:46 +00:00
Gleb Smirnoff
8d689e042f Make memguard(9) capable to guard uma(9) allocations. 2011-10-12 18:08:28 +00:00
Robert Watson
b160c14194 Correct a bug in export of capability-related information from the sysctls
supporting procstat -f: properly provide capability rights information to
userspace.  The bug resulted from a merge-o during upstreaming (or rather,
a failure to properly merge FreeBSD-side changed downstream).

Spotted by:     des, kibab
MFC after:      3 days
2011-10-12 12:08:03 +00:00
Adrian Chadd
df46ae53f6 Don't call fixup_filename() on each witness lock call.
This has been irking me for a while. This causes significant
CPU use on bottlenecked CPUs (eg my older EEEPC w/ an earlier
Celeron CPU and my MIPS24k boards) when they're passing
a lot of traffic.

Since the file/line values are only used for printing, this
should only affect display. It should have no operational
change on the code, besides reducing CPU use.
2011-10-12 09:21:02 +00:00
Dag-Erling Smørgrav
c601ad8eeb Add a new trace point, KTRFAC_CAPFAIL, which traces capability check
failures.  It is included in the default set for ktrace(1) and kdump(1).
2011-10-11 20:37:10 +00:00
Kirk McKusick
cd795a6e1f When unmounting a filesystem always wait for the vfs_busy lock to clear
so that if no vnodes in the filesystem are actively in use the unmount
will succeed rather than failing with EBUSY.

Reported by: Garrett Cooper
Reviewed by: Attilio Rao and Kostik Belousov
Tested by:   Garrett Cooper
PR:          kern/161016
MFC after:   3 weeks
2011-10-11 18:46:41 +00:00
Marius Strobl
f305d1b0db In device_get_children() avoid malloc(0) in order to increase portability
to other operating systems.

PR:     154287
2011-10-09 21:21:37 +00:00
Alan Cox
1549ed03ff Fix the handling of an empty kmem map by sysctl_kmem_map_free(). In
the unlikely event that sysctl_kmem_map_free() was performed on an
empty kmem map, it would incorrectly report the free space as zero.

Discussed with:	avg
MFC after:	1 week
2011-10-08 18:29:30 +00:00
Jonathan Anderson
25e33e625f Change one printf() to log().
As noted in kern/159780, printf() is not very jail-friendly, since it can't be easily monitored by jail management tools. This patch reports an error via log() instead, which, if nobody is watching the log file, still prints to the console.

Approved by: mentor (rwatson)
Submitted by: Eugene Grosbein <eugen@eg.sd.rdtc.ru>
MFC after: 5 days
2011-10-07 09:51:12 +00:00
David E. O'Brien
ef522f9515 Disallow various debug.kdb sysctl's when securelevel is raised.
PR:	161350
2011-10-07 05:47:30 +00:00
Xin LI
2b03effa01 Return proper errno when we hit error when doing sanity check.
This fixes dtrace crashes when module is not compiled with CTF
data.

Submitted by:	Paul Ambrose ambrosehua at gmail.com
MFC after:	1 week
2011-10-07 01:37:58 +00:00
Marius Strobl
880bf8b9bd - Currently, sched_balance_pair() may cause a CPU to send an IPI_PREEMPT to
itself, which sparc64 hardware doesn't support. One way to solve this
  would be to directly call sched_preempt() instead of issuing a self-IPI.
  However, quoting jhb@:
  "On the other hand, you can probably just skip the IPI entirely if we are
  going to send it to the current CPU.  Presumably, once this routine
  finishes, the current CPU will exit softlock (or will do so "soon") and
  will then pick the next thread to run based on the adjustments made in
  this routine, so there's no need to IPI the CPU running this routine
  anyway.  I think this is the better solution.  Right now what is probably
  happening on other platforms is as soon as this routine finishes the CPU
  processes its self-IPI and causes mi_switch() which will just switch back
  to the softclock thread it is already running."
- With r226054 and the the above change in place, sparc64 now no longer is
  incompatible with ULE and vice versa. However, powerpc/E500 still is.

Submitted by:	jhb
Reviewed by:	jeff
2011-10-06 11:48:13 +00:00
Edward Tomasz Napierala
4ce9c95bd8 Remove assertion against empty NFSv4 ACLs. An empty ACL is not exactly
valid - we don't allow for setting it on a file, for example - but it's
not something we should assert on.

For STABLE kernel, it changes nothing, because it's not compiled with
INVARIANTS.  If it was, it would fix crashes.  It also fixes an assert
in libc encountered with NFSv4 without nfsuserd(8) running.

Submitted by:	Yuri Pankov (earlier version)
MFC after:	1 month
2011-10-05 17:29:49 +00:00
Konstantin Belousov
a101072d7f Supply unique (st_dev, st_ino) value pair for the fstat(2) done on the pipes.
Reviewed by:	jhb, Peter Jeremy <peterjeremy acm org>
MFC after:	2 weeks
2011-10-05 16:56:06 +00:00
Konstantin Belousov
837b4d462d Move parts of the commit log for r166167, where Tor explained the
interaction between vnode locks and vfs_busy(), into comment.

MFC after:	1 week
2011-10-04 18:45:29 +00:00
Edward Tomasz Napierala
c0c0936205 Actually enforce limit for inheritable resources on fork.
MFC after:	3 days
2011-10-04 14:56:33 +00:00
Edward Tomasz Napierala
2d8696d1e8 Move some code inside the racct_proc_fork(); it spares a few lock operations
and it's more logical this way.

MFC after:	3 days
2011-10-03 17:40:55 +00:00
Konstantin Belousov
24f3dcfe50 Assert that exiting process does not return to usermode.
Reviewed by:	avg, jhb
MFC after:	1 week
2011-10-03 16:58:58 +00:00
Edward Tomasz Napierala
72a401d918 Fix another bug introduced in r225641, which caused rctl to access certain
fields in 'struct proc' before they got initialized in do_fork().

MFC after:	3 days
2011-10-03 16:23:20 +00:00
Edward Tomasz Napierala
ac6fafe6c2 Fix bug introduced in r225641, which would cause panic if racct_proc_fork()
returned error -- the racct_destroy_locked() would get called twice.

MFC after:	3 days
2011-10-03 15:32:15 +00:00
Konstantin Belousov
8e9a54ee46 The sigwait(3) function shall not return EINTR, according to the
POSIX/SUSvN. The sigwait(2) syscall does return EINTR, and libc.so.7
contains the wrapper sigwait(3) which hides EINTR from callers.  The
EINTR return is used by libthr to handle required cancellation point
in the sigwait(3).

To help the binaries linked against pre-libc.so.7, i.e. RELENG_6 and
earlier, to have right ABI for sigwait(3), transform EINTR return from
sigwait(2) into ERESTART.

Discussed with:	davidxu
MFC after:	1 week
2011-10-01 10:18:55 +00:00
Bjoern A. Zeeb
a06534c3c2 Fix handling of corrupt compress(1)ed data. [11:04]
Add missing length checks on unix socket addresses. [11:05]

Approved by:	so (cperciva)
Approved by:	re (kensmith)
Security:	FreeBSD-SA-11:04.compress
Security:	CVE-2011-2895 [11:04]
Security:	FreeBSD-SA-11:05.unix
2011-09-28 08:47:17 +00:00
Attilio Rao
79a5956c23 Revert r225372:
wdog_kern_pat() acquires eventhandler mutex, thus it cannot work in
kernel context (from where kdb_trap() runs).

The right way to fix this is both offering the
cpu-stop-on-panic-and-skip-locking logic and also a context for KDB
to officially run. We can re-enable this (or a similar) improvement
when these 2 patches hit the tree.

Sponsored by:	Sandvine Incorporated
Discussed with:	emaste, rstone
MFC after:	immediately
2011-09-27 13:42:11 +00:00
Konstantin Belousov
ce8bd78b2a Do not deliver SIGTRAP on exec as the normal signal, use ptracestop() on
syscall exit path. Otherwise, if SIGTRAP is ignored, that tdsendsignal()
do not want to deliver the signal, and debugger never get a notification
of exec.

Found and tested by:	Anton Yuzhaninov <citrin citrin ru>
Discussed with:	jhb
MFC after:	2 weeks
2011-09-27 13:17:02 +00:00
Alexander Motin
556a5850fa Fix interrupt counters dumping on SW_WATCHDOG fire. 2011-09-27 09:30:20 +00:00
Edward Tomasz Napierala
1dbf9dcc20 Fix error handling bug that would prevent MAC structures from getting
freed properly if resource limit got exceeded.

Approved by:	re (kib)
2011-09-17 20:48:49 +00:00
Edward Tomasz Napierala
b38520f09c Fix long-standing thinko regarding maxproc accounting. Basically,
we were accounting the newly created process to its parent instead
of the child itself.  This caused problems later, when the child
changed its credentials - the per-uid, per-jail etc counters were
not properly updated, because the maxproc counter in the child
process was 0.

Approved by:	re (kib)
2011-09-17 19:55:32 +00:00
Kip Macy
9eca9361f9 Auto-generated code from sys_ prefixing makesyscalls.sh change
Approved by:	re(bz)
2011-09-16 14:04:14 +00:00
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Adrian Chadd
d2849f27bc Ensure that ta_pending doesn't overflow u_short by capping its value at USHRT_MAX.
If it overflows before the taskqueue can run, the task will be
re-added to the taskqueue and cause a loop in the task list.

Reported by:	Arnaud Lacombe <lacombar@gmail.com>
Submitted by:	Ryan Stone <rysto32@gmail.com>
Reviewed by:	jhb
Approved by:	re (kib)
MFC after:	1 day
2011-09-15 08:42:06 +00:00
Rick Macklem
4d30adc494 Modify vfs_register() to use a hash calculation
on vfc_name to set vfc_typenum, so that vfc_typenum doesn't
change when file systems are loaded in different orders. This
keeps NFS file handles from changing, for file systems that
use vfc_typenum in their fsid. This change is controlled via
a loader.conf variable called vfs.typenumhash, since vfc_typenum
will change once when this is enabled. It defaults to 1 for
9.0, but will default to 0 when MFC'd to stable/8.

Tested by:	hrs
Reviewed by:	jhb, pjd (earlier version)
Approved by:	re (kib)
MFC after:	1 month
2011-09-13 21:01:26 +00:00
Attilio Rao
58379067a3 dump_write() returns ENXIO if the dump is trying to be written outside
of the device boundry.
While this is generally ok, the problem is that all the consumers
handle similar cases (and expect to catch) ENOSPC for this (for a
reference look at minidumpsys() and dumpsys() constructions). That
ends up in consumers not recognizing the issue and amd64 failing to
retry if the number of pages grows up during minidump.
Fix this by returning ENOSPC in dump_write() and while here add some
more diagnostic on involved values.

Sponsored by:	Sandvine Incorporated
In collabouration with:	emaste
Approved by:	re (kib)
MFC after:	10 days
2011-09-12 20:39:31 +00:00
Ed Schouten
ca0856d3d1 Fix error return codes for ioctls on init/lock state devices.
In revision 223722 we introduced support for driver ioctls on init/lock
state devices. Unfortunately the call to ttydevsw_cioctl() clobbers the
value of the error variable, meaning that in many cases ioctl() will now
return ENOTTY, even though the ioctl() was processed properly.

Reported by:	Boris Samorodov <bsam ipt ru>
Patch by:	jilles@
Approved by:	re@ (kib@)
2011-09-12 10:07:21 +00:00