12542 Commits

Author SHA1 Message Date
kib
6ecd4a2bb2 Fix a race between getvnode() dereferencing half-constructed file
and dupfdopen().

Reported and tested by:	pho
MFC after:	3 days
2011-11-24 20:34:06 +00:00
trociny
878f4f16e9 Fix build without INVARIANTS.
Discussed with:	kib
2011-11-23 08:11:04 +00:00
hselasky
53a216b722 Rename device_delete_all_children() into device_delete_children().
Suggested by:	jhb @ and marius @
MFC after:	1 week
2011-11-22 21:56:55 +00:00
hselasky
9eef52e077 Style change.
Suggested by:	jhb @ and marius @
MFC after:	1 week
2011-11-22 21:53:19 +00:00
trociny
ce852d7df6 Add new sysctls, KERN_PROC_ENV and KERN_PROC_AUXV, to return
environment strings and ELF auxiliary vectors from a process stack.

Make sysctl_kern_proc_args to read not cached arguments from the
process stack.

Export proc_getargv() and proc_getenvv() so they can be reused by
procfs and linprocfs.

Suggested by:	kib
Reviewed by:	kib
Discussed with:	kib, rwatson, jilles
Tested by:	pho
MFC after:	2 weeks
2011-11-22 20:40:18 +00:00
lstewart
09887e1dc5 - Add Pulse-Per-Second timestamping using raw ffcounter and corresponding
ffclock time in seconds.

- Add IOCTL to retrieve ffclock timestamps from userland.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 13:34:29 +00:00
attilio
b95134ea01 Introduce the same mutex-wise fix in r227758 for sx locks.
The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_

Now vm_map locking is fully converted and can avoid to know specifics
about locking procedures.
Reviewed by:	kib
MFC after:	1 month
2011-11-21 12:59:52 +00:00
pluknet
819239d461 Remove no more relevant XXXRW comments since accessing the vmspace is now
properly done with the acquired vmspace reference.

Pointed out by:		kib
2011-11-21 12:21:00 +00:00
pluknet
b652a652ff Use the acquired reference to the vmspace instead of direct dereferencing
of p->p_vmspace like it is done in sysctl_kern_proc_vmmap().
2011-11-21 10:36:57 +00:00
lstewart
cca3084242 - Add the ffclock_getcounter(), ffclock_getestimate() and ffclock_setestimate()
system calls to provide feed-forward clock management capabilities to
  userspace processes. ffclock_getcounter() returns the current value of the
  kernel's feed-forward clock counter. ffclock_getestimate() returns the current
  feed-forward clock parameter estimates and ffclock_setestimate() updates the
  feed-forward clock parameter estimates.

- Document the syscalls in the ffclock.2 man page.

- Regenerate the script-derived syscall related files.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 01:26:10 +00:00
attilio
6a69e947d3 Introduce macro stubs in the mutex implementation that will be always
defined and will allow consumers, willing to provide options, file and
line to locking requests, to not worry about options redefining the
interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.

The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_

Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specification in the
  ppbus code by using the same macro as done in this patch (but this is
  left to the ppbus maintainer)
- all the other locking interfaces may require a similar cleanup, where
  the most notable case is sx which will allow a further cleanup of
  vm_map locking facilities
- The patch should be fully compatible with older branches, thus a MFC
  is previewed (infact it uses all the underlying mechanisms already
  present).

Comments review by:	eadler, Ben Kaduk
Discussed with:		kib, jhb
MFC after:	1 month
2011-11-20 16:33:09 +00:00
hselasky
bcd3918fb3 Given that the typical usage of pause() is pause("zzz", hz / N), where N can
be greater than hz in some cases, simply ignore a timeout value of zero.

Suggested by:	Bruce Evans
MFC after:	1 week
2011-11-20 08:36:18 +00:00
hselasky
09cd8e0f19 Minor style change:
Simplify the description of pause() and shorten the KASSERT message in pause.
Also add a clamp for the timo argument in the non-KASSERT case.

Suggested by:	Bruce Evans
MFC after:	1 week
2011-11-20 08:29:23 +00:00
lstewart
1c7932d870 - Provide a sysctl interface to change the active system clock at runtime.
- Wrap [get]{bin,nano,micro}[up]time() functions of sys/time.h to allow
  requesting time from either the feedback or the feed-forward clock. If a
  feedback (e.g. ntpd) and feed-forward (e.g. radclock) daemon are both running
  on the system, both kernel clocks are updated but only one serves time.

- Add similar wrappers for the feed-forward difference clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-20 05:32:12 +00:00
lstewart
603d3fe159 Provide high-level functions to access the feed-forward absolute and difference
clocks. Each routine can output an upper bound on the absolute time or time
interval requested. Different flavours of absolute time can be requested, for
example with or without leap seconds, monotonic or not, etc.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-20 01:20:50 +00:00
lstewart
751092ac03 Core structure and functions to support a feed-forward clock within the kernel.
Implement ffcounter, a monotonically increasing cumulative counter on top of the
active timecounter. Provide low-level functions to read the ffcounter and
convert it to absolute time or a time interval in seconds using the current
ffclock estimates, which track the drift of the oscillator. Add a ring of
fftimehands to track passing of time on each kernel tick and pick up updates of
ffclock estimates.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-19 14:10:16 +00:00
hselasky
1b8ad7ed8e Simplify the usb_pause_mtx() function by factoring out the generic parts
to the kernel's pause() function. The pause() function can now be used
when cold != 0. Also assert that the timeout in system ticks must be
positive.

Suggested by:	Bruce Evans
MFC after:	1 week
2011-11-19 11:17:27 +00:00
hselasky
3bcdb8772a Move the device_delete_all_children() function from usb_util.c
to kern/subr_bus.c. Simplify this function so that it no longer
depends on malloc() to execute. Identify a few other places where
it makes sense to use device_delete_all_children().

MFC after:	1 week
2011-11-19 10:11:50 +00:00
kib
36fd8d0106 Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs.  The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by:	pho
Reviewed by:	mckusick
MFC after:	3 weeks (subject of re approval)
2011-11-19 07:50:49 +00:00
ed
914a7cfed1 Regenerate system call tables. 2011-11-19 06:36:11 +00:00
ed
9cedd4d52c Improve *access*() parameter name consistency.
The current code mixes the use of `flags' and `mode'. This is a bit
confusing, since the faccessat() function as a `flag' parameter to store
the AT_ flag.

Make this less confusing by using the same name as used in the POSIX
specification -- `amode'.
2011-11-19 06:35:15 +00:00
np
2fa1481795 Do not increment the parent firmware's reference count when any other
firmware image in the module is registered.  Instead, do it when the
other image is itself referenced.

This allows a module with multiple firmware images to be automatically
unloaded when none of the firmware images are in use.

Discussed with:	jhb@ (on -hackers)
2011-11-19 00:20:28 +00:00
pho
97e40483cb Added check for negative seconds value. Found by syscall() fuzzing.
MFC after:	1 week
2011-11-18 19:14:42 +00:00
kib
b49a656854 Consistently use process spin lock for protection of the
p->p_boundary_count. Race could cause the execve(2) from the threaded
process to hung since thread boundary counter was incorrect and
single-threading never finished.

Reported by:	pluknet, pho
Tested by:	pho
MFC after:	1 week
2011-11-18 09:12:26 +00:00
kevlo
1a26b28a9b Add unicode support to msdosfs and smbfs; original pathes from imura,
bug fixes by Kuan-Chung Chiu <buganini at gmail dot com>.

Tested by me in production for several days at work.
2011-11-18 03:05:20 +00:00
pjd
a3e664d830 Constify arguments for locking KPIs where possible.
This enables locking consumers to pass their own structures around as const and
be able to assert locks embedded into those structures.

Reviewed by:	ed, kib, jhb
2011-11-16 21:51:17 +00:00
pjd
f01187d1ea Constify stack argument for functions that don't modify it.
Reviewed by:	ed, kib, jhb
2011-11-16 19:06:55 +00:00
marius
4b6458192f As it turns out, r186347 actually is insufficient to avoid the use of the
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.

Reviewed by:	nwhitehorn (earlier version), jhb
MFC after:	3 days
2011-11-15 20:11:03 +00:00
obrien
def2613c2b Reformat comment to be more readable in standard Xterm.
(while I'm here, wrap other long lines)
2011-11-15 01:48:53 +00:00
rmh
ff5c11fefd Remove a few bits of FreeBSD 2.x compatibility code.
Approved by:	kib (mentor)
2011-11-14 18:21:27 +00:00
jhb
9315d9d42c - Split out a kern_posix_fadvise() from the posix_fadvise() system call so
it can be used by in-kernel consumers.
- Make kern_posix_fallocate() public.
- Use kern_posix_fadvise() and kern_posix_fallocate() to implement the
  freebsd32 wrappers for the two system calls.
2011-11-14 18:00:15 +00:00
alfred
fe73343e27 Constify args to copyiniov and copyinuio. 2011-11-14 07:12:10 +00:00
kib
ca682488f3 To limit amount of the kernel memory allocated, and to optimize the
iteration over the fdsets, kern_select() limits the length of the
fdsets copied in by the last valid file descriptor index. If any bit
is set in a mask above the limit, current implementation ignores the
filedescriptor, instead of returning EBADF.

Fix the issue by scanning the tails of fdset before entering the
select loop and returning EBADF if any bit above last valid
filedescriptor index is set. The performance impact of the additional
check is only imposed on the (somewhat) buggy applications that pass
bad file descriptors to select(2) or pselect(2).

PR:	kern/155606, kern/162379
Discussed with:	cognet, glebius
Tested by:	andreast (powerpc, all 64/32bit ABI combinations, big-endian),
       marius (sparc64, big-endian)
MFC after:    2 weeks
2011-11-13 10:28:01 +00:00
kib
a8b2cc359c Style.
MFC after:	1 week
2011-11-11 04:13:47 +00:00
kib
f3c70bd299 Guard against the unlikely case of the alias path containing the '%' symbols.
Reported by:	arundel
MFC after:	1 week
2011-11-11 04:12:58 +00:00
rstone
f6710005c7 Correct the types of the arguments to return probes of the syscall
provider.  Previously we were erroneously supplying the argument types of
the corresponding entry probe.

Reviewed by:	rpaulo
MFC after:	1 week
2011-11-11 03:49:42 +00:00
ed
3d017846c9 Simplify the code emitted by makeobjops.awk slightly.
Just place the default kobj_method inside the kobjop_desc structure.
There's no need to give these kobj_methods their own symbol. This shaves
off 10 KB of a GENERIC kernel binary.
2011-11-09 11:00:29 +00:00
ed
85414e79bd Make kobj_methods constant.
These structures hold no information that is modified during runtime. By
marking this constant, we see approximately 600 symbols become
read-only (amd64 GENERIC). While there, also mark the kobj_method
structures generated by makeobjops.awk static. They are only referenced
by the kobjop_desc structures within the same file.

Before:

	$ ls -l kernel
	-rwxr-xr-x  1 ed  wheel  15937309 Nov  8 16:29 kernel*
	$ size kernel
	    text    data     bss      dec    hex filename
	12260854 1358468 2848832 16468154 fb48ba kernel
	$ nm kernel | fgrep -c ' r '
	8240

After:

	$ ls -l kernel
	-rwxr-xr-x  1 ed  wheel  15922469 Nov  8 16:25 kernel*
	$ size kernel
	    text    data     bss      dec    hex filename
	12302869 1302660 2848704 16454233 fb1259 kernel
	$ nm kernel | fgrep -c ' r '
	8838
2011-11-08 15:38:21 +00:00
rstone
ae7b6414d5 The in-kernel CTF parser caches the result of its first attempt to parse
CTF data from a module.  On subsequent attempts to retrieve CTF data for
a module, return an error if there no CTF data.

This fixes a panic if you try to enable fbt probes on a module with CTF
data twice.

Submitted by:	Paul Ambrose (ambrosehua AT gmail DOT com)
MFC after:	3 days
2011-11-08 15:17:54 +00:00
attilio
8e918ec439 Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by:	gianni
Reviewed by:	kib
2011-11-08 10:18:07 +00:00
trociny
9cd1f9add2 Add KVME_FLAG_SUPER and use it in sysctl_kern_proc_vmmap for marking
entries with superpages.

Submitted by:	Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
Reviewed by:	alc, rwatson
2011-11-07 21:13:19 +00:00
trociny
f25b803473 In lim_fork() assert that processes locks are held.
Suggested by:	kib
2011-11-07 21:09:04 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
ed
e97eae1577 Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
fjoe
afccaaff1c Add KLD_DEBUG option. 2011-11-06 08:10:41 +00:00
jhb
767a02dc14 Regen. 2011-11-04 04:06:31 +00:00
jhb
78c075174e Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region.  It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types.  The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE().  These modes are
thus filesystem independent.  Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used.  These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request.  This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf().  This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode.  This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue.  The second change adds
a new vm_object_page_cache() method.  This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by:	jilles, kib, arch@
Approved by:	re (kib)
MFC after:	1 month
2011-11-04 04:02:50 +00:00
jhb
1e2d8c9d67 Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by:	kib
MFC after:	1 week
2011-11-04 03:39:31 +00:00
attilio
90682e14db Disable interrupt and preemption for smp_rendezvous() also in the
UP/!SMP case.
The callbacks may be relying on this feature and having 2 different
ways to deal with them is not correct.

Reported by:	rstone
Reviewed by:	jhb
MFC after:	2 weeks
2011-11-03 14:36:56 +00:00
marcel
11d8234b97 Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.
2011-10-30 02:19:39 +00:00