12439 Commits

Author SHA1 Message Date
trociny
d3234bb2c0 Fix style and white spaces.
MFC after:	1 week
2011-12-17 22:18:26 +00:00
trociny
fe0982e586 On start most of sysctl_kern_proc functions use the same pattern:
locate a process calling pfind() and do some additional checks like
p_candebug(). To reduce this code duplication a new function pget() is
introduced and used.

As the function may be useful not only in kern_proc.c it is in the
kernel name space.

Suggested by:	kib
Reviewed by:	kib
MFC after:	2 weeks
2011-12-17 16:59:22 +00:00
avg
34b585c853 belatedly transfer copyrights from libkern/gets.c to kern_cons.c
MFC after:	2 months
MFC with:	r228642
2011-12-17 15:50:45 +00:00
avg
abb1713421 replace uses of libkern gets with cngets
MFC after:	2 months
2011-12-17 15:26:34 +00:00
avg
53f09b7daf introduce cngets, a method for kernel to read a string from console
This is intended as a replacement for libkern's gets and mostly borrows
its implementation.  It uses cngrab/cnungrab to delimit kernel's access
to console input.

Note: libkern's gets obviously doesn't share any bits of implementation
iwth libc's gets.  They also have different APIs and the former doesn't
have the overflow problems of the latter.

Inspired by:	bde
MFC after:	2 months
2011-12-17 15:16:54 +00:00
avg
d062f5e7d8 introduce cngrab/cnungrab stub calls in some places where they make sense
MFC after:	2 months
2011-12-17 15:11:22 +00:00
avg
f6def40e18 kern cons: introduce infrastructure for console grabbing by kernel
At the moment grab and ungrab methods of all console drivers are no-ops.

Current intended meaning of the calls is that the kernel takes control of
console input.  In the future the semantics may be extended to mean that
the calling thread takes full ownership of the console (e.g. console
output from other threads could be suspended).

Inspired by:	bde
MFC after:	2 months
2011-12-17 15:08:43 +00:00
jhb
fa881e0cf4 Fire a kevent if necessary after seeking on a regular file. This fixes a
case where a kevent would not fire on a regular file if an application read
to EOF and then seeked backwards into the file.

Reviewed by:	kib
MFC after:	2 weeks
2011-12-16 20:10:00 +00:00
jhb
6d2ab3b363 Use vm_mmap_to_errno().
Submitted by:	kib
2011-12-15 15:17:19 +00:00
jilles
5200c1964b Fix select/poll/kqueue for write on reverse direction before first write.
The reverse direction of a pipe is lazily allocated on the first write in
that direction (because pipes are usually used in one direction only).  A
special case is needed to ensure the pipe appears writable before the first
write because there are 0 bytes of pending data in 0 bytes of buffer space
at that point, leaving 0 bytes of data that can be written with the normal
code.

Note that the first write returns [ENOMEM] if kern.ipc.maxpipekva is
exceeded and does not block or return [EAGAIN], so selecting true for write
is correct even in that case.

PR:		kern/93685
Submitted by:	gianni
MFC after:	2 weeks
2011-12-14 22:26:39 +00:00
jhb
6d1299a388 Add a helper API to allow in-kernel code to map portions of shared memory
objects created by shm_open(2) into the kernel's address space.  This
provides a convenient way for creating shared memory buffers between
userland and the kernel without requiring custom character devices.
2011-12-14 22:22:19 +00:00
obrien
0a772623db Match other formatting. 2011-12-14 02:31:32 +00:00
obrien
47a0230f28 Disallow various debug.kdb sysctl's when securelevel is raised.
PR:	161350
2011-12-13 17:59:16 +00:00
eadler
036b3a534b - Add a sysctl to allow non-root users the ability to set idle
priorities.

- While here fix up some style nits.

Discussed with: cperciva (breifly)
Reviewed by:	pjd (earlier version)
Reviewed by:	bde
Approved by:	jhb
MFC after:	1 month
2011-12-13 14:00:27 +00:00
eadler
3072a90209 Document a large number of currently undocumented sysctls. While here
fix some style(9) issues and reduce redundancy.

PR:		kern/155491
PR:		kern/155490
PR:		kern/155489
Submitted by:	Galimov Albert <wtfcrap@mail.ru>
Approved by:	bde
Reviewed by:	jhb
MFC after:	1 week
2011-12-13 00:38:50 +00:00
avg
7d367dba13 put sys/systm.h at its proper place or add it if missing
Reported by:	lstewart, tinderbox
Pointyhat to:	avg, attilio
MFC after:	1 week
MFC with:	r228430
2011-12-12 10:05:13 +00:00
avg
5fd0e7aabd kern_racct: move sys/systm.h inclusion to its proper place
This should fix the build failure introduced with r228424.
Also remove duplicate inclusion of sys/param.h.

Pointyhat to:	avg
MFC after:	1 week
2011-12-12 07:46:10 +00:00
avg
75ddaeae80 panic: add a switch and infrastructure for stopping other CPUs in SMP case
Historical behavior of letting other CPUs merily go on is a default for
time being.  The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
  state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set.  That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts.  Those locks might be held by the stopped threads and would never
be released.  To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times.  Other changes are heavily based on the ideas
and patches submitted by jhb and mdf.  bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console.  This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection.  Dumping to USB-connected disks may also be affected.

PR:			amd64/139614 (at least)
In cooperation with:	attilio, jhb, kib, mdf
Discussed with:		arch@, bde
Tested by:		Eugene Grosbein <eugen@grosbein.net>,
			gnn,
			Steven Hartland <killing@multiplay.co.uk>,
			glebius,
			Andrew Boyer <aboyer@averesystems.com>
			(various versions of the patch)
MFC after:		3 months (or never)
2011-12-11 21:02:01 +00:00
pho
109ecfbeb1 Move cpu_set_upcall(newtd, td) up before the first call of
thread_free(newtd).  This to avoid a possible page fault in
cpu_thread_clean() as seen on amd64 with syscall fuzzing.

Reviewed by:	kib
MFC after:	1 week
2011-12-09 17:19:41 +00:00
eadler
bc26bd381f - Fix ktrace leakage if error is set
PR:		kern/163098
Submitted by:	Loganaden Velvindron <loganaden@devio.us>
Approved by:	sbruno@
MFC after:	1 month
2011-12-08 03:20:38 +00:00
alc
af2fd888b2 Eliminate stale numbers from a comment. 2011-12-07 16:27:23 +00:00
alc
93edaa86af Eliminate the possibility of 32-bit arithmetic overflow in the calculation
of vm_kmem_size that may occur if the system administrator has specified a
vm.vm_kmem_size tunable value that exceeds the hard cap.

PR:		162741
Submitted by:	Adam McDougall
Reviewed by:	bde@
MFC after:	3 weeks
2011-12-07 07:03:14 +00:00
kib
12e8232d4d Most users of pipe(2) do not call fstat(2) on the returned pipe descriptors.
Optimize for the case, by lazily allocating the pipe inode number at the
fstat(2) time. If alloc_unr(9) returns failure, do not fail fstat(2), since
uses of inode numbers are even rare then fstat(2), but provide zero inode
forever. Note that alloc_unr() failure is unlikely due to total number
of pipes in the system limited by the number of file descriptors.

Based on the submission by:	gianni
MFC after:	2 weeks
2011-12-06 11:24:03 +00:00
trociny
1216d4d43b Really protect kern.proc.ps_strings sysctls with p_candebug(). This
was intended to be in r228288.

Spotted by:	many
MFC after:	1 week
2011-12-06 06:40:14 +00:00
trociny
b9b4d81b8d Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug().
Citing jilles:

If we are ever going to do ASLR, the AUXV information tells an attacker
where the stack, executable and RTLD are located, which defeats much of
the point of randomizing the addresses in the first place.

Given that the AUXV information seems to be used by debuggers only anyway,
I think it would be good to move it to p_candebug() now.

The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
under p_candebug().

Suggested by:	jilles
Discussed with:	rwatson
MFC after:	1 week
2011-12-05 19:34:02 +00:00
kevlo
07781bc7da Add a missing curly bracket 2011-12-05 10:34:52 +00:00
avg
c3db2f30bb critical_exit: ignore td_owepreempt if kdb_active is set
calling mi_switch in such a context results in a recursion via
kdb_switch

Suggested by:	jhb
Reviewed by:	jhb
MFC after:	5 weeks
2011-12-04 21:27:41 +00:00
trociny
b7c0f10867 In sysctl_kern_proc_ps_strings() there is no much sense in checking
for P_WEXIT and P_SYSTEM flags.

Reviewed by:	kib
2011-12-04 21:24:01 +00:00
hselasky
5829cbdab5 Make sure the description of pause() is
equivalent to its implementation.
No code change.

Suggested by:	Bruce Evans
MFC after:	3 days
2011-12-03 15:51:15 +00:00
eadler
3be1835ce6 - Fix typos s/(more|less) then|\1 than/
Submitted by:	Davide Italiano <davide.italiano@gmail.com>
Approved by:	brucec
MFC after:	3 days
2011-12-03 15:41:37 +00:00
pho
cbbd4e13db Use umtx_copyin_timeout() to copy and check timeout parameter.
In collaboration with:	kib
MFC after:	1 week
2011-12-03 12:35:13 +00:00
pho
7195560d66 Add umtx_copyin_timeout() and move parameter checks here.
In collaboration with:	kib
MFC after:	1 week
2011-12-03 12:30:58 +00:00
pho
4a62c2f04c Rename copyin_timeout32 to umtx_copyin_timeout32 and move parameter
check here. Include check for negative seconds value.

In collaboration with:	kib
MFC after:	1 week
2011-12-03 12:28:33 +00:00
marius
c1dda66820 It doesn't make much sense to check whether child is NULL after already
having dereferenced it. We either should generally check the device_t's
supplied to bus functions before using them (which we seem to virtually
never do) or just assume that they are not NULL.
While at it make this code fit 78 columns.

Found with:	Coverity Prevent(tm)
CID:		4230
2011-12-02 22:03:27 +00:00
marius
7b5b9bafe6 - In device_probe_child(9) check the return value of device_set_driver(9)
when actually setting a driver as especially ENOMEM is fatal in these
  cases.
- Annotate other calls to device_set_devclass(9) and device_set_driver(9)
  without the return value being checked and that are okay to fail.

Reviewed by:	yongari (slightly earlier version)
2011-12-02 21:19:14 +00:00
jhb
dd3857eae1 When changing the user priority of a thread, change the real priority
in addition to the user priority for threads whose current real priority
is equal to the previous user priority or if the new priority is a
real-time priority.  This allows priority changes of other threads to
have an immediate effect.

MFC after:	2 weeks
2011-12-02 19:59:46 +00:00
kib
132ad7aa9b If alloc_unr() call in the pipe_create() failed, then pipe->pipe_ino is
-1. But, because ino_t is unsigned, this case was not covered by the
test ino > 0 in pipeclose(), leading to the free_unr(-1). Fix it by
explicitely comparing with 0 and -1. [1]

Do no access freed memory, the inode number was cached to prevent access
to cpipe after it possibly was freed, but I failed to commit the right
patch.

Noted by:	gianni [1]
Pointy hat to:	kib
MFC after:	3 days
2011-12-01 11:36:41 +00:00
lstewart
15250e8f2c Revise the sysctl handling code and restructure the hierarchy of sysctls
introduced when feed-forward clock support is enabled in the kernel:

- Rename the "choice" variable to "available".

- Streamline the implementation of the "active" variable's sysctl handler
  function.

- Create a kern.sysclock sysctl node for general sysclock related configuration
  options. Place the "available" and "active" variables under this node.

- Create a kern.sysclock.ffclock sysctl node for feed-forward clock specific
  configuration options. Place the "version" and "ffcounter_bypass" variables
  under this node.

- Tweak some of the description strings.

Discussed with:	Julien Ridoux (jridoux at unimelb edu au)
2011-12-01 07:19:13 +00:00
kib
d326d5565d Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by:	attilio, alc
2011-11-30 17:39:00 +00:00
lstewart
81cb526f0e Make sysclock_active publicly available to external consumers.
Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Discussed with:	Julien Ridoux (jridoux at unimelb edu au)
Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-29 08:43:04 +00:00
lstewart
58cb09352f Do away with the somewhat clunky sysclock_ops structure and associated code,
reimplementing the [get]{bin,nano,micro}[up]time() wrapper functions in terms of
the new "fromclock" API instead.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Discussed with:	Julien Ridoux (jridoux at unimelb edu au)
Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-29 08:33:40 +00:00
lstewart
d76140d56b Make the fbclock_[get]{bin,nano,micro}[up]time() function prototypes public so
that new APIs with some performance sensitivity can be built on top of them.
These functions should not be called directly except in special circumstances.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Discussed with:	Julien Ridoux (jridoux at unimelb edu au)
Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-29 06:53:36 +00:00
lstewart
f039559048 Fix an oversight in r227747 by calling fbclock_bin{up}time() directly from the
fbclock_{nanouptime|microuptime|bintime|nanotime|microtime}() functions to avoid
indirecting through a sysclock_ops wrapper function.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-29 06:12:19 +00:00
trociny
a95ae73e49 Add sysctl to retrieve ps_strings structure location of another process.
Suggested by:	kib
Reviewed by:	kib
2011-11-27 17:05:26 +00:00
trociny
c4d555a317 In sysctl_kern_proc_auxv the process was released too early: we still
need to hold it when checking process sv_flags.

MFC after:	2 weeks
2011-11-27 16:56:01 +00:00
lstewart
ba968938a8 Export the "ffclock" feature for kernels compiled with feed-forward clock
support.

Suggested by:	netchild
Reviewed by:	netchild
2011-11-26 01:44:37 +00:00
trociny
7ca3e358b8 Add sysctl to get process resource limits.
Reviewed by:	kib
MFC after:	2 weeks
2011-11-24 20:43:37 +00:00
kib
6ecd4a2bb2 Fix a race between getvnode() dereferencing half-constructed file
and dupfdopen().

Reported and tested by:	pho
MFC after:	3 days
2011-11-24 20:34:06 +00:00
trociny
878f4f16e9 Fix build without INVARIANTS.
Discussed with:	kib
2011-11-23 08:11:04 +00:00
hselasky
53a216b722 Rename device_delete_all_children() into device_delete_children().
Suggested by:	jhb @ and marius @
MFC after:	1 week
2011-11-22 21:56:55 +00:00