11874 Commits

Author SHA1 Message Date
avg
1f20beb47f debug.kdb.stop_cpus sysctl: hint that this is also a tunable
MFC after:	1 week
2010-09-30 16:47:01 +00:00
avg
54a87db1fb kmem_size* sysctls: hint that these are also tunables
MFC after:	1 week
2010-09-30 16:45:27 +00:00
davidxu
6580ce86ea - kern_sched_rr_get_interval should return interval for thread 1 in
target process.
- eliminate a goto.

MFC after: 1 week
2010-09-29 07:31:05 +00:00
imp
4087eace5d This file has been unused for ages. Retire it.
Submitted by:	pluknet
2010-09-28 15:33:30 +00:00
emaste
2d28788fe8 Remove extra braces for style(9) (found while cleaning up an old work tree).
2010-09-28 01:36:01 +00:00
avg
9864877541 kdb_backtrace: use stack_print_ddb instead of stack_print
This is a followup to r212964.
The stack_print call chain obtains the linker sx lock and thus may
lead to a deadlock, depending on the kind of panic.
stack_print_ddb doesn't acquire any locks and it doesn't use any
facilities of ddb backend.
Using stack_print_ddb outside of DDB ifdef required taking a number of
helper functions from under it as well.

It is a good idea to rename linker_ddb_* and stack_*_ddb functions to
have 'unlocked' component in their name instead of 'ddb', because those
functions do not use any DDB services, but instead they provide unlocked
access to linker symbol information.  The latter was previously needed
only for DDB, hence the 'ddb' name component.

An alternative is to ditch the unlocked versions altogether after implementing
proper panic handling:
1. stop other cpus upon a panic
2. make all non-spinlock lock operations (mutex, sx, rwlock) be a no-op
   when panicstr != NULL

Suggested by:	mdf
Discussed with:	attilio
MFC after:	2 weeks
2010-09-22 06:45:07 +00:00
mav
351da3e73c If the kernel is built with DEVICE_POLLING, keep one CPU always in the active state
to handle it.
2010-09-22 05:32:37 +00:00
jhb
e350ad7930 Comment nit, set TDF_NEEDRESCHED after the comment describing why it is
done rather than before.

MFC after:	1 week
2010-09-21 19:12:22 +00:00
mav
10e1b075c5 If a new callout is scheduled to another CPU and we are using a global timer,
there is a high probability that the timer is already programmed by some other
CPU, especially by the one that registered this callout and so is active now.
2010-09-21 17:37:28 +00:00
mav
e7b0e3848a Remember the last kern.eventtimer.periodic value explicitly set by the user.
If timer capabilities force us to change the periodicity mode, try to restore
it later, as soon as the newly chosen timer is capable of it. Without this,
a timer change like HPET->RTC->HPET always results in enabling periodic mode.
2010-09-21 16:50:24 +00:00
alc
524cb00f17 Fix exec_imgact_shell()'s handling of two error cases: (1) Previously, if
the first line of a script exceeded MAXSHELLCMDLEN characters, then
exec_imgact_shell() silently truncated the line and passed on the truncated
interpreter name or argument.  Now, exec_imgact_shell() will fail and return
ENOEXEC, which is the commonly used errno among Unix variants for this type
of error. (2) Previously, exec_imgact_shell()'s check on the length of the
interpreter's name was ineffective.  In other words, exec_imgact_shell()
could not possibly fail and return ENAMETOOLONG.  The reason being that the
length of the interpreter name had to exceed MAXSHELLCMDLEN characters in
order that ENAMETOOLONG be returned.  But, the search for the end of the
interpreter name stops after at most MAXSHELLCMDLEN - 2 characters are
scanned.  (In the end, this particular error is eventually discovered
outside of exec_imgact_shell() and ENAMETOOLONG is returned.  So, the real
effect of this second change is that the error is detected earlier, in
exec_imgact_shell().)

Update the definition of MAXINTERP to the actual limit on the size of
the interpreter name that has been in effect since r142453 (from
2005).

In collaboration with: kib
2010-09-21 16:24:51 +00:00
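The two error cases described in the exec_imgact_shell() commit above can be illustrated with a small userland sketch. This is not the kernel code: the function name `sketch_parse_shebang` and both `SKETCH_*` limits are hypothetical, chosen only to show the idea of failing with ENOEXEC instead of truncating an over-long first line, and of checking the interpreter name length early enough that ENAMETOOLONG can actually be returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limits for illustration only; the kernel's values differ. */
#define SKETCH_MAXSHELLCMDLEN 64
#define SKETCH_MAXINTERP      32

/*
 * Parse the "#!" line held in buf[0..len).
 * Returns 0 and reports where the interpreter name sits, ENOEXEC when the
 * line is not a usable "#!" line or does not end within the length limit,
 * and ENAMETOOLONG when the interpreter name itself is too long.
 */
int
sketch_parse_shebang(const char *buf, size_t len, size_t *name_off,
    size_t *name_len)
{
    size_t limit, eol, start, end;

    assert(buf != NULL && name_off != NULL && name_len != NULL);
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return (ENOEXEC);

    /* Case (1): look for the end of line, and fail rather than truncate. */
    limit = len < SKETCH_MAXSHELLCMDLEN ? len : SKETCH_MAXSHELLCMDLEN;
    for (eol = 2; eol < limit && buf[eol] != '\n'; eol++)
        ;
    if (eol == limit && len > SKETCH_MAXSHELLCMDLEN)
        return (ENOEXEC);

    /* Skip blanks between "#!" and the interpreter name. */
    for (start = 2; start < eol &&
        (buf[start] == ' ' || buf[start] == '\t'); start++)
        ;
    if (start == eol)
        return (ENOEXEC);

    /* Case (2): measure the name against its own, smaller limit. */
    for (end = start; end < eol &&
        buf[end] != ' ' && buf[end] != '\t'; end++)
        ;
    if (end - start > SKETCH_MAXINTERP)
        return (ENAMETOOLONG);

    *name_off = start;
    *name_len = end - start;
    return (0);
}
```

Because the name limit is smaller than the line limit in this sketch, the ENAMETOOLONG branch is reachable, which is the property the commit message says the old check lacked.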
avg
fe208ba095 kdb_backtrace: stack(9)-based code to print backtrace without any backend
The idea is to add KDB and KDB_TRACE options to GENERIC kernels on
stable branches, so that at least the minimal information is produced
for non-specific panics like traps on page faults.
The GENERICs in stable branches seem to already include STACK option.

Reviewed by:	attilio
MFC after:	2 weeks
2010-09-21 15:07:44 +00:00
mav
16369ea8b2 Until hardclock() and, in turn, tc_windup() are called for the first time, the system
is running on the "dummy" time counter. But to function properly in one-shot
mode, the event timer management code requires a working time counter. The slow
moving "dummy" time counter delays the first hardclock() call by a few seconds
on my systems, even though timer interrupts were correctly kicking the kernel.
That causes a delay of a few seconds during boot with one-shot mode enabled.

To break this loop, explicitly call tc_windup() for the first time during
the initialization process to let it switch to some real time counter.
2010-09-21 08:02:02 +00:00
trasz
3e2d23f909 First step at adapting FreeBSD to support PSARC/2010/029. This makes
acl_is_trivial_np(3) properly recognize the new trivial ACLs.  From
the user point of view, that means "ls -l" no longer shows plus signs
for all the files when running ZFS v28.
2010-09-20 17:10:06 +00:00
ed
a67dfa17fa Just make callout devices and /dev/console force CLOCAL on open().
Instead of adding custom checks to wait for DCD on open(), just modify
the termios structure to set CLOCAL. This means SIGHUP is no longer
generated when losing DCD as well.

Reviewed by:	kib@
MFC after:	1 week
2010-09-19 16:35:42 +00:00
ed
99ba5ac113 Ignore DCD handling on /dev/console entirely.
This makes /dev/console more fail-safe and prevents a potential console
lock-up during boot.

Discussed on:	stable@
Tested by:	koitsu@
MFC after:	1 week
2010-09-19 14:21:39 +00:00
rwatson
b9d3291981 With reworking of the socket life cycle in 7.x, the need for a "sotryfree()"
was eliminated: all references to sockets are explicitly managed by sorele()
and the protocols.  As such, garbage collect sotryfree(), and update
sofree() comments to make the new world order more clear.

MFC after:	3 days
Reported by:	Anuranjan Shukla <anshukla at juniper dot net>
2010-09-18 11:18:42 +00:00
avg
6fb9e57674 kern.sched.topology_spec sysctl: use step of 1 for group levels numeration
This is just a cosmetic change for prettier output.
'indent' variable/parameter serves two purposes: it specifies whitespace
indentation level and also implies cpu group level/depth.
It would have been better to split those two uses,
but for now just a simple change.

MFC after:	1 week
2010-09-18 11:16:43 +00:00
mav
a168297469 When a global timer is used on an SMP system, update the nextevent field on the BSP before
sending an IPI to other CPUs. Otherwise, other CPUs will try to honor a stale
value, programming the timer for a zero interval. If the timer is fast enough,
this caused an extra interrupt before the timer was correctly reprogrammed by the BSP.
2010-09-18 07:18:30 +00:00
imp
1ffadf8af3 By popular demand, kill all the non GIANT related interrupt messages.
They are confusing and add little value.

Reviewed by:	jhb@
2010-09-17 16:05:25 +00:00
mdf
5695ef4698 Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
mav
6eed5acb73 Fix panic on NULL dereference possible after r212541.
2010-09-14 10:26:49 +00:00
mav
6c05aa4db6 Make kern_tc.c provide the minimum frequency of tc_ticktock() calls required
to handle current timecounter wraps. Make kern_clocksource.c honor that
requirement, scheduling sleeps on the first CPU for no more than the specified
period. Allow other CPUs to sleep up to 1/4 second (for any case).
2010-09-14 08:48:06 +00:00
mav
5864d6e457 Replace the spin lock with a set of atomics. It is impractical for one
tc_ticktock() call to wait for another's completion -- just skip it.
2010-09-14 04:57:30 +00:00
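The "skip instead of wait" idea from the tc_ticktock() commit above can be sketched with C11 atomics. The commit message does not say which atomics the kernel uses, so everything here is an assumption for illustration: the names `sketch_ticktock`, `sketch_busy`, and `sketch_reentrant_probe` are hypothetical, and a single `atomic_flag` stands in for whatever set of atomics the real code relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Set while one caller is inside the update; hypothetical name. */
atomic_flag sketch_busy = ATOMIC_FLAG_INIT;
int sketch_runs;            /* number of calls that did the work */
int sketch_nested_result;   /* result seen by the re-entrant probe */

/*
 * Returns 1 if this call performed the update and 0 if it was skipped
 * because another call was already in progress. Nobody ever spins.
 */
int
sketch_ticktock(void (*work)(void))
{
    if (atomic_flag_test_and_set_explicit(&sketch_busy,
        memory_order_acquire))
        return (0);     /* someone else is updating: just skip */
    if (work != NULL)
        work();
    sketch_runs++;
    atomic_flag_clear_explicit(&sketch_busy, memory_order_release);
    return (1);
}

/* Simulates a second caller arriving while the first is still busy. */
void
sketch_reentrant_probe(void)
{
    sketch_nested_result = sketch_ticktock(NULL);
}
```

Calling `sketch_ticktock(sketch_reentrant_probe)` performs the outer update once while the nested call returns 0 immediately, which mirrors the trade-off the commit describes: a skipped call is acceptable, a blocked one is not.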
mav
5f7bd119f7 Add some foot-shooting protection by checking that the singlemul value is valid.
Rephrase sysctl descriptions.

Suggested by:	edmaste
2010-09-14 04:48:04 +00:00
mdf
3ed6eac561 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
avg
ab04d6fe3f bus_add_child: add specialized default implementation that calls panic
If a kobj method doesn't have any explicitly provided default
implementation, then it is auto-assigned kobj_error_method.
kobj_error_method is proper only for methods that return error code,
because it just returns ENXIO.
So, in the case of an unimplemented bus_add_child, the caller would get
(device_t)ENXIO as a return value, which would cause the mistake to go
unnoticed, because the return value is typically checked for NULL.
Thus, a specialized null_add_child is added.  It would have sufficed
for correctness to return NULL, but this type of mistake was deemed to
be rare and serious enough to call panic instead.

Watch out for this kind of problem with other kobj methods.

Suggested by:	jhb, imp
MFC after:	2 weeks
2010-09-13 08:34:20 +00:00
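The failure mode described in the bus_add_child commit above is easy to reproduce in plain C. The sketch below is not the kobj code: `sketch_device`, `sketch_generic_default`, `sketch_null_add_child`, and `sketch_caller_detects_failure` are hypothetical names, and the specialized default returns NULL here (the variant the commit calls sufficient for correctness) rather than panicking, so that the example stays testable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_device {
    const char *name;
};

typedef struct sketch_device *(*sketch_add_child_t)(const char *name);

/*
 * What a caller effectively receives when a generic, errno-returning
 * fallback stands in for a pointer-returning method: ENXIO reinterpreted
 * as a pointer, which is not NULL.
 */
struct sketch_device *
sketch_generic_default(const char *name)
{
    (void)name;
    return ((struct sketch_device *)(intptr_t)ENXIO);
}

/* Specialized default: an unimplemented method is reported as NULL. */
struct sketch_device *
sketch_null_add_child(const char *name)
{
    (void)name;
    return (NULL);
}

/* A typical caller checks only for NULL. Returns 1 if it saw a failure. */
int
sketch_caller_detects_failure(sketch_add_child_t method)
{
    struct sketch_device *child;

    assert(method != NULL);
    child = method("child0");
    return (child == NULL);
}
```

With the generic fallback the caller believes it received a valid child and would later dereference a bogus pointer; with the specialized default the mistake is visible at the call site.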
mav
eb4931dc6c Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full rate
of hz + stathz to fulfill scheduler and timekeeping requirements. But
when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per
second per CPU now) needed to handle scheduled callouts is generated.
This allows a significant increase in idle CPU sleep time, increasing the effect
of static power-saving technologies. It should also reduce host CPU load
on virtualized systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, that
control the event timer subsystem behavior:
  kern.eventtimer.timer - chooses the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether
the chosen timer is per-CPU, the behavior of other options differs slightly.
  kern.eventtimer.periodic - chooses between periodic and one-shot
operation mode. In periodic mode, the current timer hardware is taken as the only
source of time for time events. This mode is quite similar to the previous kernel
behavior. One-shot mode instead uses the currently selected time counter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt exactly at the specified time. The default value depends on
the chosen timer's capabilities, but one-shot mode is preferred unless the other is
forced by the user or hardware.
  kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
  kern.eventtimer.idletick - makes each CPU receive every timer interrupt
regardless of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generated.

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the one on x86. Now it makes use of MONITOR/MWAIT instructions
(if supported) under high sleep/wakeup rate, as a fast alternative to other
methods. It allows the SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by:	many (on i386, amd64, sparc64 and powerpc)
H/W donated by:	Gheorghe Ardelean
Sponsored by:	iXsystems, Inc.
2010-09-13 07:25:35 +00:00
mav
29989e4d9c Do not print "frequency 0 Hz", when frequency is unknown.
2010-09-11 20:18:15 +00:00
kan
a47d3fbf19 Add missing pointer increment to sbuf_cat.
2010-09-11 19:42:50 +00:00
kib
107ea66c07 Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by:	jh (previous version of the patch)
Tested by:	pho
MFC after:	3 weeks
2010-09-11 13:06:06 +00:00
mav
90db957786 Merge some SCHED_ULE features to SCHED_4BSD:
- Teach SCHED_4BSD to inform cpu_idle() about a high sleep/wakeup rate so it can
choose an optimized handler. In the case of x86 it is MONITOR/MWAIT. It
will also be needed to bypass the forthcoming idle tick skipping logic, to not
consume resources on event rescheduling when it won't give any benefit.
- Teach SCHED_4BSD to wake up idle CPUs without using IPI. In the case of x86,
when MONITOR/MWAIT is active, it requires just a single memory write. This
doubles performance on some heavily switching test loads.
2010-09-11 07:08:22 +00:00
jamie
5a233127aa Don't exit kern_jail_set without freeing options when enforce_statfs
has an illegal value.

MFC after:	3 days
2010-09-10 21:45:42 +00:00
mdf
ab3a8b533a Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno than sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
mav
aa2a743453 Do not IPI a CPU that is already spinning for load. It doubles the effect of
spinning (compared to MWAIT) on some heavily switching test loads.
2010-09-10 13:24:47 +00:00
avg
c9fe8ad7f0 bus_add_child: change type of order parameter to u_int
This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to:	r212213
MFC after:	10 days
2010-09-10 11:19:03 +00:00
mdf
bc54684253 Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00
mdf
73d2d3f18e Add drain functionality to sbufs. The drain is a function that is
called when the sbuf internal buffer is filled.  For kernel sbufs with a
drain, the internal buffer will never be expanded.  For userland sbufs
with a drain, the internal buffer may still be expanded by
sbuf_[v]printf(3).

Sbufs now have three basic uses:
1) static string manipulation.  Overflow is marked.
2) dynamic string manipulation.  Overflow triggers string growth.
3) drained string manipulation.  Overflow triggers draining.

In all cases the manipulation is 'safe' in that overflow is detected and
managed.

Reviewed by:	phk (the previous version)
2010-09-09 17:49:18 +00:00
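The three overflow behaviors listed in the sbuf drain commit above (marked, grown, drained) can be sketched with a toy userland buffer. This is not the sbuf(9) API: every `sketch_*` name below is hypothetical, the growth policy (start at 16 bytes, then double) is an arbitrary choice, and the example drain merely counts bytes where a real one would write them somewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum sketch_mode { SKETCH_FIXED, SKETCH_AUTOEXTEND, SKETCH_DRAIN };

/* Called with the full buffer; returns 0 or an errno value. */
typedef int (*sketch_drain_fn)(void *arg, const char *data, size_t len);

struct sketch_buf {
    char *data;             /* storage (malloc'ed or NULL for AUTOEXTEND) */
    size_t size;            /* capacity of data */
    size_t len;             /* bytes currently stored */
    enum sketch_mode mode;
    int error;              /* sticky errno, 0 if none */
    sketch_drain_fn drain;  /* used only in SKETCH_DRAIN mode */
    void *drain_arg;
};

/* All writes funnel through one function, so overflow is handled once. */
void
sketch_put_byte(struct sketch_buf *b, char c)
{
    assert(b != NULL);
    assert(b->mode == SKETCH_AUTOEXTEND || b->size > 0);
    if (b->error != 0)
        return;
    if (b->len == b->size) {
        if (b->mode == SKETCH_FIXED) {
            b->error = ENOMEM;      /* 1) overflow is marked */
            return;
        } else if (b->mode == SKETCH_AUTOEXTEND) {
            size_t nsize = b->size != 0 ? b->size * 2 : 16;
            char *n = realloc(b->data, nsize);
            if (n == NULL) {
                b->error = ENOMEM;
                return;
            }
            b->data = n;            /* 2) overflow triggers growth */
            b->size = nsize;
        } else {
            b->error = b->drain(b->drain_arg, b->data, b->len);
            if (b->error != 0)
                return;
            b->len = 0;             /* 3) overflow triggers draining */
        }
    }
    b->data[b->len++] = c;
}

void
sketch_cat(struct sketch_buf *b, const char *s)
{
    for (; *s != '\0'; s++)
        sketch_put_byte(b, *s);
}

/* Example drain: adds the number of drained bytes to a size_t counter. */
int
sketch_count_drain(void *arg, const char *data, size_t len)
{
    (void)data;
    *(size_t *)arg += len;
    return (0);
}
```

Routing every write through `sketch_put_byte` echoes the motivation given in the sbuf_put_byte() refactoring commit further down the log: with a single code point that can overflow, adding the drain case touches one function.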
mdf
3b526ad3fb Refactor sbuf code so that most uses of sbuf_extend() are in a new
sbuf_put_byte().  This makes it easier to add drain functionality when a
buffer would overflow as there are fewer code points.

Reviewed by:	phk
2010-09-09 16:51:52 +00:00
rpaulo
a2f2f93652 Fix two bugs in DTrace:
* when the process exits, remove the associated USDT probes
* when the process forks, duplicate the USDT probes.

Sponsored by:	The FreeBSD Foundation
2010-09-09 09:58:05 +00:00
pjd
621aa135f8 Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.
2010-09-09 07:55:13 +00:00
pjd
cb66a2a961 Doing first mount and updating mount points are both handled by the same
syscall and the same function, but are very different and share almost no code.
To make it easier to read and analyze, split vfs_domount() into
vfs_domount_first() and vfs_domount_update().

Reviewed by:	kib
2010-09-08 21:00:53 +00:00
pjd
86d6e6cf79 - Log all the problems in devfs_fixup().
- Correct error paths. The system will be useless on devfs_fixup() failure, so
  why bother?  Maybe for the same reason why a dead body is washed and dressed
  in a nice suit before it is put into a coffin? Maybe system's last will is to
  panic without any locks held?

Reviewed by:	kib
2010-09-08 20:56:18 +00:00
avg
ef26efcc43 subr_bus: use hexadecimal representation for bit flags
It seems that this format is more customary in our code, and it is more
convenient too.

Suggested by:	jhb
No objection:	imp
MFC after:	1 week
2010-09-08 17:35:06 +00:00
tuexen
908a61906a Implement correct handling of address parameter and
sendinfo for SCTP send calls.

MFC after: 4 weeks.
2010-09-05 20:13:07 +00:00
mav
4f9dee93a3 Initialize buffer for case of empty string. Happens only on non-refactored
platforms.
2010-09-05 06:16:04 +00:00
avg
f8094d8ba9 struct device: widen type of flags and order fields to u_int
Also change int -> u_int for order parameter in device_add_child_ordered.
There should not be any ABI change as struct device is private to subr_bus.c
and the API change should be compatible.

To do: change int -> u_int for order parameter of bus_add_child method
and its implementations.  The change should also be API compatible, but
is a bit more churn.

Suggested by:	imp, jhb
MFC after:	1 week
2010-09-04 17:28:29 +00:00
mdf
62a144da37 Use a better #if guard.
Suggested by pluknet <pluknet at gmail dot com>.
2010-09-03 17:42:17 +00:00
mdf
253bb8c9c3 Style(9) fixes and eliminate the use of min().
2010-09-03 17:42:12 +00:00
mdf
9b6269741b Fix user-space libsbuf build. Why isn't CTASSERT available to
user-space?
2010-09-03 17:23:26 +00:00