This introduces all the underlying support for making this possible
(via the function cpusetobj_strscan()) and keeps ktr_cpumask exported.
sparc64 implements its own assembly primitives for tracing events and
needs to check ktr_cpumask properly. The sparc64 logic is not
implemented yet due to lack of knowledge (on my part) and time (on
marius'), but it is just a matter of using ktr_cpumask when possible.
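For illustration, here is a minimal userland sketch of the string
format cpusetobj_strscan() has to parse: comma-separated hex words,
most significant word first. The function and NWORDS below are
hypothetical stand-ins, not the kernel implementation:

    #include <stdio.h>
    #include <string.h>

    #define NWORDS 4    /* hypothetical word count */

    /* Parse "w3,w2,w1,w0" (hex, most significant first) into mask[]. */
    static int
    mask_strscan(unsigned long mask[NWORDS], const char *str)
    {
        int i;

        for (i = NWORDS - 1; i >= 0; i--) {
            if (sscanf(str, "%lx", &mask[i]) != 1)
                return (-1);
            if (i > 0) {
                str = strchr(str, ',');
                if (str == NULL)
                    return (-1);
                str++;
            }
        }
        return (0);
    }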
Tested and fixed by: pluknet
Reviewed by: marius
While we have had a fix in place (options PRINTF_BUFR_SIZE=128) for
scrambled console output, the message buffer and syslog were still getting
log messages one character at a time. While all of the characters still
made it into the log (courtesy of atomic operations), they were often
interleaved when there were multiple threads writing to the buffer at the
same time.
This change makes message buffer accesses use the buffering logic as
well, so that strings shorter than PRINTF_BUFR_SIZE are put into the
message buffer atomically. dmesg output should now look the same as
console output. (A sketch of the buffering idea follows the per-file
notes below.)
subr_msgbuf.c: Convert most message buffer calls to use a new spin
lock instead of atomic variables in some places.
Add a new routine, msgbuf_addstr(), that adds a
NUL-terminated string to a message buffer. This
takes a priority argument, which allows us to
eliminate some races (at least in the string-
at-a-time case) that are present in the
implementation of msglogchar(). (dangling and
lastpri are static variables, and are subject to
races when multiple callers are present.)
msgbuf_addstr() also allows the caller to request
that carriage returns be stripped out of the
string. This matches the behavior of msglogchar(),
but in testing so far it doesn't appear that any
carriage returns are being stripped out. So the carriage
return removal functionality may be a candidate
for removal later on if further analysis shows
that it isn't necessary.
subr_prf.c: Add a new msglogstr() routine that calls
msgbuf_addstr().
Rename putcons() to putbuf(). This now handles
buffered output to the message log as well as
the console. Also, remove the logic in putcons()
(now putbuf()) that added a carriage return before
a newline. The console path was the only path that
needed it, and cnputc() (called by cnputs())
already adds a carriage return. So this
duplication resulted in kernel-generated console
output lines ending in '\r''\r''\n'.
Refactor putchar() to handle the new buffering
scheme.
Add buffering to log().
Change log_console() to use msglogstr() instead of
msglogchar(). Don't add extra newlines by default
in log_console(). Hide that behavior behind a
tunable/sysctl (kern.log_console_add_linefeed) for
those who would like the old behavior. The old
behavior led to the insertion of extra newlines
for log output for programs that print out a
string, and then a trailing newline on a separate
write. (This is visible with dmesg -a.)
msgbuf.h: Add a prototype for msgbuf_addstr().
Add three new fields to struct msgbuf, msg_needsnl,
msg_lastpri and msg_lock. The first two are needed
for log message functionality previously handled
by msglogchar(). (Which is still active if
buffering isn't enabled.)
Include sys/lock.h and sys/mutex.h for the new
mutex.
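A sketch of the buffering idea only, not the committed code; "struct
putbuf" and the helper names here are hypothetical, and in the commit
the buffer lives in putchar()'s argument while msgbuf_addstr() does
its own locking via the new msg_lock:

    struct putbuf {
        char    pb_buf[PRINTF_BUFR_SIZE];
        int     pb_len;
    };

    static void
    putbuf_flush(struct putbuf *pb)
    {
        pb->pb_buf[pb->pb_len] = '\0';
        /* pri -1 = none; the final 1 requests carriage-return
         * stripping, matching msglogchar(). */
        msgbuf_addstr(msgbufp, -1, pb->pb_buf, 1);
        pb->pb_len = 0;
    }

    static void
    putbuf_char(struct putbuf *pb, int c)
    {
        pb->pb_buf[pb->pb_len++] = c;
        if (pb->pb_len == sizeof(pb->pb_buf) - 1 || c == '\n')
            putbuf_flush(pb);
    }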
Reviewed by: gibbs
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.
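The shape of the fix, with hypothetical structure names; the point is
only that STAILQ_INSERT_TAIL() preserves enumeration order where
SLIST_INSERT_HEAD() reversed it:

    #include <sys/queue.h>

    struct cpu_ent {
        int                   ce_cpuid;
        STAILQ_ENTRY(cpu_ent) ce_link;  /* was SLIST_ENTRY(cpu_ent) */
    };
    static STAILQ_HEAD(, cpu_ent) cpu_list =
        STAILQ_HEAD_INITIALIZER(cpu_list);

    static void
    cpu_enumerate(struct cpu_ent *ce)
    {
        /* Append, so the list stays in device-tree order and
         * thread 0 of each core is started first. */
        STAILQ_INSERT_TAIL(&cpu_list, ce, ce_link);
    }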
Reviewed by: jhb
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained, the
lock is released for a moment. On return we block waiting for the
rest of the data. There is a race: data could arrive while the
lock was released, and then the connection stalls in sbwait.
Fix this by checking for data before blocking and skipping the
sleep if some has already arrived.
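The essence of the fix as a hedged fragment (the socket buffer field
and function names are real, but the exact placement inside
soreceive() is simplified here):

    SOCKBUF_LOCK(&so->so_rcv);
    /* Only sleep if nothing arrived while the lock was dropped. */
    if (so->so_rcv.sb_mb == NULL) {
        error = sbwait(&so->so_rcv);
        if (error != 0) {
            SOCKBUF_UNLOCK(&so->so_rcv);
            goto release;
        }
    }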
PR: kern/154504
Reported by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Tested by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Reviewed by: rwatson
Approved by: kib (co-mentor)
MFC after: 2 weeks
Specifically, a critical_exit() call that drops the nesting level to zero
has a brief window where the pending preemption flag is set and the
nesting level is set to zero. This is done purposefully to avoid races
where a preemption scheduled by an interrupt could be lost otherwise (see
revision 144777). However, this does mean that if an interrupt fires
during this window and enters and exits a critical section, it may preempt
from the interrupt context. This is generally fine as the interrupt code
is careful to arrange critical sections so that they are not exited until
it is safe to preempt (e.g. interrupts EOI'd and masked if necessary).
However, the SMP rendezvous IPI handler does not quite follow this rule,
and in general a rendezvous can never be preempted. Rendezvous handlers
are also not permitted to schedule threads to execute, so they will not
typically trigger preemptions. SMP rendezvous handlers may use
spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks,
but using a spinlock also enters and exits a critical section. If the
interrupted top-half code is in the brief window of critical_exit() where
the nesting level is zero but a preemption is pending, then releasing the
spinlock can trigger a preemption. Because we know that SMP rendezvous
handlers can never schedule a thread, we know that a critical_exit() in
an SMP rendezvous handler will only preempt in this edge case. We also
know that the top-half thread will happily handle the deferred preemption
once the SMP rendezvous has completed, so the preemption will not be lost.
This makes it safe to employ a workaround where we use a nested critical
section in the SMP rendezvous code itself around rendezvous action
routines to prevent any preemptions during an SMP rendezvous. The
workaround intentionally avoids checking for a deferred preemption
when leaving the critical section on the assumption that if there is a
pending preemption it will be handled by the interrupted top-half code.
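A sketch of the workaround as applied around the rendezvous action
routines (simplified; the callback names are placeholders).
Decrementing td_critnest directly on exit is what skips the
deferred-preemption check that critical_exit() would perform:

    struct thread *td;

    td = curthread;
    td->td_critnest++;          /* nested critical_enter() */
    if (action_func != NULL)
        action_func(func_arg);
    td->td_critnest--;          /* exit without a preemption check */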
Submitted by: mlaier (variation specific to rm_cleanIPI())
Obtained from: Isilon
MFC after: 1 week
now the preferred typical return value from a probe routine. Discourage
the use of 0 (BUS_PROBE_SPECIFIC) as it should be used very rarely.
Point the reader to the DEVICE_PROBE(9) manpage for more detailed notes
on possible probe return values.
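For reference, a typical probe routine returning the preferred value;
the PCI device and the vendor/device IDs here are hypothetical:

    static int
    foo_probe(device_t dev)
    {
        if (pci_get_vendor(dev) != FOO_VENDOR ||
            pci_get_device(dev) != FOO_DEVICE)
            return (ENXIO);
        device_set_desc(dev, "Foo example adapter");
        return (BUS_PROBE_DEFAULT);  /* not 0 / BUS_PROBE_SPECIFIC */
    }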
Submitted by: Philip Soeberg philip-dev of soeberg net
least significant cpuset_t word at the rightmost part of the string
(farthest from the beginning of it). This follows the natural
representation of the bits within the words.
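In a hypothetical userland equivalent, printing walks the words from
most to least significant, so the least significant word (mask[0])
ends up rightmost:

    for (i = NWORDS - 1; i >= 0; i--)
        printf("%lx%s", mask[i], i > 0 ? "," : "");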
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.
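A hedged sketch of the intended use; VFS_EXAMPLE() stands in for the
method that gained the argument, and LK_SHARED is only a minimum, so
a caller must tolerate getting an exclusively locked vnode back:

    /* Today: all callers still request an exclusive lock. */
    error = VFS_EXAMPLE(mp, arg, LK_EXCLUSIVE, &vp);

    /*
     * Later: a caller that can work with a shared vnode lock passes
     * LK_SHARED; the file system may return the vnode locked either
     * shared or exclusive.
     */
    error = VFS_EXAMPLE(mp, arg, LK_SHARED, &vp);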
Reviewed by: kib
and destroy_devl() drops dev_mtx. The protection against the race
with dev_rel(), introduced in r163328, should be extended to cover
destroy_devl() calls for the children of the destroyed dev.
Reported and tested by: joerg
MFC after: 1 week
- Remove the following sysctls:
kern.sched.ipiwakeup.onecpu
kern.sched.ipiwakeup.htt2
because they are obsolete. The whole wakeup forwarding mechanism
should probably be revisited in the future to better fit modern
hardware.
- As the map2 variable is no longer used, rename map3 to map2.
- Fix a string by making the message more informative and removing
the argument passing.
Reviewed by: julian
Tested by: several
last CPU to finish the rendezvous action may become visible to
different CPUs at different times. As a result, the CPU that initiated
the rendezvous may exit the rendezvous and drop the lock, allowing
another rendezvous to be initiated on the same CPU or a different CPU.
In that case the exit sentinel may be cleared before all CPUs have
noticed it, causing those CPUs to hang forever.
Work around this by using a generation count to notice when this race
occurs and to exit the rendezvous in that case.
The problem was independently diagnosed by mlaier@ and avg@ as well.
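A sketch of the generation-count idea (variable names assumed from the
rendezvous code): a waiter gives up not only when the sentinel is
reached but also once a new rendezvous has begun, since the old
sentinel may then already have been overwritten:

    static volatile u_int smp_rv_generation;  /* bumped per rendezvous */

    /* Initiator, while still holding the rendezvous lock: */
    atomic_add_acq_int(&smp_rv_generation, 1);

    /* Each CPU, waiting for the exit sentinel: */
    gen = smp_rv_generation;
    while (atomic_load_acq_int(&smp_rv_waiters[2]) < smp_rv_ncpus &&
        gen == smp_rv_generation)
        cpu_spinwait();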
Submitted by: neel
Reviewed by: avg, mlaier
Obtained from: NetApp
MFC after: 1 week
is no relevant difference for sbufs, and it increases portability of
the source code.
Split the actual initialization of the sbuf into a separate local
function, so that certain static code checkers can understand
what sbuf_new() does, thus eliminating one silly annoyance of
MISRA compliance testing.
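The shape of the split as a sketch (the local function's name, the
SBMALLOC()/SBUF_SETFLAG() helpers and the field names are assumed
here): sbuf_new() keeps the argument sanity checks and delegates all
field setup to one place a checker can reason about:

    static struct sbuf *
    sbuf_newbuf(struct sbuf *s, char *buf, int length, int flags)
    {
        memset(s, 0, sizeof(*s));
        s->s_flags = flags;
        s->s_size = length;
        s->s_buf = buf;
        if (s->s_buf != NULL)
            return (s);
        /* No caller-supplied storage: allocate our own. */
        s->s_buf = SBMALLOC(s->s_size);
        if (s->s_buf == NULL)
            return (NULL);
        SBUF_SETFLAG(s, SBUF_DYNAMIC);
        return (s);
    }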
Contributed by: An anonymous company in the last business I
expected sbufs to invade.
choice of default size in the first place)
Reverse the order of arguments to the internal static sbuf_put_byte()
function to match everything else in this file.
Move sbuf_putc_func() inside the kernel version of sbuf_vprintf
where it belongs.
sbuf_putc() incorrectly used sbuf_putc_func(), which suppresses NUL
characters; it should use sbuf_put_byte().
Make sbuf_finish() return -1 on error.
Minor stylistic nits fixed.
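The caller-visible consequence of the sbuf_finish() change, as a small
hedged example (error handling trimmed; sbuf_new_auto() per sbuf(9)):

    static int
    emit_pid(int pid)
    {
        struct sbuf *sb;

        sb = sbuf_new_auto();
        sbuf_printf(sb, "pid %d", pid);
        if (sbuf_finish(sb) != 0) {    /* now returns -1 on error */
            sbuf_delete(sb);
            return (-1);
        }
        printf("%s\n", sbuf_data(sb));
        sbuf_delete(sb);
        return (0);
    }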
Now, when one-shot timers are used, cyclic events should fire closer
to their scheduled times. As the cyclic subsystem is currently used
only to drive the DTrace profile provider, this is the area where the
change makes a difference.
Reviewed by: mav (earlier version, a while ago)
X-MFC after: clocksource/eventtimer subsystem
Xen timer and time counter to provide one-shot and periodic time events.
In my tests this reduces the idle interrupt rate down to about 30Hz,
and according to the Xen VM Manager it reduces host CPU load threefold
compared to the previous periodic 100Hz clock. Also, when needed, it is
now possible to increase the HZ rate without useless CPU burning during
idle periods.
Now only ia64 and some ARMs are left unmigrated to the new event timers.
should not change. Fetch the td_user_pri under the thread lock. This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.
Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
file and processes information retrieval from the running kernel via sysctl
in the form of a new library, libprocstat. The library also supports a
KVM backend for analyzing memory crash dumps. Both the procstat(1) and
fstat(1) utilities have been modified to take advantage of the library
(as a bonus the fstat(1) utility no longer needs superuser privileges
to operate), and the procstat(1) utility is now able to display
information from memory dumps as well.
The newly introduced fuser(1) utility also uses this library and is
able to operate via the sysctl and kvm backends.
The library is by no means complete (e.g. the KVM backend is missing
vnode name resolution routines, and there are no manpages for the
library itself) so I plan to improve it further. I'm committing it now
so it will get wider exposure and review.
We won't be able to MFC this work as it relies on changes in HEAD,
which were introduced some time ago and break the kernel ABI. OTOH we
may be able to merge the library with the KVM backend if we really
need it there.
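A minimal consumer of the sysctl backend might look roughly like this
(a sketch against the interfaces as introduced; error handling
trimmed):

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/socket.h>
    #include <sys/sysctl.h>
    #include <sys/user.h>
    #include <libprocstat.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct procstat *prstat;
        struct kinfo_proc *p;
        unsigned int i, cnt;

        prstat = procstat_open_sysctl();
        if (prstat == NULL)
            return (1);
        p = procstat_getprocs(prstat, KERN_PROC_PROC, 0, &cnt);
        if (p != NULL) {
            for (i = 0; i < cnt; i++)
                printf("%5d %s\n", (int)p[i].ki_pid, p[i].ki_comm);
            procstat_freeprocs(prstat, p);
        }
        procstat_close(prstat);
        return (0);
    }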
Discussed with: rwatson