Commit Graph

12802 Commits

Author SHA1 Message Date
ray
f2d384bbd0 Style fixes.
Suggested by:   mdf
Approved by:	adrian (menthor)
2012-09-04 23:16:55 +00:00
ray
da392b2a99 Add missing braces.
Approved by:	bschmidt (while mentor offline)
Pointed by:     gcooper
Pointy hat to:  ray
2012-09-03 09:46:46 +00:00
zont
867f7e9a12 - Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN.
Pointed out by:	avg
Approved by:	kib (mentor)
MFC after:	1 week
2012-09-03 09:26:56 +00:00
ray
d9abd894ec Add kern.hintmode sysctl variable to show current state of hints:
0 - loader hints in environment only;
1 - static hints only
2 - fallback mode (Dynamic KENV with fallback to kernel environment)
Add kern.hintmode write handler, accept only value 2. That will switch
static KENV to dynamic. So it will be possible to change device hints.

Approved by:	adrian (mentor)
2012-09-03 08:52:05 +00:00
zont
b017762c15 - Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssiz
and kern.sgrowsiz sysctls writable.

Approved by:	kib (mentor)
2012-09-02 17:39:02 +00:00
trociny
ae35ddce9c In soreceive_generic() remove the optimization for the case when
MSG_WAITALL is set, and it is possible to do the entire receive
operation at once if we block (resid <= hiwat). Actually it might make
the recv(2) with MSG_WAITALL flag get stuck when there is enough space
in the receiver buffer to satisfy the request but not enough to open
the window closed previously due to the buffer being full.

The issue can be reproduced using the following scenario:

On the sender side do 2 send(2) requests:

1) data of size much smaller than SOBUF_SIZE (e.g. SOBUF_SIZE / 10);
2) data of size equal to SOBUF_SIZE.

On the receiver side do 2 recv(2) requests with MSG_WAITALL flag set:

1) recv() data of SOBUF_SIZE / 10 size;
2) recv() data of SOBUF_SIZE size;

We totally fill the receiver buffer with one SOBUF_SIZE/10 size request
and partial SOBUF_SIZE request. When the first request is processed we
get SOBUF_SIZE/10 free space. It is just enough to receive the rest of
bytes for the second request, and soreceive_generic() blocks in the
part that is a subject of this change waiting for the rest. But the
window was closed when the buffer was filled and to avoid silly window
syndrome it opens only when available space is larger than sb_hiwat/4
or maxseg. So it is stuck and pending data is only sent via TCP window
probes.

Discussed with:	kib (long ago)
MFC after:	2 weeks
2012-09-02 07:33:52 +00:00
trociny
e483d14d74 In soreceive_generic() when checking if the type of mbuf has changed
check it for MT_CONTROL type too, otherwise the assertion
"m->m_type == MT_DATA" below may be triggered by the following scenario:

- the sender sends some data (MT_DATA) and then a file descriptor
  (MT_CONTROL);
- the receiver calls recv(2) with a MSG_WAITALL asking for data larger
  than the receive buffer (uio_resid > hiwat).

MFC after:	2 week
2012-09-02 07:29:37 +00:00
pjd
09be0bbef1 Fix panic in procdesc that can be triggered in the following scenario:
1. Process A pdfork(2)s process B.
2. Process A passes process descriptor of B to unrelated process C.
3. Hit CTRL+C to terminate process A. Process B is also terminated
   with SIGINT.
4. init(8) collects status of process B.
5. Process C closes process descriptor associated with process B.

When we have such order of events, init(8), by collecting status of
process B, will call procdesc_reap(). This function sets pd_proc to NULL.

Now when process C calls close on this process descriptor,
procdesc_close() is called. Unfortunately procdesc_close() assumes that
pd_proc points at a valid proc structure, but it was set to NULL earlier,
so the kernel panics.

The patch also adds setting 'p->p_procdesc' to NULL in procdesc_reap(),
which I think should be done.

MFC after:	1 week
2012-09-01 11:21:56 +00:00
attilio
ec1be6b9d0 Post r222812 KTR_CPUMASK started being initialized only as a tunable
handler and not more statically.

Unfortunately, it seems that this is not ideal for new platform bringup
and boot low level development (which needs ktr_cpumask to be effective
before tunables can be setup).

Because of this, add a way to statically initialize cpusets, by passing
an list of initializers, divided by commas. Also, provide a way to enforce
an all-set mask, for above mentioned initializers.

This imposes some differences on how KTR_CPUMASK is setup now as a
kernel option, and in particular this makes the words specifications
backward wrt. what is currently in -CURRENT. In order to avoid mismatches
between KTR_CPUMASK definition and other way to setup the mask
(tunable, sysctl) and to print it, change the ordering how
cpusetobj_print() and cpusetobj_scan() acquire the words belonging
to the set.
Please give a look to sys/conf/NOTES in order to understand how the
new format is supposed to work.

Also, ktr manpages will be updated shortly by gjb which volountereed
for this.

This patch won't be merged because it changes a POLA (at least
from the theoretical standpoint) and this is however a patch that
proves to be effective only in development environments.

Requested by:	rpaulo
Reviewed by:	jeff, rpaulo
2012-08-30 21:22:47 +00:00
marius
9f542929d1 - Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking so get rid of it. As the latter is used
  for the timecounter on certain machine models, using a spin lock in this
  case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
  Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
  for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
  demapping IPIs.
- Mark some unused function arguments as such.

MFC after:	1 week
2012-08-29 16:56:50 +00:00
ed
986966dbe7 Remove unused SI_* flags.
The SI_DEVOPEN, SI_CONSOPEN and SI_CANDELETE flags are not used by any
piece of code in the tree.
2012-08-28 19:30:29 +00:00
jhb
9f74866305 Shorten the name of the fast SWI taskqueue to "fast taskq" so that
it fits.

Reported by:	lev
MFC after:	1 week
2012-08-28 13:35:37 +00:00
np
d86771afc3 Allow nmbjumbop, nmbjumbo9, and nmbjumbo16 to be set directly via loader
tunables.

MFC after:	1 month
2012-08-23 21:32:02 +00:00
kib
9d2e20143f Provide some compat32 shims for sysctl vfs.conflist. It is required
for getvfsbyname(3) operation when called from 32bit process, and
getvfsbyname(3) is used by recent bsdtar import.

Reported by:	many
Tested by:	David Naylor <naylor.b.david@gmail.com>
MFC after:	5 days
2012-08-22 20:05:34 +00:00
jhb
9e9c95ef1d Assert that system calls do not leak a pinned thread (via sched_pin()) to
userland.
2012-08-22 20:02:42 +00:00
jhb
e62296e213 Fix a typo. 2012-08-22 20:01:57 +00:00
jhb
ca55558465 Mark the idle threads as non-sleepable and also assert that an idle
thread never blocks on a turnstile.
2012-08-22 20:01:38 +00:00
jhb
9559a94c6b Fix the 'show witness' DDB command to honor db_pager_quit. 2012-08-22 20:00:41 +00:00
jhb
2698863029 Add a BUS_CHILD_DELETED() method that a bus can hook to allow it to cleanup
any bus-specific state (such as ivars) when a child device is deleted.

Requested by:	kan
MFC after:	1 month
2012-08-21 18:13:09 +00:00
kib
4382478fb0 Deliver SIGSYS to the guilty thread, not to the process.
MFC after:	1 week
2012-08-18 18:17:10 +00:00
davidxu
b788233f5b regen. 2012-08-17 02:47:16 +00:00
davidxu
3f0806aa1f Implement syscall clock_getcpuclockid2, so we can get a clock id
for process, thread or others we want to support.
Use the syscall to implement POSIX API clock_getcpuclock and
pthread_getcpuclockid.

PR:	168417
2012-08-17 02:26:31 +00:00
jhb
05949946d9 Remove D_NEEDGIANT from dead_devsw. biofinish() (and thus dead_strategy)
does not need Giant.

MFC after:	1 month
2012-08-16 18:04:33 +00:00
kib
0160404611 As a safety measure, disable lowering pid_max too much.
Requested by:	Peter Jeremy <peter@rulingia.com>
MFC after:	1 week
2012-08-16 13:04:21 +00:00
kib
ee1919e406 Fix grammar.
Submitted by:	jh
MFC after:	1 week
2012-08-16 13:01:56 +00:00
imp
193f715d75 Limit popcorn limit to something sane (either 2ns or 2 ticks if that's
longer).

PR:		156481
Submitted by:	Ian Lepore
2012-08-16 02:35:44 +00:00
alc
6100ffde84 Correct a KASSERT message.
Submitted by:	bde
2012-08-15 22:12:01 +00:00
kib
604f552e40 Add a sysctl kern.pid_max, which limits the maximum pid the system is
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.

MFC after:	1 week
2012-08-15 15:56:21 +00:00
hselasky
2e74f8d2a5 Revert r239178 and implement two new functions, namely
"device_free_softc()" and "device_claim_softc()",
to allow USB serial drivers refcounting the softc.
These functions are used to grab the softc from
auto-free and to free the softc back to the correct
malloc type, respectivly.

Discussed with:	jhb
MFC after:	2 weeks
2012-08-15 15:42:57 +00:00
obrien
069822c9f5 Don't include opt_ddb.h & <ddb/ddb.h> twice. 2012-08-15 14:18:54 +00:00
jh
6e556a5317 Reserve room for the terminating NUL when setting or getting kernel
environment variables. KENV_MNAMELEN and KENV_MVALLEN doesn't include
space for the terminating NUL.
2012-08-14 19:16:30 +00:00
davidxu
988528484f Some style fixes inspired by @bde. 2012-08-11 23:48:39 +00:00
mav
60e552193c Some more minor tunings inspired by bde@. 2012-08-11 20:24:39 +00:00
mav
e7c56ae436 Allow idle threads to steal second threads from other cores on systems with
8 or more cores to improve utilization.  None of my tests on 2xXeon (2x6x2)
system shown any slowdown from mentioned "excess thrashing".  Same time in
pbzip2 test with number of threads more then number of CPUs I see up to 10%
speedup with SMT disabled and up 5% with SMT enabled.  Thinking about
trashing I was trying to limit that stealing within same last level cache,
but got only worse results.  Present code any way prefers to steal threads
from topologically closer cores.

Sponsored by:	iXsystems, Inc.
2012-08-11 15:08:19 +00:00
davidxu
43c8a8efc2 tvtohz will print out an error message if a negative value is given
to it, avoid this problem by detecting timeout earlier.

Reported by: pho
2012-08-11 00:06:56 +00:00
mav
5b837de0b3 Some minor tunings/cleanups inspired by bde@ after previous commits:
- remove extra dynamic variable initializations;
 - restore (4BSD) and implement (ULE) hogticks variable setting;
 - make sched_rr_interval() more tolerant to options;
 - restore (4BSD) and implement (ULE) kern.sched.quantum sysctl, a more
user-friendly wrapper for sched_slice;
 - tune some sysctl descriptions;
 - make some style fixes.
2012-08-10 19:02:49 +00:00
mav
d8898a45d7 sched_rr_interval() seems always returned period in hz ticks, but same
always it was used as rate.  Fix use side units to period in hz ticks.
2012-08-10 18:19:57 +00:00
hselasky
3ccdeed507 Add new device method to free the automatically
allocated softc structure which is returned by
device_get_softc(). This method can be used to
easily implement softc refcounting. This can be
desirable when the softc has memory references
which are controlled by userspace handles for
example.

This solves the problem of blocking the caller
of device_detach() for a non-deterministic time.

Discussed with:	kib, ed
MFC after:	2 weeks
2012-08-10 15:02:49 +00:00
mav
1f0ad947a6 Rework r220198 change (by fabient). I believe it solves the problem from
the wrong direction. Before it, if preemption and end of time slice happen
same time, thread was put to the head of the queue as for only preemption.
It could cause single thread to run for indefinitely long time. r220198
handles it by not clearing TDF_NEEDRESCHED in case of preemption. But that
causes delayed context switch every time preemption happens, even when not
needed.

Solve problem by introducing scheduler-specifoc thread flag TDF_SLICEEND,
set when thread's time slice is over and it should be put to the tail of
queue. Using SW_PREEMPT flag for that purpose as it was before just not
enough informative to work correctly.

On my tests this by 2-3 times reduces run time deviation (improves fairness)
in cases when several threads share one CPU.

Reviewed by:	fabient
MFC after:	2 months
Sponsored by:	iXsystems, Inc.
2012-08-09 19:26:13 +00:00
mav
f0fe2cf739 SCHED_4BSD scheduling quantum mechanism appears to be broken for some time.
With switchticks variable being reset each time thread preempted (that is
done regularly by interrupt threads) scheduling quantum may never expire.
It was not noticed in time because several other factors still regularly
trigger context switches.

Handle the problem by replacing that mechanism with its equivalent from
SCHED_ULE called time slice. It is effectively the same, just measured in
context of stathz instead of hz. Some unification is probably not bad.
2012-08-09 18:09:59 +00:00
kib
2ded2254c0 Always initialize pl_event.
Submitted by:	Andrey Zonov <andrey@zonov.org>
MFC after:	3 days
2012-08-08 00:20:30 +00:00
kan
387a0565b3 Do not add handler to event handlers list until ithread is created.
In rare event when fast and ithread interrupts share the same vector
and the fast handler was registered first, we can end up trying to
schedule the ithread that is not created yet. The kernel built with
INVARIANTS then triggers an assertion.

Change the order to create the ithread first and only then add the
handler that needs it to the interrupt event handlers list.

Reviewed by: jhb
2012-08-06 16:37:43 +00:00
kib
cac2fe116f After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
mav
6d37c6c863 Particlly MFcalloutng r238425 (by davide):
Fix an issue related to old periodic timers. The code in kern_clocksource.c
uses interrupt to keep track of time, and this time may not match with
binuptime(). In order to address such incoherency, switch periodic timers
to binuptime().

Except further calloutng it is needed for already present cyclic subsystem.
2012-08-04 08:06:37 +00:00
mav
1fde1ad39c Partialy MFcalloutng r236894 (by davide):
...
While here, Bruce Evans told me that "unsigned int" is spelled "u_int" in
KNF, so replace it where needed.
2012-08-04 07:46:58 +00:00
mav
f443afa848 Microoptimize time math. As soon as our event periods are always below ome
second we may not add intereger parts by using bintime_addx() instead of
bintime_add().  Profiling shows handleevents() time redction by 15%.
2012-08-03 09:08:20 +00:00
jhb
cdbfd348a5 Reorder the managament of advisory locks on open files so that the advisory
lock is obtained before the write count is increased during open() and the
lock is released after the write count is decreased during close().

The first change closes a race where an open() that will block with O_SHLOCK
or O_EXLOCK can increase the write count while it waits.  If the process
holding the current lock on the file then tries to call exec() on the file
it has locked, it can fail with ETXTBUSY even though the advisory lock is
preventing other threads from succesfully completeing a writable open().

The second change closes a race where a read-only open() with O_SHLOCK or
O_EXLOCK may return successfully while the write count is non-zero due to
another descriptor that had the advisory lock and was blocking the open()
still being in the process of closing.  If the process that completed the
open() then attempts to call exec() on the file it locked, it can fail with
ETXTBUSY even though the other process that held a write lock has closed
the file and released the lock.

Reviewed by:	kib
MFC after:	1 month
2012-07-31 18:25:00 +00:00
davidxu
c8c77f184e I am comparing current pipe code with the one in 8.3-STABLE r236165,
I found 8.3 is a history BSD version using socket to implement FIFO
pipe, it uses per-file seqcount to compare with writer generation
stored in per-pipe object. The concept is after all writers are gone,
the pipe enters next generation, all old readers have not closed the
pipe should get the indication that the pipe is disconnected, result
is they should get EPIPE, SIGPIPE or get POLLHUP in poll().
But newcomer should not know that previous writters were gone, it
should treat it as a fresh session.
I am trying to bring back FIFO pipe to history behavior. It is still
unclear that if single EOF flag can represent SBS_CANTSENDMORE and
SBS_CANTRCVMORE which socket-based version is using, but I have run
the poll regression test in tool directory, output is same as the one
on 8.3-STABLE now.
I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1.
expected POLLHUP; got 0" might be bogus, because newcomer should not
know that old writers were gone. I got the same behavior on Linux.
Our implementation always return POLLIN for disconnected pipe even it
should return POLLHUP, but I think it is not wise to remove POLLIN for
compatible reason, this is our history behavior.

Regression test: /usr/src/tools/regression/poll
2012-07-31 05:48:35 +00:00
davidxu
d2b97b9193 When a thread is blocked in direct write state, it only sets PIPE_DIRECTW
flag but not PIPE_WANTW, but FIFO pipe code does not understand this internal
state, when a FIFO peer reader closes the pipe, it wants to notify the writer,
it checks PIPE_WANTW, if not set, it skips calling wakeup(), so blocked writer
never noticed the case, but in general, the writer should return from the
syscall with EPIPE error code and may get SIGPIPE signal. Setting the
PIPE_WANTW fixed problem, or you can turn off direct write, it should fix the
problem too. This bug is found by PR/170203.

Another bug in FIFO pipe code is when peer closes the pipe, another end which
is being blocked in select() or poll() is not notified, it missed to call
pipeselwakeup().

Third problem is found in poll regression test, the existing code can not
pass 6b,6c,6d tests, but FreeBSD-4 works. This commit does not fix the
problem, I still need to study more to find the cause.

PR: 170203
Tested by: Garrett Copper &lt; yanegomi at gmail dot com &gt;
2012-07-31 02:00:37 +00:00
davide
06da175461 Until now KTR_ENTRIES, which defines the size of circular buffer used in
ktr(4), was constrained to be a power of two. Remove this constraint and
update sys/conf/NOTES accordingly.

Reviewed by:		jhb
Approved by:		gnn (mentor)
Sponsored by:		Google Summer of Code 2012
2012-07-30 22:46:42 +00:00