11885 Commits

Author SHA1 Message Date
Alexander Motin
93fc07b434 Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI
(FSB interrupts) to be used by non-PCI devices, such as HPET.
2010-06-14 07:10:37 +00:00
Konstantin Belousov
f1bb758d4b Add another variation of make_dev(9), make_dev_p(9), that is allowed
to fail and can return useful error code.

Requested by:	jh
Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:22:39 +00:00
Konstantin Belousov
76d43557d8 When make_dev_credf(MAKEDEV_WAITOK) is called, use
devctl_notify_f(M_WAITOK) for devfs notifications.

Suggested by:	jh
Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:21:25 +00:00
Konstantin Belousov
bebc339116 Add modifications of devctl_notify(9) functions that take flags. Use
flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for
the memory allocation.

As Warner noted, allowing the functions to sleep might cause
reordering of the queued notifications.

Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:20:38 +00:00
Andriy Gapon
1bdfff2252 fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness.  In one case it even allowed to drop an intermediate buffer.

Found by:	clang
MFC after:	2 weeks
2010-06-11 19:27:21 +00:00
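
For readers unfamiliar with this class of bug, a minimal userspace sketch
follows. The helper name format_msg_safe() is hypothetical and does not come
from the commit; it only shows why a string that is data should be passed
through "%s" rather than used as the format itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy a message into buf.
 *
 * The risky form would be snprintf(buf, len, msg): any '%' inside msg
 * is then interpreted as a conversion and reads arguments that were
 * never passed.  Passing msg as the argument of "%s" keeps it literal.
 */
static int
format_msg_safe(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && msg != NULL);
	return (snprintf(buf, len, "%s", msg));
}
```

With this form, a message such as "100%d done" is copied unchanged instead
of being parsed as a format.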
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH().
2010-06-11 18:46:34 +00:00
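
The iteration pattern can be illustrated with a small userspace sketch. All
names ending in _SKETCH or _sketch are stand-ins invented for this example and
are not the kernel's definitions; the point is only that such a macro hides
the "skip absent CPUs" test that open-coded loops would otherwise repeat.

```c
#include <stdint.h>

#define MAXCPU_SKETCH 32		/* hypothetical upper bound on CPU ids */

/* Hypothetical bitmask: bit i is set when CPU i is present. */
static uint32_t cpu_present_mask_sketch;

#define CPU_ABSENT_SKETCH(i) \
	((cpu_present_mask_sketch & (UINT32_C(1) << (i))) == 0)

/* Visit every present CPU id, skipping the absent ones. */
#define CPU_FOREACH_SKETCH(i) \
	for ((i) = 0; (i) < MAXCPU_SKETCH; (i)++) \
		if (!CPU_ABSENT_SKETCH(i))

/* Count present CPUs using the loop macro instead of an open-coded test. */
static int
count_present_cpus_sketch(uint32_t present_mask)
{
	int cpu, count = 0;

	cpu_present_mask_sketch = present_mask;
	CPU_FOREACH_SKETCH(cpu)
		count++;
	return (count);
}
```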
Matthew D Fleming
d19511c357 Add INVARIANTS checking that numfreebufs values are sane. Also add a
per-buf flag to catch if a buf is double-counted in the free count.
This code was useful to debug an instance where a local patch at Isilon
was incorrectly managing numfreebufs for a new buf state.

Reviewed by:	jeff
Approved by:	zml (mentor)
2010-06-11 17:03:26 +00:00
Ivan Voras
c1e34abff8 In another move to join with the age of the Fruitbat, increase SYSV
shared resources defaults beyond absolute minimums.

The new values are chosen mostly by magic. They are still fairly
small and will need increasing for large installations (especially
SHMMAX). However, they are now enough to e.g. start PostgreSQL
installations with ~300 users and nearly 512 MB of shared buffers.

Reviewed by:	A short discussion on hackers@
2010-06-11 09:27:33 +00:00
Alexander Motin
1f255bd340 Store interrupt trap frame into struct thread. It allows interrupt handler
to obtain both trap frame and opaque argument submitted on registration.
After kernel and all drivers get used to it, legacy hack can be removed.

Reviewed by:	jhb@
2010-06-10 16:14:05 +00:00
Ivan Voras
a401f2d098 Unconfuse THREAD and SMT flags
2010-06-10 11:48:14 +00:00
Ivan Voras
5368befb66 Cosmetic change to XML - less ugly newlines
2010-06-10 11:01:17 +00:00
Konstantin Belousov
8d4a7be84d Reorganize the code in bdwrite() which handles move of dirtiness
from the buffer pages to buffer. Combine the code to set buffer
dirty range (previously in vfs_setdirty()) and to clean the pages
(vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now
the vm object lock is acquired only once.

Drain the VPO_BUSY bit of the buffer pages before setting valid
and clean bits in vfs_clean_pages_dirty_buf() with new helper
vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not
busy.

In vfs_busy_pages(), move the wait for draining of VPO_BUSY before
the dirtiness handling, to follow the structure of
vfs_clean_pages_dirty_buf().

Reported and tested by:	pho
Suggested and reviewed by:	alc
MFC after:	2 weeks
2010-06-08 17:54:28 +00:00
John Baldwin
8545538b6a Fix a sign bug that caused adaptive spinning in sx_xlock() to not work
properly.  Among other things it did not drop Giant while spinning
leading to livelocks.

Reviewed by:	rookie, kib, jmallett
MFC after:	3 days
2010-06-08 16:17:47 +00:00
Alexander Motin
e7d83347c0 Call BUS_PROBE_NOMATCH() when a device is detached due to driver unload.
This allows the bus to power down the device when its driver is unloaded on the fly.
2010-06-07 18:47:53 +00:00
Colin Percival
3beefaed5e Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is
a harmless bug since we never actually use ip6 as anything other than an
opaque pointer.

Found with:	Coverity Prevent(tm)
CID:		4319
MFC after:	1 month
2010-06-04 14:38:24 +00:00
John Baldwin
3da35a0a52 Assert that the thread lock is held in sched_pctcpu() instead of
recursively acquiring it.  All of the current callers already hold the
lock.

MFC after:	1 month
2010-06-03 16:02:11 +00:00
Edward Tomasz Napierala
ce9d79aa61 The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3688
2010-06-03 13:45:27 +00:00
Edward Tomasz Napierala
019b32dabd The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3684
2010-06-03 13:43:58 +00:00
Edward Tomasz Napierala
c977cdf961 The acl_cnt field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3683
2010-06-03 13:41:55 +00:00
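
The three commits above remove the same kind of dead check. A minimal sketch
of the pattern follows; struct acl_sketch, ACL_MAX_ENTRIES_SKETCH and
acl_cnt_valid_sketch() are hypothetical names used only for illustration and
are not taken from the changed files.

```c
#include <stdbool.h>

#define ACL_MAX_ENTRIES_SKETCH 254	/* hypothetical limit */

struct acl_sketch {
	unsigned int acl_cnt;		/* unsigned, as in the commit message */
};

static bool
acl_cnt_valid_sketch(const struct acl_sketch *acl)
{
	/*
	 * A test such as (acl->acl_cnt >= 0) is always true for an
	 * unsigned field, so only the upper bound carries information.
	 * A "negative" count stored here wraps to a huge value and is
	 * rejected by the upper-bound test.
	 */
	return (acl->acl_cnt <= ACL_MAX_ENTRIES_SKETCH);
}
```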
Konstantin Belousov
882da14c3d Sometimes vnodes share the lock despite being different vnodes on
different mount points, e.g. the nullfs vnode and the covered vnode
from the lower filesystem. In this case, existing assertion in
vop_rename_pre() may be triggered.

Check for equality of the vnode locks instead of the vnodes themselves to
avoid tripping over this situation.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
Tested by:	pho
MFC after:	2 weeks
2010-06-03 10:20:08 +00:00
Alan Cox
c8fa870982 Minimize the use of the page queues lock for synchronizing access to the
page's dirty field.  With the exception of one case, access to this field
is now synchronized by the object lock.
2010-06-02 15:46:37 +00:00
Konstantin Belousov
3286375480 Add a facility to dynamically adjust or unconfigure p1003_1b mib.
Use it to allow to tune sem_nsem_max at runtime, only when sem.ko
module is present in kernel.

Requested and tested by:	amdmi3
Reviewed by:	jhb
MFC after:	3 days
2010-06-02 09:59:05 +00:00
Zachary Loafman
121e802b07 Revert taskqueue(9) related commits until mdf@ is approved and can
resolve issues.

This reverts commits r207439, r208623, r208624
2010-06-01 16:04:01 +00:00
Zachary Loafman
911de7741d Avoid a wakeup(9) if we can be sure no one is waiting on the task.
Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, jhb
2010-05-28 18:15:34 +00:00
Zachary Loafman
6e86cdb85c Revert r207439 and solve the problem differently. The task handler
ta_func may free the task structure, so no references to its members
are valid after the handler has been called. Using a per-queue member
and having waits longer than strictly necessary was suggested by jhb.

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, jhb
2010-05-28 18:15:28 +00:00
Robert Watson
e35973e4b8 When close() is called on a connected socket pair, SO_ISCONNECTED might be
set but be cleared before the call to sodisconnect().  In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.

PR:		kern/144061
Reported by:	Matt Reimer <mreimer at vpop.net>,
		Nikolay Denev <ndenev at gmail.com>,
		Mikolaj Golub <to.my.trociny at gmail.com>
MFC after:	3 days
2010-05-27 15:27:31 +00:00
Attilio Rao
937912ea04 Add the support for reporting the NOCOREDUMP flag from
sysctl_kern_proc_vmmap().

Sponsored by:	Sandvine Incorporated
Reviewed by:	kib, emaste
MFC after:	1 week
2010-05-27 08:10:12 +00:00
Konstantin Belousov
b2318c2860 Allow syscallname(9) to be used outside subr_trap.c.
MFC after:	1 month
2010-05-26 15:39:43 +00:00
John Baldwin
0bfbf4d220 Ignore the 'addr' argument passed to PT_STEP (it is required to be '1'
for PT_STEP which means "ignore") and PT_DETACH.

PR:		kern/146167
MFC after:	1 week
2010-05-25 21:32:37 +00:00
Alan Cox
e98d019d3c Eliminate the acquisition and release of the page queues lock from
vfs_busy_pages().  It is no longer needed.

Submitted by:	kib
2010-05-25 02:26:25 +00:00
Alan Cox
567e51e18c Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
Alexander Motin
dbd55f3ff0 - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
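
One way to picture dividing a single timer interrupt into several clock calls
is an accumulator per derived clock. The sketch below is an assumption-laden
userspace illustration, not the code added by this commit: struct
clock_div_sketch and clock_div_tick_sketch() are hypothetical, and the counter
stands in for a call to hardclock(), statclock() or profclock().

```c
/* One derived clock: fires `rate` times per second on average. */
struct clock_div_sketch {
	unsigned int rate;	/* desired frequency, e.g. hz or stathz */
	unsigned int acc;	/* accumulated fraction of the next event */
	unsigned long fired;	/* events delivered (stand-in for the call) */
};

/*
 * Call once per interrupt of a timer running at timer_hz.  Each tick
 * adds `rate` to the accumulator; every time the accumulator reaches
 * timer_hz, one derived-clock event is due.
 */
static void
clock_div_tick_sketch(struct clock_div_sketch *d, unsigned int timer_hz)
{
	d->acc += d->rate;
	while (d->acc >= timer_hz) {
		d->acc -= timer_hz;
		d->fired++;
	}
}
```

Over one simulated second (timer_hz ticks), each derived clock in this sketch
fires exactly `rate` times, regardless of whether rate divides timer_hz.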
Konstantin Belousov
41fd9c6369 Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().

Diagnosed and tested by:	neel
MFC after:	3 days
2010-05-24 10:23:49 +00:00
Konstantin Belousov
afe1a68827 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-dependent
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls the syscall implementation from the
ABI sysent table, and sets up the return frame. The end-of-syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
John Baldwin
e826ef1ec4 - Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
  the idle process is now shared across all idle threads.

MFC after:	1 month
2010-05-21 17:17:56 +00:00
John Baldwin
1d7830edd5 Assert that the thread passed to sched_bind() and sched_unbind() is
curthread as those routines are only supported for curthread currently.

MFC after:	1 month
2010-05-21 17:15:56 +00:00
John Baldwin
07969f1d4d Allow a const char * to be passed as the process name to kproc_kthread_add()
without generating a warning.

MFC after:	1 month
2010-05-21 17:14:36 +00:00
Konstantin Belousov
61e53a389f Remove POLLHUP from the flags used to test whether to set exceptfds
fd_set bits in select(2). It seems that the historical behaviour is to not
report an exception on EOF, and several applications are broken.

Reported by:	Yoshihiko Sarumaru <ysarumaru gmail com>
Discussed with:	bde
PR:	ports/140934
MFC after:	2 weeks
2010-05-21 10:36:29 +00:00
Alan Cox
aa12e8b71d The page queues lock is no longer required by vm_page_set_invalid(), so
eliminate it.

Assert that the object containing the page is locked in
vm_page_test_dirty().  Perform some style clean up while I'm here.

Reviewed by:	kib
2010-05-18 16:40:29 +00:00
Randall Stewart
4542827d4d This pushes all of JC's patches that I have in place. I
am now able to run 32 cores ok.. but I still will hang
on buildworld with a NFS problem. I suspect I am missing
a patch for the netlogic rge driver.

JC check and see if I am missing anything except your
core-mask changes

Obtained from:	JC
2010-05-16 19:43:48 +00:00
Bjoern A. Zeeb
793f71bf2e Fix an issue with the dynamic pcpu/vnet data allocators.
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous.  For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with:	jeff, rwatson (briefly)
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-05-14 21:11:58 +00:00
Zachary Loafman
7fd32ea923 Add VOP_ADVLOCKPURGE so that the file system is called when purging
locks (in the case where the VFS impl isn't using lf_*)

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, dfr
2010-05-12 21:24:46 +00:00
Pawel Jakub Dawidek
408a7c5093 When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.

No objections from:	kib
Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>
MFC after:		1 week
2010-05-12 16:42:28 +00:00
Pawel Jakub Dawidek
c60c36a745 I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
Attilio Rao
98332c8c71 Right now, WITNESS just blindly pipes all the output to the
(TOCONS | TOLOG) mask even when called from DDB points.
That breaks several outputs, most notably textdump output.
Fix this by having configurable callbacks passed to witness_list_locks()
and witness_display_spinlock() for printing out data.

Reported by:	several broken textdump outputs
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC after:	7 days
X-MFC:		r207922
2010-05-11 18:24:22 +00:00
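
The idea of a configurable print callback can be sketched in userspace as
follows. The type print_fn_sketch and the function list_locks_sketch() are
hypothetical and only mirror the shape of the change described above: the
caller chooses where output goes by passing a printf-like function.

```c
#include <stddef.h>
#include <stdio.h>	/* printf() is one possible callback */

/* Any function with printf's shape can serve as the output sink. */
typedef int (*print_fn_sketch)(const char *fmt, ...);

/*
 * Print each non-NULL lock name through the supplied callback and
 * return how many were shown.  The listing code never decides where
 * the text ends up; the caller does.
 */
static int
list_locks_sketch(const char *const *names, size_t n, print_fn_sketch prnt)
{
	size_t i;
	int shown = 0;

	for (i = 0; i < n; i++) {
		if (names[i] == NULL)
			continue;
		prnt("lock: %s\n", names[i]);
		shown++;
	}
	return (shown);
}
```

A console caller could pass printf, while a debugger context could pass its
own printer with the same prototype.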
Attilio Rao
3caaaae046 There is not a good reason to have a different prototype for db_printf()
when compared to printf().
Unify it by returning the number of characters displayed for db_printf()
as well.

MFC after:	7 days
2010-05-11 17:01:14 +00:00
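
A printf-compatible prototype means returning the number of characters
written. A minimal userspace sketch, with the hypothetical name
dbg_printf_sketch() standing in for a debugger printer, could look like this:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical debugger-style printer with the same prototype as
 * printf(): it forwards to vprintf() and returns the character count,
 * so it can be used wherever a printf-like function is expected.
 */
static int
dbg_printf_sketch(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return (n);
}
```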
Attilio Rao
de6648745c Fix a hang introduced in r206878 for kernels compiled with SMP support but
not running as actual SMP, and in similar situations, by always initializing the
smp ipi mutex.

Reported by:	marius
MFC after:	3 days
X-MFC:		r206878
2010-05-11 15:36:16 +00:00
Alan Cox
7d1d2ef60a Update a comment: It no longer makes sense to talk about the page queues
lock here.
2010-05-08 23:01:47 +00:00
Alan Cox
3c4a24406b Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00
Konstantin Belousov
d2ba618a63 Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node
in a no-sleep context. If resource allocation cannot be done without
sleep, make_dev_credf() fails and returns NULL.

Reviewed by:	jh
MFC after:	2 weeks
2010-05-06 19:22:50 +00:00