Commit Graph

11694 Commits

Author SHA1 Message Date
jkim
14f08fd627 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
mav
ea954fa396 Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI
(FSB interrupts) to be used by non-PCI devices, such as HPET.
2010-06-14 07:10:37 +00:00
kib
bbe91d0e0f Add another variation of make_dev(9), make_dev_p(9), that is allowed
to fail and can return useful error code.

Requested by:	jh
Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:22:39 +00:00
kib
9e98593ebc When make_dev_credf(MAKEDEV_WAITOK) is called, use
devctl_notify_f(M_WAITOK) for devfs notifications.

Suggested by:	jh
Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:21:25 +00:00
kib
2605a178f6 Add modifications of devctl_notify(9) functions that take flags. Use
flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for
the memory allocation.

As Warner noted, allowing the functions to sleep might cause
reordering of the queued notifications.

Reviewed by:	imp, jh
MFC after:	3 weeks
2010-06-12 13:20:38 +00:00
avg
324886002f fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness.  In one case it even allowed to drop an intermediate buffer.

Found by:	clang
MFC after:	2 week
2010-06-11 19:27:21 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
mdf
09830f0c6f Add INVARIANTS checking that numfreebufs values are sane. Also add a
per-buf flag to catch if a buf is double-counted in the free count.
This code was useful to debug an instance where a local patch at Isilon
was incorrectly managing numfreebufs for a new buf state.

Reviewed by:	jeff
Approved by:	zml (mentor)
2010-06-11 17:03:26 +00:00
ivoras
5a89fd1114 In another move to join with the age of the Fruitbat, increase SYSV
shared resources defaults beyond absolute minimums.

The new values are chosen mostly by magic. They are still fairly
small and will need increasing for large installations (especially
SHMMAX). However, they are now enough to e.g. start PostgreSQL
installations with ~~300 users and nearly 512 MB of shared buffers.

Reviewed by:	A short discussion on hackers@
2010-06-11 09:27:33 +00:00
mav
b8bbab8130 Store interrupt trap frame into struct thread. It allows interrupt handler
to obtain both trap frame and opaque argument submitted on registrction.
After kernel and all drivers get used to it, legacy hack can be removed.

Reviewed by:	jhb@
2010-06-10 16:14:05 +00:00
ivoras
04624ee0ea Unconfuse THREAD and SMT flags 2010-06-10 11:48:14 +00:00
ivoras
7937017072 Cosmetic change to XML - less ugly newlines 2010-06-10 11:01:17 +00:00
kib
317abde372 Reorganize the code in bdwrite() which handles move of dirtiness
from the buffer pages to buffer. Combine the code to set buffer
dirty range (previously in vfs_setdirty()) and to clean the pages
(vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now
the vm object lock is acquired only once.

Drain the VPO_BUSY bit of the buffer pages before setting valid
and clean bits in vfs_clean_pages_dirty_buf() with new helper
vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not
busy.

In vfs_busy_pages(), move the wait for draining of VPO_BUSY before
the dirtyness handling, to follow the structure of
vfs_clean_pages_dirty_buf().

Reported and tested by:	pho
Suggested and reviewed by:	alc
MFC after:	2 weeks
2010-06-08 17:54:28 +00:00
jhb
72cdd6ef99 Fix a sign bug that caused adaptive spinning in sx_xlock() to not work
properly.  Among other things it did not drop Giant while spinning
leading to livelocks.

Reviewed by:	rookie, kib, jmallett
MFC after:	3 days
2010-06-08 16:17:47 +00:00
mav
4363e5b2ce Call BUS_PROBE_NOMATCH() when device detached due to driver unload.
This allows bus to power-down device when driver unloaded on-flight.
2010-06-07 18:47:53 +00:00
cperciva
4adc6d09d8 Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is
a harmless bug since we never actually use ip6 as anything other than an
opaque pointer.

Found with:	Coverty Prevent(tm)
CID:		4319
MFC after:	1 month
2010-06-04 14:38:24 +00:00
jhb
16dab63fe9 Assert that the thread lock is held in sched_pctcpu() instead of
recursively acquiring it.  All of the current callers already hold the
lock.

MFC after:	1 month
2010-06-03 16:02:11 +00:00
trasz
253bf0319d The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3688
2010-06-03 13:45:27 +00:00
trasz
9985f972fd The 'acl_cnt' field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3684
2010-06-03 13:43:58 +00:00
trasz
cbfca8b888 The acl_cnt field is unsigned; no point in checking if it's >= 0.
Found with:	Coverity Prevent
CID:		3683
2010-06-03 13:41:55 +00:00
kib
2ba33ab98e Sometimes vnodes share the lock despite being different vnodes on
different mount points, e.g. the nullfs vnode and the covered vnode
from the lower filesystem. In this case, existing assertion in
vop_rename_pre() may be triggered.

Check for vnode locks equiality instead of the vnodes itself to
not trip over the situation.

Submitted by:	Mikolaj Golub <to.my.trociny@gmail.com>
Tested by:	pho
MFC after:	2 weeks
2010-06-03 10:20:08 +00:00
alc
24ac89cf14 Minimize the use of the page queues lock for synchronizing access to the
page's dirty field.  With the exception of one case, access to this field
is now synchronized by the object lock.
2010-06-02 15:46:37 +00:00
kib
5e1e617f5e Add a facility to dynamically adjust or unconfigure p1003_1b mib.
Use it to allow to tune sem_nsem_max at runtime, only when sem.ko
module is present in kernel.

Requested and tested by:	amdmi3
Reviewed by:	jhb
MFC after:	3 days
2010-06-02 09:59:05 +00:00
zml
7f5d6a35d6 Revert taskqueue(9) related commits until mdf@ is approved and can
resolve issues.

This reverts commits r207439, r208623, r208624
2010-06-01 16:04:01 +00:00
zml
cadeb05108 Avoid a wakeup(9) if we can be sure no one is waiting on the task.
Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, jhb
2010-05-28 18:15:34 +00:00
zml
f1e0737c28 Revert r207439 and solve the problem differently. The task handler
ta_func may free the task structure, so no references to its members
are valid after the handler has been called. Using a per-queue member
and having waits longer than strictly necessary was suggested by jhb.

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, jhb
2010-05-28 18:15:28 +00:00
rwatson
c7e8976175 When close() is called on a connected socket pair, SO_ISCONNECTED might be
set but be cleared before the call to sodisconnect().  In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.

PR:		kern/144061
Reported by:	Matt Reimer <mreimer at vpop.net>,
		Nikolay Denev <ndenev at gmail.com>,
		Mikolaj Golub <to.my.trociny at gmail.com>
MFC after:	3 days
2010-05-27 15:27:31 +00:00
attilio
e56433dd50 Add the support for reporting the NOCOREDUMP flag from
sysctl_kern_proc_vmmap().

Sponsored by:	Sandvine Incorporated
Reviewed by:	kib, emaste
MFC after:	1 week
2010-05-27 08:10:12 +00:00
kib
4f460f2f9a Allow to use syscallname(9) outside subr_trap.c.
MFC after:	1 month
2010-05-26 15:39:43 +00:00
jhb
6caceffefa Ignore the 'addr' argument passed to PT_STEP (it is required to be '1'
for PT_STEP which means "ignore") and PT_DETACH.

PR:		kern/146167
MFC after:	1 week
2010-05-25 21:32:37 +00:00
alc
54739180f5 Eliminate the acquisition and release of the page queues lock from
vfs_busy_pages().  It is no longer needed.

Submitted by:	kib
2010-05-25 02:26:25 +00:00
alc
32b13ee957 Roughly half of a typical pmap_mincore() implementation is machine-
independent code.  Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified().  Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization.  Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean().  Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by:	kib (an earlier version)
2010-05-24 14:26:57 +00:00
mav
48198e3ddd - Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
2010-05-24 11:40:49 +00:00
kib
70f08890fc Fix the double counting of the last process thread td_incruntime
on exit, that is done once in thread_exit() and the second time in
proc_reap(), by clearing td_incruntime.

Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg()
instead of ruxagg_locked() and use it from thread_exit().

Diagnosed and tested by:	neel
MFC after:	3 days
2010-05-24 10:23:49 +00:00
kib
4208ccbe79 Reorganize syscall entry and leave handling.
Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
  usermode into struct syscall_args. The structure is machine-depended
  (this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
  from the syscall. It is a generalization of
  cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
  return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively.  The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by:	jhb, marcel, marius, nwhitehorn, stas
Tested by:	marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
	stas (mips)
MFC after:	1 month
2010-05-23 18:32:02 +00:00
jhb
cf780ce267 - Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
  the idle process is now shared across all idle threads.

MFC after:	1 month
2010-05-21 17:17:56 +00:00
jhb
ce208e1f41 Assert that the thread passed to sched_bind() and sched_unbind() is
curthread as those routines are only supported for curthread currently.

MFC after:	1 month
2010-05-21 17:15:56 +00:00
jhb
b7fc8e97f1 Allow a const char * to be passed as the process name to kproc_kthread_add()
without generating a warning.

MFC after:	1 month
2010-05-21 17:14:36 +00:00
kib
890c865dcf Remove PIOLLHUP from the flags used to test for to set exceptfsd
fd_set bits in select(2). It seems that historical behaviour is to not
reporting exception on EOF, and several applications are broken.

Reported by:	Yoshihiko Sarumaru <ysarumaru gmail com>
Discussed with:	bde
PR:	ports/140934
MFC after:	2 weeks
2010-05-21 10:36:29 +00:00
alc
f8bed5b288 The page queues lock is no longer required by vm_page_set_invalid(), so
eliminate it.

Assert that the object containing the page is locked in
vm_page_test_dirty().  Perform some style clean up while I'm here.

Reviewed by:	kib
2010-05-18 16:40:29 +00:00
rrs
8ea4ab29a0 This pushes all of JC's patches that I have in place. I
am now able to run 32 cores ok.. but I still will hang
on buildworld with a NFS problem. I suspect I am missing
a patch for the netlogic rge driver.

JC check and see if I am missing anything except your
core-mask changes

Obtained from:	JC
2010-05-16 19:43:48 +00:00
bz
c9d1ca826b Fix an issue with the dynamic pcpu/vnet data allocators.
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous.  For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with:	jeff, rwatson (briefly)
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-05-14 21:11:58 +00:00
zml
773cda6040 Add VOP_ADVLOCKPURGE so that the file system is called when purging
locks (in the case where the VFS impl isn't using lf_*)

Submitted by:       Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by:        zml, dfr
2010-05-12 21:24:46 +00:00
pjd
05f836c1c3 When there is no memory or KVA, try to help by reclaiming some vnodes.
This helps with 'kmem_map too small' panics.

No objections from:	kib
Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>
MFC after:		1 week
2010-05-12 16:42:28 +00:00
pjd
f1b200bbcc I added vfs_lowvnodes event, but it was only used for a short while and now
it is totally unused. Remove it.

MFC after:	3 days
2010-05-11 22:46:36 +00:00
attilio
4d95c325dd Right now, WITNESS just blindly pipes all the output to the
(TOCONS | TOLOG) mask even when called from DDB points.
That breaks several output, where the most notable is textdump output.
Fix this by having configurable callbacks passed to witness_list_locks()
and witness_display_spinlock() for printing out datas.

Reported by:	several broken textdump outputs
Tested by:	Giovanni Trematerra
		<giovanni dot trematerra at gmail dot com>
MFC after:	7 days
X-MFC:		r207922
2010-05-11 18:24:22 +00:00
attilio
a6a1f012b7 There is not a good reason to have a different prototype for db_printf()
when compared to printf().
Unify it by returning the number of characters displayed for db_printf()
as well.

MFC after:	7 days
2010-05-11 17:01:14 +00:00
attilio
31c196b3b9 Fix a hang introduced in r206878 for kernel compiled with SMP support but
being not actual SMP and similar situations by always initializing the
smp ipi mutex.

Reported by:	marius
MFC after:	3 days
X-MFC:		r206878
2010-05-11 15:36:16 +00:00
alc
bc80981f79 Update a comment: It no longer makes sense to talk about the page queues
lock here.
2010-05-08 23:01:47 +00:00
alc
40b44f9713 Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free().  Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.
2010-05-08 20:34:01 +00:00