freebsd-skq

Author	SHA1	Message	Date
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
mav	ea954fa396	Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.	2010-06-14 07:10:37 +00:00
kib	bbe91d0e0f	Add another variation of make_dev(9), make_dev_p(9), that is allowed to fail and can return useful error code. Requested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:22:39 +00:00
kib	9e98593ebc	When make_dev_credf(MAKEDEV_WAITOK) is called, use devctl_notify_f(M_WAITOK) for devfs notifications. Suggested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:21:25 +00:00
kib	2605a178f6	Add modifications of devctl_notify(9) functions that take flags. Use flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for the memory allocation. As Warner noted, allowing the functions to sleep might cause reordering of the queued notifications. Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:20:38 +00:00
avg	324886002f	fix a few cases where a string is passed via format argument instead of via %s Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed to drop an intermediate buffer. Found by: clang MFC after: 2 week	2010-06-11 19:27:21 +00:00
jhb	9b74a62d73	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
mdf	09830f0c6f	Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state. Reviewed by: jeff Approved by: zml (mentor)	2010-06-11 17:03:26 +00:00
ivoras	5a89fd1114	In another move to join with the age of the Fruitbat, increase SYSV shared resources defaults beyond absolute minimums. The new values are chosen mostly by magic. They are still fairly small and will need increasing for large installations (especially SHMMAX). However, they are now enough to e.g. start PostgreSQL installations with ~~300 users and nearly 512 MB of shared buffers. Reviewed by: A short discussion on hackers@	2010-06-11 09:27:33 +00:00
mav	b8bbab8130	Store interrupt trap frame into struct thread. It allows interrupt handler to obtain both trap frame and opaque argument submitted on registration. After kernel and all drivers get used to it, legacy hack can be removed. Reviewed by: jhb@	2010-06-10 16:14:05 +00:00
ivoras	04624ee0ea	Unconfuse THREAD and SMT flags	2010-06-10 11:48:14 +00:00
ivoras	7937017072	Cosmetic change to XML - less ugly newlines	2010-06-10 11:01:17 +00:00
kib	317abde372	Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once. Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy. In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf(). Reported and tested by: pho Suggested and reviewed by: alc MFC after: 2 weeks	2010-06-08 17:54:28 +00:00
jhb	72cdd6ef99	Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days	2010-06-08 16:17:47 +00:00
mav	4363e5b2ce	Call BUS_PROBE_NOMATCH() when device detached due to driver unload. This allows bus to power-down device when driver unloaded on-flight.	2010-06-07 18:47:53 +00:00
cperciva	4adc6d09d8	Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer. Found with: Coverity Prevent(tm) CID: 4319 MFC after: 1 month	2010-06-04 14:38:24 +00:00
jhb	16dab63fe9	Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock. MFC after: 1 month	2010-06-03 16:02:11 +00:00
trasz	253bf0319d	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3688	2010-06-03 13:45:27 +00:00
trasz	9985f972fd	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3684	2010-06-03 13:43:58 +00:00
trasz	cbfca8b888	The acl_cnt field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3683	2010-06-03 13:41:55 +00:00
kib	2ba33ab98e	Sometimes vnodes share the lock despite being different vnodes on different mount points, e.g. the nullfs vnode and the covered vnode from the lower filesystem. In this case, existing assertion in vop_rename_pre() may be triggered. Check for vnode locks equality instead of the vnodes themselves to not trip over the situation. Submitted by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: pho MFC after: 2 weeks	2010-06-03 10:20:08 +00:00
alc	24ac89cf14	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
kib	5e1e617f5e	Add a facility to dynamically adjust or unconfigure p1003_1b mib. Use it to allow to tune sem_nsem_max at runtime, only when sem.ko module is present in kernel. Requested and tested by: amdmi3 Reviewed by: jhb MFC after: 3 days	2010-06-02 09:59:05 +00:00
zml	7f5d6a35d6	Revert taskqueue(9) related commits until mdf@ is approved and can resolve issues. This reverts commits r207439, r208623, r208624	2010-06-01 16:04:01 +00:00
zml	cadeb05108	Avoid a wakeup(9) if we can be sure no one is waiting on the task. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:34 +00:00
zml	f1e0737c28	Revert r207439 and solve the problem differently. The task handler ta_func may free the task structure, so no references to its members are valid after the handler has been called. Using a per-queue member and having waits longer than strictly necessary was suggested by jhb. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:28 +00:00
rwatson	c7e8976175	When close() is called on a connected socket pair, SO_ISCONNECTED might be set but be cleared before the call to sodisconnect(). In this case, ENOTCONN is returned: suppress this error rather than returning it to userspace so that close() doesn't report an error improperly. PR: kern/144061 Reported by: Matt Reimer <mreimer at vpop.net>, Nikolay Denev <ndenev at gmail.com>, Mikolaj Golub <to.my.trociny at gmail.com> MFC after: 3 days	2010-05-27 15:27:31 +00:00
attilio	e56433dd50	Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap(). Sponsored by: Sandvine Incorporated Reviewed by: kib, emaste MFC after: 1 week	2010-05-27 08:10:12 +00:00
kib	4f460f2f9a	Allow to use syscallname(9) outside subr_trap.c. MFC after: 1 month	2010-05-26 15:39:43 +00:00
jhb	6caceffefa	Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH. PR: kern/146167 MFC after: 1 week	2010-05-25 21:32:37 +00:00
alc	54739180f5	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
alc	32b13ee957	Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
mav	48198e3ddd	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
kib	70f08890fc	Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit(). Diagnosed and tested by: neel MFC after: 3 days	2010-05-24 10:23:49 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and sets up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
jhb	cf780ce267	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
jhb	ce208e1f41	Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently. MFC after: 1 month	2010-05-21 17:15:56 +00:00
jhb	b7fc8e97f1	Allow a const char * to be passed as the process name to kproc_kthread_add() without generating a warning. MFC after: 1 month	2010-05-21 17:14:36 +00:00
kib	890c865dcf	Remove POLLHUP from the flags used to test whether to set exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken. Reported by: Yoshihiko Sarumaru <ysarumaru gmail com> Discussed with: bde PR: ports/140934 MFC after: 2 weeks	2010-05-21 10:36:29 +00:00
alc	f8bed5b288	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
rrs	8ea4ab29a0	This pushes all of JC's patches that I have in place. I am now able to run 32 cores ok.. but I still will hang on buildworld with a NFS problem. I suspect I am missing a patch for the netlogic rge driver. JC check and see if I am missing anything except your core-mask changes Obtained from: JC	2010-05-16 19:43:48 +00:00
bz	c9d1ca826b	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
zml	773cda6040	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr	2010-05-12 21:24:46 +00:00
pjd	05f836c1c3	When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics. No objections from: kib Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua> MFC after: 1 week	2010-05-12 16:42:28 +00:00
pjd	f1b200bbcc	I added vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it. MFC after: 3 days	2010-05-11 22:46:36 +00:00
attilio	4d95c325dd	Right now, WITNESS just blindly pipes all the output to the (TOCONS | TOLOG) mask even when called from DDB points. That breaks several outputs, the most notable being textdump output. Fix this by having configurable callbacks passed to witness_list_locks() and witness_display_spinlock() for printing out data. Reported by: several broken textdump outputs Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC after: 7 days X-MFC: r207922	2010-05-11 18:24:22 +00:00
attilio	a6a1f012b7	There is not a good reason to have a different prototype for db_printf() when compared to printf(). Unify it by returning the number of characters displayed for db_printf() as well. MFC after: 7 days	2010-05-11 17:01:14 +00:00
attilio	31c196b3b9	Fix a hang introduced in r206878 for kernel compiled with SMP support but being not actual SMP and similar situations by always initializing the smp ipi mutex. Reported by: marius MFC after: 3 days X-MFC: r206878	2010-05-11 15:36:16 +00:00
alc	bc80981f79	Update a comment: It no longer makes sense to talk about the page queues lock here.	2010-05-08 23:01:47 +00:00
alc	40b44f9713	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00


11694 Commits