freebsd-skq

Author	SHA1	Message	Date
trasz	dd1ffe6ba1	Remove outdated comment and move part of it into more applicable place.	2010-07-18 19:29:12 +00:00
ivoras	56cd1257b0	In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory. This is a somewhat conservative version of the patch - more fine tuning may be necessary. Idea from: Thread on hackers@ Discussed with: alc	2010-07-18 10:15:33 +00:00
jhb	96d598c33f	Retire td_syscalls now that it is no longer needed.	2010-07-15 20:24:37 +00:00
ivoras	3fb9f87a34	A cosmetic change - don't output empty <flags>.	2010-07-15 13:46:30 +00:00
mav	bd622e7c20	Rename timeevents.c to kern_clocksource.c. Suggested by: jhb@	2010-07-14 18:43:27 +00:00
jhb	fb1e0aa66f	- Document layout of KTR_STRUCT payload in a comment. - Simplify ktrstruct() calling convention by having ktrstruct() use strlen() rather than requiring the caller to hand-code the length of constant strings. MFC after: 1 month	2010-07-14 17:38:01 +00:00
mav	b8b00841c9	Move timeevents.c to MI code, as it is not x86-specific. I already have it working on Marvell ARM SoCs, and it would be nice to unify timer code between more platforms.	2010-07-14 13:31:27 +00:00
cperciva	14d1adbf2c	Correctly copy the M_RDONLY flag when duplicating a reference to an mbuf external buffer. Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-10:07.mbuf	2010-07-13 02:45:17 +00:00
jkim	06b6c2769b	Use type-specific inline function imax() instead of deprecated macro MAX(). Prodded by: bde	2010-07-12 15:32:45 +00:00
alc	db4ca9f5c2	Change the implementation of vm_hold_free_pages() so that it performs at most one call to pmap_qremove(), and thus one TLB shootdown, instead of one call and TLB shootdown per page. Simplify the interface to vm_hold_free_pages(). MFC after: 3 weeks	2010-07-11 20:11:44 +00:00
mav	d760bd51fb	Remove interval validation from cpu_tick_calibrate(). As I found, the check was needed in a preliminary version of the patch, where the number of CPU ticks was divided by exactly 16 seconds. The final code instead uses the real interval duration, so a precise interval should not be important. At the same time, aliasing issues around the second boundary cause false positives, periodically logging useless "t_delta ... too long/short" messages when HZ is set below 256.	2010-07-11 16:47:45 +00:00
alc	7c09dc242c	Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan(). This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one. Reviewed by: kib	2010-07-09 19:38:30 +00:00
jhb	f338f6d0f8	Accidentally committed an older version of this comment rather than the final one.	2010-07-09 13:59:53 +00:00
jhb	7e3b216a37	Refine a comment. Reviewed by: bde	2010-07-09 13:53:25 +00:00
jh	d171161918	Remove redundant high >= 0. Reported by: rstone	2010-07-09 10:57:55 +00:00
jkim	93b88a93da	Implement optional 'precision' for numbers. Previously, it was parsed but ignored. Some third-party modules (e.g., ACPICA) prefer this format over zero padding flag '0'.	2010-07-08 22:13:23 +00:00
jhb	1f4cf66ed2	- Various style and whitespace fixes. - Make sugid_coredump and kern_logsigexit private to kern_sig.c. Submitted by: bde (partially) MFC after: 1 month	2010-07-08 19:15:26 +00:00
jh	f673b7098a	Assert that low and high are >= 0. The allocator doesn't support the negative range.	2010-07-08 16:53:19 +00:00
attilio	865de58a04	- Simplify logic in handling ticks wrap-up - Fix a bug where thread may be in sleeping state but the wchan won't be set, leading to an empty container for sleepq_type(). [0] Sponsored by: Sandvine Incorporated [0] Submitted by: Bryan Venteicher <bryanv at daemoninthecloset dot org> MFC after: 3 days X-MFC: 209577	2010-07-07 12:00:11 +00:00
kib	15d16124c2	In revoke(), verify that VCHR vnode indeed belongs to devfs. Found and tested by: pho MFC after: 1 week	2010-07-06 18:20:49 +00:00
ed	1075ceb3e2	Fix a race condition, where a TTY could be destroyed twice. There are special cases where tty_rel_free() can be called twice in a row, namely when closing and revoking the TTY at the same moment. Only call destroy_dev_sched_cb() once. Reported by: Jeremie Le Hen MFC after: 1 week	2010-07-06 08:56:34 +00:00
kib	15a394fbba	Add the ability for the allocflag argument of the vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending vmio buffer can be replaced by a call to vm_page_grab(). Suggested and reviewed by: alc MFC after: 2 weeks	2010-07-05 21:13:32 +00:00
jh	b0744cfb8d	Extend the kernel unit number allocator for allocating specific unit numbers. This change adds a new function alloc_unr_specific() which returns the requested unit number if it is free. If the number is already allocated or out of range, -1 is returned. Update alloc_unr(9) manual page accordingly and add an MLINK for alloc_unr_specific(9). Discussed on: freebsd-hackers	2010-07-05 16:23:55 +00:00
kib	4de7ec3dbb	Obey sv_syscallnames bounds in syscallname(). Reported and tested by: pho	2010-07-04 18:16:17 +00:00
kib	22a31bdc6e	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused the debuggee to stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks	2010-07-04 11:48:30 +00:00
alc	afd002fb75	Use vm_page_next() instead of vm_page_lookup() in exec_map_first_page() because vm_page_next() is faster.	2010-07-02 15:50:30 +00:00
jhb	de324e256c	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
jhb	738cd61a3d	Update comment for tdsignal() -> tdsendsignal() rename. Forgot to include this in 209592.	2010-06-30 18:00:45 +00:00
alc	df23299909	Improve bufdone_finish()'s handling of the bogus page. Specifically, if one or more mappings to the bogus page must be replaced, call pmap_qenter() just once. Previously, pmap_qenter() was called for each mapping to the bogus page. MFC after: 3 weeks	2010-06-30 04:52:42 +00:00
jhb	44b49a3eaa	Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week	2010-06-29 20:44:19 +00:00
jhb	df7979cf76	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib	2010-06-29 20:41:52 +00:00
dougb	ebed8715b6	If i is going to be used in the loop unconditionally the declaration has to be unconditional as well. Conical head covering to: kib	2010-06-29 01:04:24 +00:00
kib	180cca1c2d	Regenerate	2010-06-28 18:17:21 +00:00
kib	2ab2a361d3	Although system call deregistration drains the threads executing System V shm syscalls, and the module deinitialization code initially checks the number of allocated segments, the following might happen: after the check for active segments, while waiting for threads to leave some other syscall, shmget(2) is called. Then we can end up with a shared segment that cannot be detached since the sysvshm module is unloaded. Prevent the leak by rechecking and disclaiming a reference to the vm object owned by the sysvshm module, that might have grown during the drain. Tested by: pho Reviewed by: jhb MFC after: 1 month	2010-06-28 18:12:42 +00:00
kib	b6d8416eac	Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month	2010-06-28 18:06:46 +00:00
attilio	f818dc9368	Fix a lock leak in the deadlock resolver in case the ticks counter wrapped up. Sponsored by: Sandvine Incorporated Submitted by: pluknet <pluknet at gmail dot com> Reported by: Anton Yuzhaninov <citrin at citrin dot ru> Reviewed by: jhb MFC after: 3 days	2010-06-28 17:45:00 +00:00
jh	0a8e6bb738	Correct a comment typo.	2010-06-27 12:19:09 +00:00
pjd	6ff3cc04b0	Correct arguments order.	2010-06-26 21:44:45 +00:00
tuexen	d27c0f60a0	* Do not dereference a NULL pointer when calling an SCTP send syscall not providing a destination address and using ktrace. * Do not copy out kernel memory when providing sinfo for sctp_recvmsg(). Both bugs were reported by Valentin Nechayev. The first bug results in a kernel panic. MFC after: 3 days.	2010-06-26 19:26:20 +00:00
nwhitehorn	ecf1995ac7	Reverse the logic of the if statement that sets the default value of HZ; the list of 1000 Hz platforms was getting unwieldy. Suggested by: marcel	2010-06-24 00:27:20 +00:00
nwhitehorn	da5a28c706	Move default HZ from 100 to 1000 on powerpc. Reviewed by: marcel MFC after: 2 weeks	2010-06-23 23:26:14 +00:00
kib	6375d4e4db	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable of running on either have no FPU and cannot use an external coprocessor, or have an FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 11:12:58 +00:00
mav	a21b0b9d72	"time lock" is no longer a spin-lock since r209371. Reported by: kib@	2010-06-21 21:15:51 +00:00
ed	76489ac1ea	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
kib	107ec73aad	Do not report stack garbage as the old value for debug.ncores sysctl. Reported by: brucec	2010-06-21 09:51:25 +00:00
mav	d1175426d7	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing the best possible drivers by machine-independent code, and for operating them to supply the kernel with hardclock(), statclock() and profclock() events in a unified fashion on various hardware. The infrastructure supports both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use is planned for later, as part of the tickless kernel project. For now the infrastructure is used on the i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may not be exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because the same hardware is also used as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited to powers of 2. Depending on hardware capabilities, drivers are preferred in one of the following orders: either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. The user may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If the requested driver is unavailable or not operational, the system will try to replace it. If no more timers are available or "NONE" is specified for the second, the system will operate using only one timer, multiplying its frequency several times and using respective dividers to honor the hz, stathz and profhz values set during initial setup.	2010-06-20 21:33:29 +00:00
pjd	b3024a4af9	Backout r207970 for now, it can lead to deadlocks. Reported by: kan MFC after: 3 days	2010-06-17 17:39:51 +00:00
rpaulo	a8c5bafed5	Make DTrace syscall provider work again by including opt_kdtrace.h here.	2010-06-17 17:34:45 +00:00
jh	8a203f841c	- Fix compilation of the subr_unit.c user space test program. - Use %zu for size_t in a few format strings.	2010-06-17 16:12:06 +00:00
avg	9f2d4c3357	lock_profile_release_lock: do not compare unsigned with zero Found by: Coverity Prevent CID: 3660 Reviewed by: jhb MFC after: 2 weeks	2010-06-17 10:15:13 +00:00
ed	70171ee94e	Remove the unit argument from the recently added make_dev_p(). New code that creates character devices shouldn't use device unit numbers, but only si_drv[12] to hold pointer to per-device data. Make this function more future proof by removing the unit number argument. Discussed with: kib	2010-06-17 08:49:31 +00:00
jh	1c0174e29a	Correct the function name in a KASSERT.	2010-06-16 16:02:17 +00:00
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
mav	ea954fa396	Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.	2010-06-14 07:10:37 +00:00
kib	bbe91d0e0f	Add another variation of make_dev(9), make_dev_p(9), that is allowed to fail and can return useful error code. Requested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:22:39 +00:00
kib	9e98593ebc	When make_dev_credf(MAKEDEV_WAITOK) is called, use devctl_notify_f(M_WAITOK) for devfs notifications. Suggested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:21:25 +00:00
kib	2605a178f6	Add modifications of devctl_notify(9) functions that take flags. Use flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for the memory allocation. As Warner noted, allowing the functions to sleep might cause reordering of the queued notifications. Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:20:38 +00:00
avg	324886002f	Fix a few cases where a string is passed via the format argument instead of via %s. Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed dropping an intermediate buffer. Found by: clang MFC after: 2 weeks	2010-06-11 19:27:21 +00:00
jhb	9b74a62d73	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
mdf	09830f0c6f	Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state. Reviewed by: jeff Approved by: zml (mentor)	2010-06-11 17:03:26 +00:00
ivoras	5a89fd1114	In another move to join with the age of the Fruitbat, increase SYSV shared resources defaults beyond absolute minimums. The new values are chosen mostly by magic. They are still fairly small and will need increasing for large installations (especially SHMMAX). However, they are now enough to e.g. start PostgreSQL installations with ~~300 users and nearly 512 MB of shared buffers. Reviewed by: A short discussion on hackers@	2010-06-11 09:27:33 +00:00
mav	b8bbab8130	Store interrupt trap frame into struct thread. It allows an interrupt handler to obtain both the trap frame and the opaque argument submitted on registration. After the kernel and all drivers get used to it, the legacy hack can be removed. Reviewed by: jhb@	2010-06-10 16:14:05 +00:00
ivoras	04624ee0ea	Unconfuse THREAD and SMT flags	2010-06-10 11:48:14 +00:00
ivoras	7937017072	Cosmetic change to XML - less ugly newlines	2010-06-10 11:01:17 +00:00
kib	317abde372	Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once. Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy. In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf(). Reported and tested by: pho Suggested and reviewed by: alc MFC after: 2 weeks	2010-06-08 17:54:28 +00:00
jhb	72cdd6ef99	Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days	2010-06-08 16:17:47 +00:00
mav	4363e5b2ce	Call BUS_PROBE_NOMATCH() when device detached due to driver unload. This allows bus to power-down device when driver unloaded on-flight.	2010-06-07 18:47:53 +00:00
cperciva	4adc6d09d8	Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer. Found with: Coverity Prevent(tm) CID: 4319 MFC after: 1 month	2010-06-04 14:38:24 +00:00
jhb	16dab63fe9	Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock. MFC after: 1 month	2010-06-03 16:02:11 +00:00
trasz	253bf0319d	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3688	2010-06-03 13:45:27 +00:00
trasz	9985f972fd	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3684	2010-06-03 13:43:58 +00:00
trasz	cbfca8b888	The acl_cnt field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3683	2010-06-03 13:41:55 +00:00
kib	2ba33ab98e	Sometimes vnodes share the lock despite being different vnodes on different mount points, e.g. the nullfs vnode and the covered vnode from the lower filesystem. In this case, the existing assertion in vop_rename_pre() may be triggered. Check for equality of the vnode locks instead of the vnodes themselves to not trip over the situation. Submitted by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: pho MFC after: 2 weeks	2010-06-03 10:20:08 +00:00
alc	24ac89cf14	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
kib	5e1e617f5e	Add a facility to dynamically adjust or unconfigure p1003_1b mib. Use it to allow to tune sem_nsem_max at runtime, only when sem.ko module is present in kernel. Requested and tested by: amdmi3 Reviewed by: jhb MFC after: 3 days	2010-06-02 09:59:05 +00:00
zml	7f5d6a35d6	Revert taskqueue(9) related commits until mdf@ is approved and can resolve issues. This reverts commits r207439, r208623, r208624	2010-06-01 16:04:01 +00:00
zml	cadeb05108	Avoid a wakeup(9) if we can be sure no one is waiting on the task. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:34 +00:00
zml	f1e0737c28	Revert r207439 and solve the problem differently. The task handler ta_func may free the task structure, so no references to its members are valid after the handler has been called. Using a per-queue member and having waits longer than strictly necessary was suggested by jhb. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:28 +00:00
rwatson	c7e8976175	When close() is called on a connected socket pair, SO_ISCONNECTED might be set but be cleared before the call to sodisconnect(). In this case, ENOTCONN is returned: suppress this error rather than returning it to userspace so that close() doesn't report an error improperly. PR: kern/144061 Reported by: Matt Reimer <mreimer at vpop.net>, Nikolay Denev <ndenev at gmail.com>, Mikolaj Golub <to.my.trociny at gmail.com> MFC after: 3 days	2010-05-27 15:27:31 +00:00
attilio	e56433dd50	Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap(). Sponsored by: Sandvine Incorporated Reviewed by: kib, emaste MFC after: 1 week	2010-05-27 08:10:12 +00:00
kib	4f460f2f9a	Allow to use syscallname(9) outside subr_trap.c. MFC after: 1 month	2010-05-26 15:39:43 +00:00
jhb	6caceffefa	Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH. PR: kern/146167 MFC after: 1 week	2010-05-25 21:32:37 +00:00
alc	54739180f5	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
alc	32b13ee957	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
mav	48198e3ddd	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code, with minor variations, was duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
kib	70f08890fc	Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit(). Diagnosed and tested by: neel MFC after: 3 days	2010-05-24 10:23:49 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
jhb	cf780ce267	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
jhb	ce208e1f41	Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently. MFC after: 1 month	2010-05-21 17:15:56 +00:00
jhb	b7fc8e97f1	Allow a const char * to be passed as the process name to kproc_kthread_add() without generating a warning. MFC after: 1 month	2010-05-21 17:14:36 +00:00
kib	890c865dcf	Remove POLLHUP from the flags used to test whether to set exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken. Reported by: Yoshihiko Sarumaru <ysarumaru gmail com> Discussed with: bde PR: ports/140934 MFC after: 2 weeks	2010-05-21 10:36:29 +00:00
alc	f8bed5b288	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
rrs	8ea4ab29a0	This pushes all of JC's patches that I have in place. I am now able to run 32 cores ok.. but I still will hang on buildworld with a NFS problem. I suspect I am missing a patch for the netlogic rge driver. JC check and see if I am missing anything except your core-mask changes Obtained from: JC	2010-05-16 19:43:48 +00:00
bz	c9d1ca826b	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
zml	773cda6040	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr	2010-05-12 21:24:46 +00:00
pjd	05f836c1c3	When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics. No objections from: kib Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua> MFC after: 1 week	2010-05-12 16:42:28 +00:00
pjd	f1b200bbcc	I added vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it. MFC after: 3 days	2010-05-11 22:46:36 +00:00
attilio	4d95c325dd	Right now, WITNESS just blindly pipes all the output to the (TOCONS | TOLOG) mask even when called from DDB points. That breaks several outputs, most notably textdump output. Fix this by having configurable callbacks passed to witness_list_locks() and witness_display_spinlock() for printing out data. Reported by: several broken textdump outputs Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC after: 7 days X-MFC: r207922	2010-05-11 18:24:22 +00:00
attilio	a6a1f012b7	There is not a good reason to have a different prototype for db_printf() when compared to printf(). Unify it by returning the number of characters displayed for db_printf() as well. MFC after: 7 days	2010-05-11 17:01:14 +00:00
attilio	31c196b3b9	Fix a hang introduced in r206878 for kernels compiled with SMP support but not running on actual SMP hardware, and similar situations, by always initializing the smp ipi mutex. Reported by: marius MFC after: 3 days X-MFC: r206878	2010-05-11 15:36:16 +00:00
alc	bc80981f79	Update a comment: It no longer makes sense to talk about the page queues lock here.	2010-05-08 23:01:47 +00:00
alc	40b44f9713	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
kib	77dcee6926	Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node in a no-sleep context. If resource allocation cannot be done without sleep, make_dev_credf() fails and returns NULL. Reviewed by: jh MFC after: 2 weeks	2010-05-06 19:22:50 +00:00
alc	fecc56fac1	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
trasz	f26ccb52af	Avoid overflow. Submitted by: bde@	2010-05-06 18:52:41 +00:00
trasz	e6f92048fa	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
alc	e6c77ecaea	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
alc	a4eb017f2a	Eliminate a small bit of unneeded code from kern_sendfile(): While kern_sendfile() is running, the file's vm object can't be destroyed because kern_sendfile() increments the vm object's reference count. (Once kern_sendfile() decrements the reference count and returns, the vm object can, however, be destroyed. So, sf_buf_mext() must handle the case where the vm object is destroyed.) Reviewed by: kib	2010-05-06 15:52:08 +00:00
joel	c8dfd5c0cb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
alc	5c7ca3ee73	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
trasz	402e3baade	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
kib	a3da7d7e69	Fix a mistake in r207603. td_rux.rux_runtime still needs conversion. Reported and tested by: nwhitehorn Pointy hat to: kib MFC after: 6 days	2010-05-05 16:05:51 +00:00
alc	ea7b6345be	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
alc	c9aaa1e2a2	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
kib	26be0345aa	Fix typo in comment. MFC after: 3 days	2010-05-04 06:06:01 +00:00
kib	e5f4727bbf	Remove a comment that merely repeats code. Submitted by: bde MFC after: 1 week	2010-05-04 06:04:33 +00:00
kib	7ef4b25b49	Use td_rux.rux_runtime for ki_runtime instead of redoing calculation. Submitted by: bde MFC after: 1 week	2010-05-04 06:00:39 +00:00
kib	b13e838a49	Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks information for thread to allow calcru1() (re)use. Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1]. The ruxagg_locked() function no longer clears thread ticks nor td_incruntime. Requested by: attilio [1] Discussed with: attilio, bde Reviewed by: bde Based on submission by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week X-MFC-Note: td_rux shall be moved to the end of struct thread	2010-05-04 05:55:37 +00:00
alc	1923b6ded3	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
alc	387e15c45a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
kib	9c4f2e9ab2	Lock the page around hold_count access. Reviewed by: alc	2010-05-02 19:25:22 +00:00
alc	299c89c6fb	Properly synchronize access to the page's hold_count in vfs_vmio_release(). Reviewed by: kib	2010-05-02 19:10:27 +00:00
alc	f35e97166b	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
zec	139551016d	Remove a redundant variable assignment. Reviewed by: bz, rwatson MFC after: 3 days	2010-05-01 18:34:50 +00:00
kib	64dab823a0	Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility function ruxagg_tlock(). Convert the definition of kern_getrusage() to ANSI C. Submitted by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week	2010-05-01 14:46:17 +00:00
zml	3eac0000f0	Handle taskqueue_drain(9) correctly on a threaded taskqueue: taskqueue_drain(9) will not correctly detect whether a task is currently running. The check is against a field in the taskqueue struct, but for a threaded queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: jhb Approved by: dfr (mentor)	2010-04-30 16:29:05 +00:00
alfred	12d5232340	Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(), use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:15:00 +00:00
alfred	993bf6ff36	Don't leak core_buf or gzfile if doing a compressed core file and we hit an error condition. Obtained from: Juniper Networks	2010-04-30 03:13:24 +00:00
alfred	20fdc94b9e	Do not set IO_NODELOCKED while writing to vnodes as our consumers do not lock the vnodes. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:10:53 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
kib	a22b32df4a	Remove caddr_t casts. Requested by: bde MFC after: 10 days	2010-04-29 09:55:51 +00:00
avg	2cfe78bdd9	kern_ntptime: drop a comment that became stale after r207359 MFC after: 1 week X-MFC after: r207359	2010-04-29 09:18:36 +00:00
avg	cce2a4186b	periodically save system time to hardware time-of-day clock This is done in kern_ntptime, perhaps not the best place. This is done using resettodr(). Some features: - make save period configurable via tunable and sysctl - period of zero disables saving, setting a non-zero period re-enables it or reschedules it - do saving only if system clock is ntp-synchronized - save on shutdown Discussed with: des, Peter Jeremy <peterjeremy@acm.org> X-Maybe: save time near seconds boundary for better precision MFC after: 2 weeks	2010-04-29 09:02:46 +00:00
avg	cbff4850b6	kern_ntptime: abstract time error check into a function ... to avoid code duplication MFC after: 1 week	2010-04-29 09:02:21 +00:00
lstewart	bf49d6a9f9	- Rework the underlying ALQ storage to be a circular buffer, which amongst other things allows variable length messages to be easily supported. - Extend KPI with alq_writen() and alq_getn() to support variable length messages, which is enabled at ALQ creation time depending on the arguments passed to alq_open(). Also add variants of alq_open() and alq_post() that accept a flags argument. The KPI is still fully backwards compatible and shouldn't require any change in ALQ consumers unless they wish to utilise the new features. - Introduce the ALQ_NOACTIVATE and ALQ_ORDERED flags to allow ALQ consumers to have more control over IO scheduling and resource acquisition respectively. - Strengthen invariants checking. - Document ALQ changes in ALQ(9) man page. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jeff, rpaulo, rwatson MFC after: 1 month	2010-04-26 13:48:22 +00:00
kib	e91c695f77	Move the constants specifying the size of struct kinfo_proc into machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32. Submitted by: pluknet Reviewed by: imp, jhb, nwhitehorn MFC after: 2 weeks	2010-04-24 12:49:52 +00:00
jeff	a574495410	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm	2010-04-24 07:05:35 +00:00
bz	8b9f8a6735	Remove one zero from the double-0. This code doesn't have a license to kill. MFC after: 3 days	2010-04-23 14:32:58 +00:00
kib	f504a7390f	Fix typo. Submitted by: emaste Pointy hat to: kib (who needs much bigger wardrobe) MFC after: 1 week	2010-04-21 20:04:42 +00:00
kib	1b4a81ab7e	Provide compat32 shims for kinfo_proc sysctl. This allows 32bit ps(1) to mostly work on 64bit host. The work is based on an original patch submitted by emaste, obtained from Sandvine's source tree. Reviewed by: jhb MFC after: 1 week	2010-04-21 19:32:00 +00:00
imp	f6c103ea8b	Make sure that we free the passed in data message if we don't actually insert it onto the queue. Also, fix a mtx leak if someone turns off devctl while we're processing a message. MFC after: 5 days	2010-04-20 20:39:42 +00:00
attilio	6dda1433c8	Fix compilation in the !SMP case. Keep the interrupts disabled in order to avoid preemption problems. Reported by: tinderbox, b.f. <bf1783 at googlemail dot com> MFC: 2 weeks X-MFC: r206878	2010-04-20 12:22:06 +00:00
kib	d5b92466a9	The cache_enter(9) function shall not be called for doomed dvp. Assert this. In the reported panic, vdestroy() fired the assertion "vp has namecache for ..", because pseudofs may end up doing cache_enter() with reclaimed dvp, after dotdot lookup temporarily unlocked dvp. Similar problem exists in ufs_lookup() for "." lookup, when vnode lock needs to be upgraded. Verify that dvp is not reclaimed before calling cache_enter(). Reported and tested by: pho Reviewed by: kan MFC after: 2 weeks	2010-04-20 10:19:27 +00:00
attilio	70405fc79f	getblk lockmgr is mostly used as a msleep() and may lead too easily to false positives. Whitelist it. Reported by: Erik Cederstrand <erik at cederstrand dot dk>	2010-04-19 23:40:46 +00:00
attilio	fca97c8d7a	Fix a deadlock in the shutdown code: When performing a smp_rendezvous() or more likely, on amd64 and i386, a smp_tlb_shootdown() the caller will end up with the smp_ipi_mtx spinlock held, busy-waiting for other CPUs to acknowledge the operation. As long as CPUs are suspended (via cpu_reset()) between the active mask read and IPI sending there can be a deadlock where the caller will wait forever for a dead CPU to acknowledge the operation. Please note that on CPU0 that is going to be someway heavier because of the spinlocks being disabled earlier than quitting the machine. Fix this bug by calling cpu_reset() with the smp_ipi_mtx held. Note that it is very likely that a saner offline/online CPUs mechanism will help heavily in fixing similar cases as it is likely more bugs of this type may arise in the future. Reported by: rwatson Discussed with: jhb Tested by: rnoland, Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC: 2 weeks Special dedication to: anyone who made possible to have 16-ways machines in Netperf	2010-04-19 23:27:54 +00:00
kib	502beaff46	Fix typo. MFC after: 3 days	2010-04-15 17:17:02 +00:00
julian	247d9af67e	Change the semantics of the debug.ktr.alq_enable control so that when you disable alq, it acts as if alq had not been enabled in the build. in other words, the rest of ktr is still available for use. If you really don't want that as well, set the mask to 0. MFC after:3 weeks	2010-04-14 21:42:29 +00:00
kib	bf84540647	Handle a case in kern_openat() when vn_open() change file type from DTYPE_VNODE. Only acquire locks for O_EXLOCK/O_SHLOCK if file type is still vnode, since we allow for fcntl(2) to process with advisory locks for DTYPE_VNODE only. Another reason is that all fo_close() routines need to check and release locks otherwise. For O_TRUNC, call fo_truncate() instead of truncating the vnode. Discussed with: rwatson MFC after: 2 week	2010-04-13 08:52:20 +00:00
kib	d5f342f2da	Remove XXX comment. Add another comment, describing why f_vnode assignment is useful. MFC after: 3 days	2010-04-13 08:45:55 +00:00
alc	89e5d72c2b	Initialize the virtual memory-related resource limits in a single place. Previously, one of these limits was initialized in two places to a different value in each place. Moreover, because an unsigned int was used to represent the amount of pageable physical memory, some of these limits were incorrectly initialized on 64-bit architectures. (Currently, this error is masked by login.conf's default settings.) Make vm_thread_swapin() and vm_thread_swapout() static. Submitted by: bde (an earlier version) Reviewed by: kib	2010-04-11 16:26:07 +00:00
attilio	02f2ab87a2	- Introduce a blessed list for sxlocks that prevents the deadlkres to panic on those ones. [0] - Fix ticks counter wrap-up Sponsored by: Sandvine Incorporated [0] Reported by: jilles [0] Tested by: jilles MFC: 1 week	2010-04-11 16:06:09 +00:00
kib	231b3cddc2	Do not leak master pty or ptmx vnode. Report and test case by: Petr Salinger <Petr.Salinger seznam cz> Reviewed by: ed MFC after: 1 week	2010-04-08 08:58:18 +00:00
kib	47feb6893a	When OOM searches for a process to kill, ignore the processes already killed by OOM. When killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often encountered deadlock, where OOM continuously selects the same victim process, that sleeps uninterruptibly waiting for a page. The killed process may still sleep if page cannot be obtained immediately, but testing has shown that system has much higher chance to survive in OOM situation with the patch. In collaboration with: pho Reviewed by: alc MFC after: 4 weeks	2010-04-06 10:43:01 +00:00
jh	6b4bef1bca	Add missing MNT_NFS4ACLS.	2010-04-04 14:48:43 +00:00
alc	7530e331f2	Make _vm_map_init() the one place where the vm map's pmap field is initialized. Reviewed by: kib	2010-04-03 19:07:05 +00:00
pjd	bd6cec6aca	Fix some whitespace nits.	2010-04-03 11:19:20 +00:00
pjd	f0663e1c41	Add missing mnt_kern_flag flags in 'show mount' output.	2010-04-03 11:15:55 +00:00
avg	91cd4478c2	vn_stat: take into account va_blocksize when setting st_blksize As currently st_blksize is always PAGE_SIZE, it is playing safe to not use any smaller value. For some cases this might not be optimal, but at least nothing should get broken. Generally I don't expect this commit to change much for the following reasons (in case of VREG, VDIR): - application I/O and physical I/O are sufficiently decoupled by filesystem code, buffer cache code, cluster and read-ahead logic - not all applications use st_blksize as a hint, some use f_iosize, some use fixed block sizes I expect writes to the middle of files on ZFS to benefit the most from this change. Silence from: fs@ MFC after: 2 weeks	2010-04-03 08:39:00 +00:00
avg	ad244906c7	bo_bsize: revert r205860 and take an alternative approach in getblk In r205860 I missed the fact that there is code that strongly assumes that devvp bo_bsize is equal to underlying provider's sectorsize. In those places it is hard to obtain the sectorsize in an alternative way if devvp bo_bsize is set to something else. So, I am reverting bo_bsize assignment in g_vfs_open. Instead, in getblk I use DEV_BSIZE block size for b_offset calculation if vp is a disk vp as reported by vn_isdisk. This should coincide with vp being a devvp. Reported by: Mykola Dzham <i@levsha.me> Tested by: Mykola Dzham <i@levsha.me> Pointyhat to: avg MFC after: 2 weeks X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks	2010-04-02 15:12:31 +00:00
kib	dbbce00a33	Supply default implementation of VOP_RENAME() that does necessary unlocks and unreferences for argument vnodes, as expected by kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and reference leaks when rename is attempted on fs that does not implement rename. PR: kern/107439 Based on submission by: Mikolaj Golub <to.my.trociny gmail com> Tested by: Mikolaj Golub MFC after: 1 week	2010-04-02 14:03:43 +00:00
kib	86c35b90b7	Add function vop_rename_fail(9) that performs needed cleanup for locks and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename(). Tested by: Mikolaj Golub MFC after: 1 week	2010-04-02 14:03:01 +00:00
lstewart	c1a8ef630e	The ALQ should not be considered drained until it has been made inactive. Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month	2010-04-01 01:27:10 +00:00
lstewart	f1881c310c	According to SLEEP(9), msleep() is deprecated in favour of mtx_sleep(). Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month	2010-04-01 01:23:36 +00:00
lstewart	4aab292692	- Factor code to destroy an ALQ out of alq_close() into a private alq_destroy(). - Use the new alq_destroy() to properly handle a failure case in alq_open(). Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson (as part of a larger patch) Approved by: kmacy (mentor) MFC after: 1 month	2010-04-01 01:16:00 +00:00
lstewart	a8e85cee7a	Add support for ALQ(9) to be compiled and loaded as a kernel module. Sponsored by: FreeBSD Foundation Reviewed by: dwmalone, jeff, rpaulo, rwatson Approved by: kmacy (mentor) MFC after: 1 month	2010-03-31 03:58:57 +00:00
jhb	5bba6cc028	Defer freeing a kevent list until after dropping kqueue locks. LOR: 185 Submitted by: Matthew Fleming @ Isilon MFC after: 1 week	2010-03-30 18:31:55 +00:00
ed	4f08ecd7ed	Rename st_timespec fields to st_tim for POSIX 2008 compliance. A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we already have for a very long time. Unfortunately POSIX uses different names. This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all a less ambiguous. I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.	2010-03-28 13:13:22 +00:00
jh	7a3d363cfe	Support only LOOKUP operation for "/" in relookup() because lookup() can't succeed for CREATE, DELETE and RENAME. Discussed with: bde	2010-03-26 11:33:12 +00:00
nwhitehorn	ac2318460c	Add the ELF relocation base to struct image_params. This will be required to correctly relocate the executable entry point's function descriptor on powerpc64.	2010-03-25 14:31:26 +00:00
nwhitehorn	d63c82a6ac	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb	2010-03-25 14:24:00 +00:00
nwhitehorn	b60f1f5349	Change the way text_addr and data_addr are computed to use the executable status of segments instead of detecting the main text segment by which segment contains the program entry point. This affects obreak() and is required for correct operation of that function on 64-bit PowerPC systems. The previous behavior was apparently required only for the Alpha, which is no longer supported. Reviewed by: jhb Tested on: amd64, sparc64, powerpc	2010-03-25 14:21:22 +00:00
bz	8fb79807f2	Print the pointer to the lock with the panic message. The previous panic: rw lock not unlocked was not really helpful for debugging. Now one can at least call show lock <ptr> from ddb to learn more about the lock. MFC after: 3 days	2010-03-24 19:21:26 +00:00
nwhitehorn	32325226c5	The nargvstr and nenvstr properties of arginfo are ints, not longs, so should be copied to userspace with suword32() instead of suword(). This alleviates problems on 64-bit big-endian architectures, and is a no-op on all 32-bit architectures. Tested on: amd64, sparc64, powerpc64	2010-03-24 03:13:24 +00:00
ed	6156503467	Actually make O_DIRECTORY work. According to POSIX open() must return ENOTDIR when the path name does not refer to a directory. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.	2010-03-21 20:43:23 +00:00
bz	102a1f8933	Split eventhandler_register() into an internal part and a wrapper function that provides the allocated and setup eventhandler entry. Add a new wrapper for VIMAGE that allocates extra space to hold the callback function and argument in addition to an extra wrapper function. While the wrapper function goes as normal callback function the argument points to the extra space allocated holding the original func and arg that the wrapper function can then call. Provide an iterator function for the virtual network stack (vnet) that will call the callback function for each network stack. Provide a new set of macros for VNET that in the non-VIMAGE case will just call eventhandler_register() while in the VIMAGE case it will use vimage_eventhandler_register() passing in the extra iterator function but will only register once rather than per-vnet. We need a special macro in case we are interested in the tag returned as we must check for curvnet and can neither simply assign the return value, nor not change it in the non-vnet0 case without that. Sponsored by: ISPsystem Discussed with: jhb Reviewed by: zec (earlier version), jhb MFC after: 1 month	2010-03-19 19:51:03 +00:00
kib	1e468766f7	Convert aio syscall registration to SYSCALL_INIT_HELPER. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:11:34 +00:00
kib	06319cba03	Implement compat32 shims for mqueuefs. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:10:24 +00:00
kib	34d2655cb1	Implement compat32 shims for ksem syscalls. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:08:43 +00:00
kib	b27fa06f97	Move SysV IPC freebsd32 compat shims from freebsd32_misc.c to corresponding sysv_{msg,sem,shm}.c files. Mark SysV IPC freebsd32 syscalls as NOSTD and add required SYSCALL_INIT_HELPER/SYSCALL32_INIT_HELPERs to provide auto register/unregister on module load. This makes COMPAT_FREEBSD32 functional with SysV IPC compiled and loaded as modules. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:04:42 +00:00
kib	610214ed4c	Move SysV IPC freebsd32 compat shims helpers from freebsd32_misc.c to sysv_ipc.c. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 11:01:51 +00:00
kib	d19a162142	Introduce SYSCALL_INIT_HELPER and SYSCALL32_INIT_HELPER macros and necessary support functions to allow registering dynamically loaded syscalls from the MOD_LOAD handlers. Helpers handle registration failures semi-automatically. Reviewed by: jhb MFC after: 2 weeks	2010-03-19 10:56:30 +00:00
kib	b145781d49	Properly handle compat32 calls to sctp generic sendmsg/recvmsg functions that take iov. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:46:54 +00:00
kib	0c73b78495	Remove dead statement. Reviewed by: tuexen MFC after: 2 weeks	2010-03-19 10:44:02 +00:00
kib	08e36289a4	Fix two style issues. MFC after: 2 weeks	2010-03-19 10:41:32 +00:00
jhb	5b4b1c75c0	Style fixes. Submitted by: bde	2010-03-11 15:13:55 +00:00
nwhitehorn	142a4d2993	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb	2010-03-11 14:49:06 +00:00
jhb	3a7e251600	Fix a comment nit. Submitted by: Alexander Best	2010-03-11 13:16:06 +00:00
jhb	9167691ca3	Add descriptions for debug.ktr sysctl nodes.	2010-03-10 21:35:42 +00:00
imp	3ebe1a7826	Bump up the firmware_table from 30 to 50. bwn needs more than 30, it seems.	2010-03-07 22:37:35 +00:00
alfred	88b3bf6496	put calls to gzclose() under ifdef COMPRESS_USER_CORES to prevent undefined symbols on kernels without this option. Reported by: Alexander Best	2010-03-04 21:53:45 +00:00
rrs	17cf51b0bb	sched_getparam was just plain broke for time-share processes. It did not return an error but instead just let garbage be passed back. This I fix so it actually properly translates the priority the process is at to a posix's high means more priority. I also fix it so that if the ULE scheduler has bumped it up to a realtime process you get back a sane value i.e. the highest priority (63 for time-share). sched_setscheduler() had the setting of the timeshare class priority disabled. With some notes about rejecting the posix high numbers is greater priority and use nice instead. This fix also adjusts that to work, with the caveat that a t-s process may well get bumped up or down i.e. the setscheduler() will NOT change the nice value only the current priority. I think this is reasonable considering if the user wants to play with nice then he can. At least all the posix'ish interfaces now respond sanely. MFC after: 3 weeks	2010-03-03 21:46:51 +00:00
jhb	f9290c7f6d	Allow lseek(SEEK_END) to work on disk devices by using the DIOCGMEDIASIZE to determine the media size. Submitted by: nox MFC after: 1 week	2010-03-03 16:18:04 +00:00
ivoras	14f9175723	Document the VM detection type and sysctl a bit better.	2010-03-02 23:57:42 +00:00
alfred	f34ce3dd38	Merge projects/enhanced_coredumps (r204346) into HEAD: Enhanced process coredump routines. This brings in the following features: 1) Limit number of cores per process via the %I coredump formatter. Example: if corefilename is set to %N.%I.core AND num_cores = 3, then if a process "rpd" cores, then the corefile will be named "rpd.0.core", however if it cores again, then the kernel will generate "rpd.1.core" until we hit the limit of "num_cores". this is useful to get several corefiles, but also prevent filling the machine with corefiles. 2) Encode machine hostname in core dump name via %H. 3) Compress coredumps, useful for embedded platforms with limited space. A sysctl kern.compress_user_cores is made available if turned on. To enable compressed coredumps, the following config options need to be set: options COMPRESS_USER_CORES device zlib # brings in the zlib requirements. device gzio # brings in the kernel vnode gzip output module. 4) Eventhandlers are fired to indicate coredumps in progress. 5) The imgact sv_coredump routine has grown a flag to pass in more state, currently this is used only for passing a flag down to compress the coredump or not. Note that the gzio facility can be used for generic output of gzip'd streams via vnodes. Obtained from: Juniper Networks Reviewed by: kan	2010-03-02 06:58:58 +00:00
bruno	3bef33deb1	Deliver siginfo when signal is generated by thr_kill(2) (SI_USER with properly filled si_uid and si_pid). Reported by: Joel Bertrand <joel.bertrand systella fr> PR: 141956 Reviewed by: kib MFC after: 2 weeks	2010-03-01 14:27:16 +00:00
rwatson	fc045dee13	Remove stale comment about socket buffer accounting from access(2) code. It is the case, however, that the uidinfo of the temporary credential set up for access(2) is not properly updated when its effective uid is changed. MFC after: 3 days	2010-02-27 19:57:40 +00:00
alc	83149d5d10	When running as a guest operating system, the FreeBSD kernel must assume that the virtual machine monitor has enabled machine check exceptions. Unfortunately, on AMD Family 10h processors the machine check hardware has a bug (Erratum 383) that can result in a false machine check exception when a superpage promotion occurs. Thus, I am disabling superpage promotion when the FreeBSD kernel is running as a guest operating system on an AMD Family 10h processor. Reviewed by: jhb, kib MFC after: 3 days	2010-02-27 18:00:57 +00:00
kib	a8e7da4cf2	For kinfo_proc in kp->ki_siglist, return the set of the signals pending in the process queue when gathering information for the process, and set of signals pending for the thread, when gathering information for the thread. Previously, the sysctl returned a union of the process and some arbitrary thread pending set for the process, and union of the process and the thread pending set for the thread. MFC after: 1 week	2010-02-27 15:32:49 +00:00
kib	695f0b496c	Fix several style issues. Define make_dev_credv() as static to match declaration. MFC after: 3 days	2010-02-27 15:26:36 +00:00
jilles	9a57b41b18	Include terminated threads in ps's process cpu time field. MFC after: 2 weeks	2010-02-27 12:15:59 +00:00


11896 Commits