freebsd-nq

Author	SHA1	Message	Date
John Baldwin	6226ec3ef8	Retire PCONFIG and leave the priority of thread0 alone when waiting for interrupt config hooks to execute.	2011-01-06 22:09:37 +00:00
Edward Tomasz Napierala	7b956487e9	Fix a page fault that occurred when trying to initialize a preloaded kernel module whose dependency was preloaded but failed to initialize. Previously, the kernel dereferenced the NULL pointer returned by modlist_lookup2(); now, when this happens, we unload the dependent module. Since the depended_files list is sorted in dependency order, this properly propagates, unloading modules that depend on failed ones. From the user point of view, this prevents the kernel from panicking when trying to boot a kernel compiled without KDTRACE_HOOKS with dtraceall_load="YES" in /boot/loader.conf. Reviewed by: kib	2011-01-05 09:58:41 +00:00
John Baldwin	a5a07ded82	kproc_exit() is already marked __dead2 so a NOTREACHED comment here isn't needed for lint. Submitted by: bde	2011-01-04 13:16:28 +00:00
Konstantin Belousov	23b70c1ae2	Finish r210923, 210926. Mark some devices as eternal. MFC after: 2 weeks	2011-01-04 10:59:38 +00:00
John Baldwin	547ffb85d9	Small whitespace nits and add a comment explaining why kthread_exit() can call kproc_exit() that was lost earlier.	2011-01-03 16:29:00 +00:00
Edward Tomasz Napierala	3e73ff1e94	Finishing touches to fork1() - ANSIfy missed function definition, style(9) fixes, removal of few comments that didn't really make sense and addition of fork_findpid() locking requirements.	2011-01-02 12:16:57 +00:00
Bjoern A. Zeeb	5cc703974c	Mfp4 CH177924: Add and export constants of array sizes of jail parameters as compiled into the kernel. This is the least intrusive way to allow kvm to read the (sparse) arrays independent of the options the kernel was compiled with. Reviewed by: jhb (originally) MFC after: 1 week Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH	2010-12-31 22:49:13 +00:00
Konstantin Belousov	50cfe7fa50	Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY was cleared early in vm_object_page_clean, before the cleaning pass was done. This is no longer true after r216799. Moreover, since OBJ_CLEANING is a flag and not a counter, it could be reset prematurely when vm_object_page_clean() calls are performed in parallel. Reviewed by: alc (as a part of the bigger patch) MFC after: 1 month (after r216799 is merged)	2010-12-29 22:26:49 +00:00
Attilio Rao	3d7acbbabf	Fix several callout migration races: - Problem1: Hypothesis: thread1 is doing a callout_reset_on(), within its callout handler, willing to implicitly or explicitly migrate the callout. thread2 is draining the callout. Thesis: * thread1 calls callout_lock() and locks the old callout cpu * thread1 performs the checks in the first path of the callout_reset_on() * thread1 hits this code fragment: /* * If the lock must migrate we have to check the state again as * we can't hold both the new and old locks simultaneously. */ if (c->c_cpu != cpu) { c->c_cpu = cpu; CC_UNLOCK(cc); goto retry; } which means it will drop the lock and 'retry'. * thread2 calls callout_lock() and locks the new callout cpu. thread1 spins on the new lock and does not proceed for the moment. * thread2 checks that the callout is not pending (as the callout is currently running) and that it is not on cc->cc_curr (because cc now refers to the new callout cpu and the callout is running on the old callout cpu), so it thinks it is done and returns. * thread1 now acquires the lock and adds the callout to the new callout cpu queue. This is an obvious race, as callout_stop() falsely reports the callout as stopped or, worse, callout_drain() falsely returns while the callout is still in use. - Solution1: Fixing this problem would require, in general, locking both callout cpus at once while switching the c_cpu field, and avoiding cyclic deadlocks between callout cpu locks. The concept of CPUBLOCK is therefore introduced (working more or less like the blocked_lock for the thread_lock() function), meaning: "in callout_lock(), spin while c->c_cpu is equal to CPUBLOCK". That way the "original" callout cpu, referred to in the code snippet above, remains blocked until the lock handover is over, and the critical path remains covered. 
- Problem2: Having the callout currently executing on one callout cpu and simultaneously pending on another callout cpu (as can happen with the current code) breaks, at least, the assumption that callout_drain() returns only once the callout cannot be referenced anymore. - Solution2: Callout migration is deferred if the callout is already under execution. The best place to do that is in softclock(), and new members are added to the callout cpu structure to indicate that a migration is pending. That is necessary because the callout cannot be trusted to still exist (not be freed) in 100% of cases after the execution of the callout handler. In the "deferred migration" case, CPUBLOCK prevents the callout from being freed, stopping any possible callout_stop() and callout_drain() activity until the migration is actually performed. - Problem3: There is a further race in callout_drain(). In order to avoid a race between the sleepqueue lock and the callout cpu spinlock, in _callout_stop_safe() the callout cpu lock is dropped, the sleepqueue lock is acquired and a new callout cpu lookup is performed. Note that the channel used for locking the sleepqueue is obtained from the "current" callout cpu (&cc->cc_waiting). If the callout migrated in the meantime, callout_drain() will end up using the wrong wchan for the sleepqueue (the locked one will be the old one, while the new one will not actually be locked), leading to a lock leak and racy access to the sleepqueue. - Solution3: It is enough to check whether a migration happened between acquiring the sleepqueue lock and the new callout cpu lock and, if so, unwind both and try again. These problems can lead to deadly races in moderate (4-way) SMP environments, leading to easy panics or deadlocks. The reporter's 24-way machine could easily panic, under a completely normal workload, almost daily. 
gianni@ kindly wrote the following proof-of-concept, which can panic a FreeBSD machine in less than one hour on smaller SMP systems: http://www.freebsd.org/~attilio/callout/test.c Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet In collaboration with: gianni, pho, Nicholas Esborn Reviewed by: jhb MFC after: 1 week (*) Usually, I would aim for a larger MFC timeout, but I really want this in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special case for this patch	2010-12-29 18:17:36 +00:00
David Xu	c8e368a933	- Following r216313, sched_unlend_user_prio is no longer needed; always use sched_lend_user_prio to set the lent priority. - Improve the pthread priority-inherit mutex: when a contender's priority is lowered, repropagate priorities. This may cause the mutex owner's priority to be lowered; in the old code, the mutex owner's priority could only rise.	2010-12-29 09:26:46 +00:00
Konstantin Belousov	7dbb59c7ce	Teach ddb "show mount" about MNTK_SUJ flag.	2010-12-27 12:06:38 +00:00
Alan Cox	5b2d228c44	Correct the order of the arguments to vm_fault_quick_hold_pages().	2010-12-26 01:42:52 +00:00
Alan Cox	82de724fe1	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@	2010-12-25 21:26:56 +00:00
David Xu	1c45127bd3	Enlarge hash table for new condition variable.	2010-12-23 03:12:03 +00:00
David Xu	d1078b0b03	MFp4: - Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for the umtx kernel-based condition variable; this should eliminate an extra system call to get the current time. - Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in a single system call. Create a userland sleep queue for condition variables; in most cases, a thread will wait in the queue, and pthread_cond_signal will defer the thread wakeup until the mutex is unlocked. This tries to avoid an extra system call and an extra context switch in the time window between pthread_cond_signal and pthread_mutex_unlock. The changes are part of the process-shared mutex project.	2010-12-22 05:01:52 +00:00
Matthew D Fleming	4b7c684420	Initialize fp_location for explicitly managed fail points, and push the parentheses around the location for simple fail points into the location string. This makes the print on fail point set more consistent between the two versions. Also fix up fail.h a little for style(9): only use one of sys/param.h and sys/types.h, and use the existing __XSTRING() macro instead of rolling our own. Also fix up a few tabs on changed and nearby lines. Lastly, since KFAIL_POINT_{BEGIN,END} are not meant for use outside this file, just eliminate the macros entirely. MFC after: 1 week	2010-12-21 18:23:03 +00:00
Matthew D Fleming	d26b794591	Move the fail_point_entry definition from fail.h to kern_fail.c, which allows putting the enumeration constants of fail point types with the text string that matches them. MFC after: 1 week	2010-12-21 16:29:58 +00:00
Lawrence Stewart	a8d61afdc2	- Introduce the Hhook (Helper Hook) KPI. The KPI is closely modelled on pfil(9), and in many respects can be thought of as a more generic superset of pfil. Hhook provides a way for kernel subsystems to export hook points that Khelp modules can hook to provide enhanced or new functionality to the kernel. The KPI has been designed to ensure hook points pose no noticeable overhead when no hook functions are registered. - Introduce the Khelp (Kernel Helpers) KPI. Khelp provides a framework for managing Khelp modules, which indirectly use the Hhook KPI to register their hook functions with hook points of interest within the kernel. Khelp modules aim to provide a structured way to dynamically extend the kernel at runtime in an ABI preserving manner. Depending on the subsystem providing hook points, a Khelp module may be able to associate per-object data for maintaining relevant state between hook calls. - pjd's Object Specific Data (OSD) KPI is used to manage the per-object data allocated to Khelp modules. Create a new "OSD_KHELP" OSD type for use by the Khelp framework. - Bump __FreeBSD_version to 900028 to mark the introduction of the new KPIs. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz, others along the way MFC after: 3 months	2010-12-21 13:45:29 +00:00
Alan Cox	acd11c7499	Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race condition in proc_rwmem() and to (2) simplify the implementation of the cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem() the requested read or write could fail because the targeted page could be reclaimed between the calls to vm_fault() and vm_page_hold(). In collaboration with: kib@ MFC after: 6 weeks	2010-12-20 22:49:31 +00:00
Alan Cox	8c22654d7e	Implement and use a single optimized function for unholding a set of pages. Reviewed by: kib@	2010-12-17 22:41:22 +00:00
John Baldwin	36b4cd243e	Add back a bounds check on valid idle priorities that was lost in an earlier commit. While here, move the thread lock down in rtp_to_pri(). It is not needed for all of the priority value checks and the computation of newpri. Reported by: swell.k @ gmail MFC after: 3 days	2010-12-17 16:29:06 +00:00
Matthew D Fleming	e0f389c8d3	One of the compat32 functions was copying in a raw timespec, instead of a 32-bit one. This can cause weird timeout issues, as the copying reads garbage from the user. Code by: Deepak Veliath <deepak dot veliath at isilon dot com> MFC after: 1 week	2010-12-15 19:30:44 +00:00
Pawel Jakub Dawidek	b452cf6317	Just pass M_ZERO to malloc(9) instead of clearing allocated memory separately.	2010-12-14 06:19:13 +00:00
Edward Tomasz Napierala	4c7bba9985	Adapt filesystem-independent NFSv4 ACL code (used by UFS, but not by ZFS) to PSARC/2010/029. In short, the semantics is simplified - "weird stuff" no longer happens after chmod, entries don't get duplicated during inheritance, and trivial ACLs no longer contain three "DENY" entries, which is also more friendly to MS Windows. By default, UFS keeps using old semantics. To change it, set sysctl vfs.acl_nfs4_old_semantics to 0. I'll flip the switch when ZFSv28 hits the tree, to keep these two in sync - ZFS v28 uses PSARC semantics, and ZFS v15 uses the old one.	2010-12-13 18:56:04 +00:00
Hans Petter Selasky	0bad52e1d8	Fix race in devfs by using LIST_FIRST() instead of LIST_FOREACH_SAFE() when freeing the devfs private data entries. Reviewed by: kib MFC after: 3 days Approved by: thompsa (mentor)	2010-12-11 08:44:10 +00:00
Edward Tomasz Napierala	afd01097a0	Refactor fork1() to make it easier to follow. No functional changes. Reviewed by: kib (earlier version) Tested by: pho	2010-12-10 08:33:56 +00:00
Bjoern A. Zeeb	4befa84f9c	Don't tie ct_debug to bootverbose. Provide a sysctl to turn it on or off. Switch the default to always off. Reviewed by: kib	2010-12-09 22:02:48 +00:00
David Xu	ec6ea5e86d	MFp4: The unit number allocator reuses IDs too quickly, which may hide bugs in other code; add a ring buffer to delay freeing a thread ID.	2010-12-09 05:16:20 +00:00
David Xu	acbe332a58	MFp4: It is possible for a lower-priority thread to lend priority to a higher-priority thread. The old code ignored this, but the lending should always be recorded; add field td_lend_user_pri to fix the problem. If a thread does not have a borrowed priority, its value is PRI_MAX. MFC after: 1 week	2010-12-09 02:42:02 +00:00
Edward Tomasz Napierala	087bfb0e6b	Add a KASSERT to make it obvious when fork_norfproc() is to be called, and set *procp to NULL in all cases. Previously, it was not being set in the ERESTART case. This is effectively a no-op, since its value is ignored by callers in the error case. Reviewed by: kib@	2010-12-06 19:15:38 +00:00
Edward Tomasz Napierala	f68c74bbd3	Fix style bug introduced by previous commit.	2010-12-06 16:45:36 +00:00
Edward Tomasz Napierala	1d845e8638	Improve readability by factoring out the !RFPROC case. While here, turn K&R function definitions into ANSI. No functional changes. Reviewed by: kib@	2010-12-06 16:39:18 +00:00
Konstantin Belousov	9f4ba450f2	Trim whitespaces at the end of lines. Use the commit to record proper log message for r216150. MFC after: 1 week If unix socket has a unix socket attached as the rights that has a unix socket attached as the rights that has a unix socket attached as the rights ... Kernel may overflow the stack on attempt to close such socket. Only close the rights file in the context of the current close if the file is not unix domain socket. Otherwise, postpone the work to taskqueue, preventing unlimited recursion. The pass of the unix domain sockets over the SCM_RIGHTS message control is not widely used, and more, the close of the socket with still attached rights is mostly an application failure. The change should not affect the performance of typical users of SCM_RIGHTS. Reviewed by: jeff, rwatson	2010-12-03 20:39:06 +00:00
Konstantin Belousov	0cb64678bc	Reviewed by: jeff, rwatson MFC after: 1 week	2010-12-03 16:15:44 +00:00
Edward Tomasz Napierala	ef694c1ac4	Replace pointer to "struct uidinfo" with pointer to "struct ucred" in "struct vm_object". This is required to make it possible to account for per-jail swap usage. Reviewed by: kib@ Tested by: pho@ Sponsored by: FreeBSD Foundation	2010-12-02 17:37:16 +00:00
Warner Losh	704c91294b	removed tag is '-', not '+'. remove extra return.	2010-12-02 04:28:01 +00:00
Edward Tomasz Napierala	26778a6c82	Remove useless NULL checks for M_WAITOK mallocs.	2010-12-02 01:14:45 +00:00
Warner Losh	5cb51b647c	Remove redundant (and bogus) insertion of pnp info when announcing new and retiring devices. That's already inserted elsewhere. Submitted by: n_hibma MFC after: 3 days	2010-11-30 05:54:21 +00:00
Matthew D Fleming	dd6312a7c1	Fix uninitialized variable warning that shows on Tinderbox but not my setup. (??) Submitted by: Michael Butler <imb at protected-networks dot net>	2010-11-29 21:53:21 +00:00
Matthew D Fleming	ccecef29d1	Do not hold the sysctl lock across a call to the handler. This fixes a general LOR issue where the sysctl lock had no good place in the hierarchy. One specific instance is #284 on http://sources.zabbadoz.net/freebsd/lor.html . Reviewed by: jhb MFC after: 1 month X-MFC-note: split oid_refcnt field for oid_running to preserve KBI	2010-11-29 18:18:07 +00:00
Matthew D Fleming	d0bb6f258b	Slightly modify the logic in sysctl_find_oid to reduce the indentation. There should be no functional change. MFC after: 3 days	2010-11-29 18:18:00 +00:00
Matthew D Fleming	5127ecb89c	Use the SYSCTL_CHILDREN macro in kern_sysctl.c to help de-obfuscate the code. MFC after: 3 days	2010-11-29 18:17:53 +00:00
Konstantin Belousov	eea7f71c81	Account i/o done on cdevs. Reported and tested by: Adam Vande More <amvandemore gmail com> MFC after: 1 week	2010-11-25 20:05:11 +00:00
Konstantin Belousov	f5eb95b1fc	Allow a shared-locked vnode to be passed to vunref(9). When a shared-locked vnode is supplied as an argument to vunref(9) and the resulting usecount is 0, set VI_OWEINACT and do not try to upgrade the vnode lock. The latter could cause the vnode to be unlocked, allowing the vnode to be reclaimed in the meantime. Tested by: pho MFC after: 1 week	2010-11-24 12:30:41 +00:00
Andriy Gapon	706b0d31bb	taskqueue: drop unused tq_name field tq_name was used write-only and besides it was just a pointer, so it could point to some garbage in a temporary buffer that's gone. This change shouldn't change KPI/KBI as struct taskqueue is private to subr_taskqueue.c. If we find a need for tq_name it can be resurrected at any moment. taskqueue_create() interface is preserved for this purpose. Suggested by: jhb MFC after: 10 days	2010-11-23 14:30:22 +00:00
Sergey Kandaurov	f03749ca2d	Update MNT_ROOTFS comments after changes in the root mount logic. Reported by: arundel Suggested by: marcel (phrasing) Approved by: kib (mentor)	2010-11-23 13:49:15 +00:00
Colin Percival	772d1e42a2	Add parentheses for clarity. The parentheses around the two terms of the && are unnecessary but I'm leaving them in for the sake of avoiding confusion (I confuse easily). Submitted by: bde	2010-11-23 04:50:01 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Attilio Rao	57c153804a	Style fix. Sponsored by: Sandvine Incorporated Requested by: jhb Reviewed by: jhb MFC after: 1 week X-MFC: 215544	2010-11-22 15:28:54 +00:00
Attilio Rao	7f08176ee8	Add the ability for GDB to print out the thread name along with other thread-specific information. In order to do that, and in order to avoid KBI breakage with the existing infrastructure, the following semantics are implemented: - For live programs, a new member is added to PT_LWPINFO (pl_tdname). - For cores, a new ELF note is added (NT_THRMISC) that can be used for storing miscellaneous thread-specific information. Right now it is just populated with the thread name. GDB then retrieves the correct information from the corefile via the BFD interface, as it groks the ELF notes and creates appropriate pseudo-sections. Sponsored by: Sandvine Incorporated Tested by: gianni Discussed with: dim, kan, kib MFC after: 2 weeks	2010-11-22 14:42:13 +00:00
Colin Percival	aa519c0a64	In tc_windup, handle the case where the previous call to tc_windup was more than 1s earlier. Prior to this commit, the computation of th_scale * delta (which produces a 64-bit value equal to the time since the last tc_windup call in units of 2^(-64) seconds) would overflow and any complete seconds would be lost. We fix this by repeatedly converting tc_frequency units of timecounter to one second; this is not exactly correct, since it loses the NTP adjustment, but if we find ourselves going more than 1s at a time between clock interrupts, losing a few seconds worth of NTP adjustments is the least of our problems...	2010-11-22 09:13:25 +00:00
Alexander Leidinger	bb63fdde6d	When using the 32-bit Linux version of Sun's Java Development Kit 1.6 on FreeBSD (amd64), invocations of "javac" (or "java") eventually end with the output "Killed" and exit code 137. This is caused by: 1. After exec() is called in a multithreaded Linux program, the other threads are not destroyed and continue running; they get killed only after the executed program finishes. 2. linux_exit_group doesn't return the correct exit code when called from a thread other than the group leader, which happens regularly with the Sun JVM. The submitters fix this in a similar way to how NetBSD handles it. I took the PRs away from dchagin, who seems to have been out of touch with this for a while (no response from him). The patches committed here are from [2], with some small style modifications from me. PR: 141439 [1], 144194 [2] Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk Reviewed by: rdivacky (in April 2010) MFC after: 5 days	2010-11-22 09:06:59 +00:00
David Xu	b169d0efa1	Use an atomic instruction to set _has_writer; otherwise a race can cause userland to not wake up a thread sleeping in the kernel. MFC after: 3 days	2010-11-22 02:42:02 +00:00
Konstantin Belousov	730b63b0c2	Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seem to be unused, and the reported situation is normal for a forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c	2010-11-19 21:17:34 +00:00
Attilio Rao	772753491b	Scan the list in reverse order for the shutdown handlers of loaded modules. This way, when there is a dependency between two modules, the handler of the module probed later runs first. This is similar to the way modules are unloaded within the same linker file. Sponsored by: Sandvine Incorporated Submitted by: Nima Misaghian <nmisaghian at sandvine dot com> MFC after: 1 week	2010-11-19 19:43:56 +00:00
John Baldwin	2e7758a8f6	Set the POSIX semaphore capability when the semaphore module is enabled. This is ignored in HEAD where semaphores are marked as always enabled in <unistd.h>. MFC after: 1 week	2010-11-19 17:57:50 +00:00
John Baldwin	34c1c5992f	Set various POSIX capability sysctls to the version of the API that is supported rather than 1. They are supposed to return a suitable value for sysconf(3). While here, make the fsync sysctl match <unistd.h>. MFC after: 1 week	2010-11-19 17:56:16 +00:00
John Baldwin	144df3a28f	Add a resource_list_reserved() method that returns true if a resource list entry contains a reserved resource.	2010-11-17 22:28:04 +00:00
Olivier Houchard	0c87b5e243	No need to include sys/systm.h twice.	2010-11-16 14:08:21 +00:00
David Xu	32c63db519	Only unlock process if a thread is found.	2010-11-15 07:33:54 +00:00
Dimitry Andric	31c6a0037e	Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.	2010-11-14 20:38:11 +00:00
Dimitry Andric	5f67450d3a	Similar to sys/net/vnet.h, define the linker set name for sys/sys/pcpu.h as a macro, and use it instead of literal strings.	2010-11-14 20:14:25 +00:00
Rebecca Cran	8d065a3914	Fix some more style(9) issues.	2010-11-14 16:10:15 +00:00
Ed Schouten	eb4c31fd41	Add support for asterisk characters when filling in the GELI password during boot. Change the last argument of gets() to indicate a visibility flag and add definitions for the numerical constants. Except for the value 2, gets() will behave exactly the same, so existing consumers shouldn't break. We only use it in two places, though. Submitted by: lme (older version)	2010-11-14 14:12:43 +00:00
Rebecca Cran	b389be97db	Fix style(9) issues from r215281 and r215282. MFC after: 1 week	2010-11-14 08:06:29 +00:00
Rebecca Cran	5d7abc8777	Add descriptions to some more sysctls. PR: kern/148510 MFC after: 1 week	2010-11-14 07:38:42 +00:00
Rebecca Cran	2baa5cddb6	Add some descriptions to sys/kern sysctls. PR: kern/148710 Tested by: Chip Camden <sterling at camdensoftware.com> MFC after: 1 week	2010-11-14 06:09:50 +00:00
Edward Tomasz Napierala	4220337804	Remove unused variables.	2010-11-13 11:54:04 +00:00
Luigi Rizzo	5c9d0a9ad3	This commit implements the SO_USER_COOKIE socket option, which lets you tag a socket with a uint32_t value. The cookie can then be used by the kernel for various purposes, e.g. setting the skipto rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has been implemented; however there is nothing ipfw-specific in its implementation). The ipfw-related code that uses the option will be committed separately. This change adds a field to 'struct socket', but the struct is not part of any driver or userland-visible ABI so the change should be harmless. See the discussion at http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html Idea and code from Paul Joe, small modifications and manpage changes by myself. Submitted by: Paul Joe MFC after: 1 week	2010-11-12 13:02:26 +00:00
Edward Tomasz Napierala	a37e14e1d8	Fix style. Submitted by: bde	2010-11-11 21:53:46 +00:00
Edward Tomasz Napierala	98e0196aef	Remove unneeded conditional. Discussed with: kib	2010-11-11 08:15:12 +00:00
Attilio Rao	9f518f2068	Fix typos. Submitted by: gianni MFC after: 3 days	2010-11-10 21:06:49 +00:00
John Baldwin	961135ead8	- Remove <machine/mutex.h>. Most of the headers were empty, and the contents of the ones that were not empty were stale and unused. - Now that <machine/mutex.h> no longer exists, there is no need to allow it to override various helper macros in <sys/mutex.h>. - Rename various helper macros for low-level operations on mutexes to live in the _mtx_* or __mtx_* namespaces. While here, change the names to more closely match the real API functions they are backing. - Drop support for including <sys/mutex.h> in assembly source files. Suggested by: bde (1, 2)	2010-11-09 20:46:41 +00:00
Rebecca Cran	b1ce21c6ef	Fix typos. PR: bin/148894 Submitted by: olgeni	2010-11-09 10:59:09 +00:00
Juli Mallett	b79b28b69d	Use macros rather than inline functions to lock and unlock mutexes, so that line number information is preserved in witness. Reviewed by: jhb	2010-11-08 22:12:25 +00:00
Matthew D Fleming	2f22b3ffe6	Whitespace and other aspects of style(9). No functional changes. MFC after: 3 days	2010-11-08 20:57:08 +00:00
Matthew D Fleming	f46276a9b0	Add a taskqueue_cancel(9) to cancel a pending task without waiting for it to run as taskqueue_drain(9) does. Requested by: hselasky Original code: jeff Reviewed by: jhb MFC after: 2 weeks	2010-11-08 20:56:31 +00:00
Alexander Motin	c70410e6f5	On AP startup, skip hardclock/statclock events whose time passed before the CPU was launched. A burst of several seconds' worth of events, accumulated during a long startup, was reported to cause a panic in the SCHED_ULE priority calculation logic.	2010-11-08 15:25:12 +00:00
Jaakko Heinonen	ff91cc99dd	Add missing curly brackets. By chance, the missing brackets didn't alter the code behavior. Submitted by: Lucius Windschuh	2010-11-07 14:28:01 +00:00
John Baldwin	3350df4899	Remove 'softclock_ih' as it is no longer used.	2010-11-03 15:38:52 +00:00
John Baldwin	b58508045b	Tweak the waitchannel messages for the deadlock detection kthread. Use a shorter message (userland generally only sees the first 6 to 8 characters) when waiting for the allproc lock. Use "-" when idle to match the behavior of other kthreads. Reviewed by: attilio MFC after: 1 week	2010-11-02 18:34:31 +00:00
David Xu	444528c026	Use integer for size of cpuset, as it won't be bigger than INT_MAX. This is requested by bge. Also move the sysctl into file kern_cpuset.c, because it should always be there; it is independent of the thread scheduler.	2010-11-01 00:42:25 +00:00
Alexander Motin	189795fe68	Fix callout_tickstofirst() behavior after signed integer ticks overflow. This should fix callout precision drop to 1/4s after 25 days of uptime with HZ = 1000. Submitted by: Taku YAMAMOTO <taku@tackymt.homeip.net>	2010-10-31 11:44:41 +00:00
Konstantin Belousov	3a40a00d56	Remove sysctl debug.ncnegfactor, it is renamed to vfs.ncnegfactor. MFC: do not	2010-10-30 14:08:26 +00:00
Edward Tomasz Napierala	252e4a96e6	Fix uninitialized variable. Found with: Coverity Prevent(tm) CID: 8632	2010-10-29 19:07:36 +00:00
David Xu	b67cc292dc	Add sysctl kern.sched.cpusetsize to export the size of kernel cpuset, also add sysconf() key _SC_CPUSET_SIZE to get sysctl value. Submitted by: gcooper	2010-10-29 13:31:10 +00:00
John Baldwin	b94e6f0ef6	Set bootverbose directly in mi_startup() rather than via a SYSINIT. This ensures 'bootverbose' is in a valid state for all SYSINITs. Reported by: avg MFC after: 1 week	2010-10-28 14:17:06 +00:00
David Xu	4a5478709b	- Revert r214409. - Use long word to figure out sizeof kernel cpuset, hope it works.	2010-10-27 09:29:03 +00:00
David Xu	1676b42546	If input parameter cpusetsize is zero, give userland the size of the cpuset mask the kernel is using.	2010-10-27 02:32:54 +00:00
Ivan Voras	61eee6b8a7	Reduce the difference between hirunningspace and lorunningspace, it should help interactivity in edge cases.	2010-10-25 14:05:25 +00:00
David Xu	42fe684c1a	Use function tdfind() to find a thread.	2010-10-25 13:13:16 +00:00
Rebecca Cran	fd104c151b	Mostly revert r203420, and add similar functionality into ada(4) since the existing code caused problems with some SCSI controllers. A new sysctl kern.cam.ada.spindown_shutdown has been added that controls whether or not to spin-down disks when shutting down. Spinning down the disks unloads/parks the heads - this is much better than removing power when the disk is still spinning because otherwise an Emergency Unload occurs which may cause damage to the actuator. PR: kern/140752 Submitted by: olli Reviewed by: arundel Discussed with: mav MFC after: 2 weeks	2010-10-24 16:31:57 +00:00
Edward Tomasz Napierala	880cb81c5a	Remove workaround for ZFS bug; fix was committed to the //depot/user/pjd/zfs/... branch some time ago. MFC after: two weeks	2010-10-23 14:22:50 +00:00
David Xu	0d036d55e7	In thr_exit() and kthread_exit(), only remove thread from hash if it can directly exit, otherwise let exit1() do it. The change should be in r213950, but for unknown reason, it was lost.	2010-10-23 13:16:39 +00:00
Xin LI	5e5fd037d6	Call chainevh callback when we are invoked with neither MOD_LOAD nor MOD_UNLOAD. This makes it possible to add custom hooks for other module events. Return EOPNOTSUPP when there is no callback available. Pointed out by: jhb Reviewed by: jhb MFC after: 1 month	2010-10-21 20:31:50 +00:00
John Baldwin	d680caab73	- When disabling ktracing on a process, free any pending requests that may be left. This fixes a memory leak that can occur when tracing is disabled on a process via disabling tracing of a specific file (or if an I/O error occurs with the tracefile) if the process's next system call is exit(). The trace disabling code clears p_traceflag, so exit1() doesn't do any KTRACE-related cleanup leading to the leak. I chose to make the free'ing of pending records synchronous rather than patching exit1(). - Move KTRACE-specific logic out of kern_(exec|exit|fork).c and into kern_ktrace.c instead. Make ktrace_mtx private to kern_ktrace.c as a result. MFC after: 1 month	2010-10-21 19:17:40 +00:00
Xin LI	00e3c12e03	In syscall_module_handler(): all switch branches return, remove unreached code as pointed out in a Chinese forum [1]. [1] http://www.freebsdchina.org/forum/viewtopic.php?t=50619 Pointed out by: btw616 <btw s qq com> MFC after: 1 month	2010-10-21 08:57:25 +00:00
David Xu	cfca8a1862	- Don't include sx.h, it is not needed. - Check NULL pointer, move timeout calculation code outside of process lock.	2010-10-20 00:41:38 +00:00
Andrey V. Elsukov	366523d101	ZFS pool name is not a real device in devfs. Do not wait for device appear when mounting root from ZFS. Reviewed by: marcel Approved by: mav (mentor)	2010-10-19 18:32:01 +00:00
Ed Maste	c4965cfc44	We've already set p = td->td_proc, so use it.	2010-10-18 15:46:58 +00:00
Marcel Moolenaar	e25daafbb6	Re-implement the root mount logic using a recursive approach, whereby each root file system (starting with devfs and a synthesized configuration) can contain directives for mounting another file system as root. The old root file system is re-mounted under the new root file system (with /.mount or /mnt as the mount point) to allow access to the underlying file system. The configuration allows for creating vnode-backed memory disks that can subsequently be mounted as root. This allows for an efficient and low-cost way to distribute and boot FreeBSD software images that reside on some storage media. When trying a mount, the kernel will wait for the device in question to arrive. The timeout is configurable and is part of the configuration. This allows arbitrarily complex GEOM configurations to be constructed on the fly. A side-effect of this change is that all root specifications, whether compiled into the kernel or typed at the prompt, can contain root mount options.	2010-10-18 05:01:53 +00:00
Marcel Moolenaar	c1f0aabb9f	In vfs_filteropt(), only print the errmsg when there's no errmsg mount option. Otherwise errors tend to get printed multiple times.	2010-10-18 04:34:42 +00:00
Marcel Moolenaar	76e18b25a0	Rename boot() to kern_reboot() and make it visible outside of kern_shutdown.c. This makes it easier for emulators and other parts of the kernel to initiate a reboot.	2010-10-18 04:30:27 +00:00
Nathan Whitehorn	c8593f7c4d	Fix an XXX comment by answering 'no'. OS X does not set the day-of-week counter on SMU-based systems, which causes FreeBSD to reject the RTC time when used in a dual-boot environment. Since we don't use the day-of-week counter anyway, solve this by just not checking that it matches. MFC after: 3 weeks	2010-10-17 17:31:49 +00:00
David Xu	21ecd1e977	- Insert thread0 into correct thread hash link list. - In thr_exit() and kthread_exit(), only remove thread from hash if it can directly exit, otherwise let exit1() do it. - In thread_suspend_check(), fix cleanup code when thread needs to exit. This change seems to have fixed the "Bad link elm " panic found by Peter Holm. Stress testing: pho	2010-10-17 11:01:52 +00:00
Konstantin Belousov	420cfbb460	Provide vfs.ncsizefactor instead of hard-coding namecache ratio. Move debug.ncnegfactor to vfs.ncnegfactor [1]. Provide some descriptions for the namecache related sysctls [1]. Based on the submission by: Rogier R. Mulhuijzen <drwilco drwilco net> [1] MFC after: 2 weeks X-MFC-note: remove debug.ncnegfactor in HEAD after MFC	2010-10-16 09:44:31 +00:00
David Xu	407af02b6e	In kern_sigtimedwait(), move initialization code out of process lock, instead of using SIGISMEMBER to test every interesting signal, just unmask the signal set and let cursig() return one, get the signal after it returns, call reschedule_signal() after signals are blocked again. In kern_sigprocmask(), don't call reschedule_signal() when it is unnecessary. In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with sig_ffs(), rename variable 'i' to sig.	2010-10-14 08:01:33 +00:00
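The `sig_ffs()` idea can be sketched as a word-by-word scan using ffs(3). The sketch assumes a 128-signal set stored as four 32-bit words, with bit `sig - 1` marking signal `sig`. The type and function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

#define SKETCH_SIG_WORDS	4	/* 4 x 32 = 128 signals (assumption) */

/* Hypothetical stand-in for sigset_t: bit (sig - 1) marks signal sig. */
struct sketch_sigset {
	unsigned int bits[SKETCH_SIG_WORDS];
};

static void
sketch_sigaddset(struct sketch_sigset *set, int sig)
{
	set->bits[(sig - 1) / 32] |= 1u << ((sig - 1) % 32);
}

/* Return the lowest-numbered signal in set, or 0 if the set is empty. */
static int
sketch_sig_ffs(const struct sketch_sigset *set)
{
	for (int i = 0; i < SKETCH_SIG_WORDS; i++)
		if (set->bits[i] != 0)
			return (ffs((int)set->bits[i]) + i * 32);
	return (0);
}
```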
Matthew D Fleming	bf73d4d28e	Use a safer mechanism for determining if a task is currently running, that does not rely on the lifetime of pointers being the same. This also restores the task KBI. Suggested by: jhb MFC after: 1 month	2010-10-13 22:59:04 +00:00
David Xu	fc4ecc1d48	sigqueue_collect_set() is no longer needed because other functions maintain pending set correctly.	2010-10-13 06:28:40 +00:00
Matthew D Fleming	a92f0ee866	Re-expose and briefly document taskqueue_run(9). The function is used in at least one 3rd party driver. Requested by: jhb	2010-10-12 18:36:03 +00:00
Andriy Gapon	9ddb6637b8	generic_stop_cpus: prevent parallel execution This is based on the same approach as used in panic(). In theory parallel execution of generic_stop_cpus() could lead to two CPUs stopping each other and everyone else, and thus a total system halt. Also, in theory, we should have some smarter locking here, because two (or more CPUs) could be stopping unrelated sets of CPUs. But in practice, it seems, this function is only used to stop "all other" CPUs. Additionally, I took this opportunity to make amd64-specific suspend_cpus() function use generic_stop_cpus() instead of rolling out essentially duplicate code. This code is based on code by Sandvine Incorporated. Suggested by: mdf Reviewed by: jhb, jkim (earlier version) MFC after: 2 weeks	2010-10-12 17:40:45 +00:00
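The following sketches the panic()-style exclusion using C11 atomics in place of the kernel's atomic(9) primitives. `stopping_cpu`, `NOCPU` and the `stop_section_*()` helpers are illustrative assumptions. One CPU claims ownership with a compare-and-swap, and the CPU that already owns the section may re-enter it. Every other CPU spins until the owner releases it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU	(-1)

/* Hypothetical owner of the "stop other CPUs" section; NOCPU when free. */
static _Atomic int stopping_cpu = NOCPU;

/* Try to become the owner; re-entry by the current owner succeeds. */
static bool
stop_section_try_enter(int cpu)
{
	int expected = NOCPU;

	if (atomic_load(&stopping_cpu) == cpu)
		return (true);
	return (atomic_compare_exchange_strong(&stopping_cpu, &expected, cpu));
}

/* Spin until the section is ours. */
static void
stop_section_enter(int cpu)
{
	while (!stop_section_try_enter(cpu))
		while (atomic_load(&stopping_cpu) != NOCPU)
			;	/* another CPU is stopping everyone; wait */
}

static void
stop_section_exit(void)
{
	atomic_store(&stopping_cpu, NOCPU);
}
```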
David Xu	96f231fde9	Add a flag TDF_TIDHASH to prevent a thread from being added to or removed from thread hash table multiple times.	2010-10-12 00:36:56 +00:00
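This is a single-threaded sketch of a tid hash that uses a membership flag. The structure layout, the `TDF_TIDHASH` value, the bucket count and the helper names are all assumptions. The rwlock that protects the real table is left out. Without the flag check, a second `tidhash_add()` of the same thread would create a cycle in its bucket chain, and lookups would then loop forever.

```c
#include <assert.h>
#include <stddef.h>

#define TDF_TIDHASH	0x0001	/* illustrative value, not the kernel's */
#define TIDHASH_SIZE	16	/* illustrative bucket count */

struct thread {
	int		 td_tid;
	int		 td_flags;
	struct thread	*td_hash;	/* next thread in the same bucket */
};

static struct thread *tidhashtbl[TIDHASH_SIZE];

static struct thread **
tidhash_bucket(int tid)
{
	return (&tidhashtbl[(unsigned int)tid % TIDHASH_SIZE]);
}

static void
tidhash_add(struct thread *td)
{
	struct thread **head;

	if (td->td_flags & TDF_TIDHASH)
		return;			/* already linked: do nothing */
	head = tidhash_bucket(td->td_tid);
	td->td_hash = *head;
	*head = td;
	td->td_flags |= TDF_TIDHASH;
}

static void
tidhash_remove(struct thread *td)
{
	struct thread **pp;

	if ((td->td_flags & TDF_TIDHASH) == 0)
		return;			/* never added or already removed */
	for (pp = tidhash_bucket(td->td_tid); *pp != td; pp = &(*pp)->td_hash)
		;
	*pp = td->td_hash;
	td->td_hash = NULL;
	td->td_flags &= ~TDF_TIDHASH;
}

static struct thread *
tdfind_sketch(int tid)
{
	struct thread *td;

	for (td = *tidhash_bucket(tid); td != NULL; td = td->td_hash)
		if (td->td_tid == tid)
			return (td);
	return (NULL);
}
```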
Konstantin Belousov	d0cc54f3b4	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks	2010-10-10 07:05:47 +00:00
Andriy Gapon	95bb9d38b8	add kmem_map_free sysctl: query largest contiguous free range in kmem_map Suggested by: alc Reviewed by: alc MFC after: 1 week	2010-10-09 09:03:17 +00:00
Andriy Gapon	64dd590ece	panic_cpu variable should be volatile This is to prevent caching of its value in a register when it is checked and modified by multiple CPUs in parallel. Also, move the variable into the scope of the only function that uses it. Reviewed by: jhb Hint from: mdf MFC after: 1 week	2010-10-09 08:07:49 +00:00
David Xu	cf7d9a8ca8	Create a global thread hash table to speed up thread lookup, use rwlock to protect the table. In old code, thread lookup is done with process lock held, to find a thread, kernel has to iterate through process and thread list, this is quite inefficient. With this change, test shows in extreme case performance is dramatically improved. Earlier patch was reviewed by: jhb, julian	2010-10-09 02:50:23 +00:00
Ed Maste	6239ef1d29	Make a thread's address available via the kern proc sysctl, just like the process address. Add "tdaddr" keyword to ps(1) to display this thread address. Distilled from Sandvine's patch set by Mark Johnston.	2010-10-08 00:44:53 +00:00
Andriy Gapon	7814c80a5b	vm.kmem_map_size: a sysctl to query current kmem_map->size Based on a patch from Sandvine Incorporated via emaste. Reviewed by: emaste MFC after: 1 week	2010-10-07 18:11:33 +00:00
Jaakko Heinonen	68f7a01392	Check the device name validity on device registration. A new function prep_devname() sanitizes a device name by removing leading and redundant sequential slashes. The function returns an error for names which already exist or are considered invalid. A new flag MAKEDEV_CHECKNAME for make_dev_p(9) and make_dev_credf(9) indicates that the caller is prepared to handle an error related to the device name. An invalid name triggers a panic if the flag is not specified. Document the MAKEDEV_CHECKNAME flag in the make_dev(9) manual page. Idea from: kib Reviewed by: kib	2010-10-07 18:00:55 +00:00
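The slash cleanup can be sketched as a single copying pass. `sketch_prep_devname()` and its error codes are illustrative assumptions. The sketch handles only leading and repeated slashes. It does not reproduce the kernel's other validity checks or its duplicate-name check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy name into out, dropping leading '/' and collapsing runs of '/'
 * into one.  Returns 0, EINVAL for an empty result, or ENAMETOOLONG.
 */
static int
sketch_prep_devname(const char *name, char *out, size_t outlen)
{
	size_t o = 0;

	while (*name == '/')
		name++;				/* strip leading slashes */
	for (; *name != '\0'; name++) {
		if (name[0] == '/' && name[1] == '/')
			continue;		/* keep only the last slash of a run */
		if (o + 1 >= outlen)
			return (ENAMETOOLONG);	/* no room for this byte plus NUL */
		out[o++] = *name;
	}
	if (o == 0)
		return (EINVAL);
	out[o] = '\0';
	return (0);
}
```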
Warner Losh	3a5c13580e	Adjust the all target message (but maybe all: sysent is better?	2010-10-02 22:12:41 +00:00
Warner Losh	a2869bda12	Turns out this file was how we make sysent stuff, so add that part only back...	2010-10-02 21:35:33 +00:00
Marcel Moolenaar	24e01f5998	Split the root mount logic from the (generic) mount code and move it (the root mount code) into a new file called vfs_mountroot.c The split is almost trivial, as the code is almost perfectly non-intertwined. The only adjustment needed was to move the UMA zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and into vfs_mount.c, where it had to be done as a SYSINIT [see vfs_mount_init()]. There are no functional changes with this commit.	2010-10-02 19:44:13 +00:00
Konstantin Belousov	f1d2d3052a	Release the vnode lock and close the linker file vnode earlier in the linker_load_file methods. The change is that the consequent linker_file_unload() call is not under the vnode lock anymore. This prevents the LOR between kernel linker sx xlock and vnode lock, because linker_file_unload() relocks kernel linker lock. MFC after: 2 weeks	2010-10-02 16:04:50 +00:00
Andriy Gapon	08a9c20500	sysctls in kern_shutdown: add twin tunables also make couple of sysctl-controlled variables static Reviewed by: rwatson MFC after: 1 week	2010-10-01 09:34:41 +00:00
Andriy Gapon	7b9df13bcd	there must be only one SYSINIT with SI_SUB_RUN_SCHEDULER+SI_ORDER_ANY order SI_SUB_RUN_SCHEDULER+SI_ORDER_ANY should only be used to call scheduler() function which turns the initial thread into swapper proper and thus there is no further SYSINIT processing. Other SYSINITs with SI_SUB_RUN_SCHEDULER+SI_ORDER_ANY may get ordered after scheduler() and thus never executed. That particular relative order is semi-arbitrary. Thus, change such places to use SI_ORDER_MIDDLE. Also, use SI_ORDER_MIDDLE instead of correct, but less appealing, SI_ORDER_ANY - 1. MFC after: 1 week	2010-09-30 17:05:23 +00:00
Andriy Gapon	10b2a365a0	debug.kdb.stop_cpus sysctl: hint that this is also a tunable MFC after: 1 week	2010-09-30 16:47:01 +00:00
Andriy Gapon	d801e824f6	kmem_size* sysctls: hint that these are also tunables MFC after: 1 week	2010-09-30 16:45:27 +00:00
David Xu	931e4573df	- kern_sched_rr_get_interval should return interval for thread 1 in target process. - eliminate a goto. MFC after: 1 week	2010-09-29 07:31:05 +00:00
Warner Losh	1115f627db	This file has been unused for ages. Retire it. Submitted by: pluknet	2010-09-28 15:33:30 +00:00
Ed Maste	25e45560da	Remove extra braces for style(9) (found while cleaning up an old work tree).	2010-09-28 01:36:01 +00:00
Andriy Gapon	61548876b1	kdb_backtrace: use stack_print_ddb instead of stack_print This is a followup to r212964. stack_print call chain obtains linker sx lock and thus potentially may lead to a deadlock depending on a kind of a panic. stack_print_ddb doesn't acquire any locks and it doesn't use any facilities of ddb backend. Using stack_print_ddb outside of DDB ifdef required taking a number of helper functions from under it as well. It is a good idea to rename linker_ddb_* and stack_*_ddb functions to have 'unlocked' component in their name instead of 'ddb', because those functions do not use any DDB services, but instead they provide unlocked access to linker symbol information. The latter was previously needed only for DDB, hence the 'ddb' name component. Alternative is to ditch unlocked versions altogether after implementing proper panic handling: 1. stop other cpus upon a panic 2. make all non-spinlock lock operations (mutex, sx, rwlock) be a no-op when panicstr != NULL Suggested by: mdf Discussed with: attilio MFC after: 2 weeks	2010-09-22 06:45:07 +00:00
Alexander Motin	9dfc483c4a	If kernel built with DEVICE_POLLING, keep one CPU always in active state to handle it.	2010-09-22 05:32:37 +00:00
John Baldwin	a8103ae8ca	Comment nit, set TDF_NEEDRESCHED after the comment describing why it is done rather than before. MFC after: 1 week	2010-09-21 19:12:22 +00:00
Alexander Motin	bcb74c4c95	If new callout scheduled to another CPU and we are using global timer, there is high probability that timer is already programmed by some other CPU. Especially by one that registered this callout, and so active now.	2010-09-21 17:37:28 +00:00
Alexander Motin	afe41f2da7	Remember the last kern.eventtimer.periodic value explicitly set by the user. If timer capabilities force us to change periodicity mode, try to restore it later, as soon as a newly chosen timer is capable of it. Without this, a timer change like HPET->RTC->HPET always results in enabling periodic mode.	2010-09-21 16:50:24 +00:00
Alan Cox	8f7f5a7f26	Fix exec_imgact_shell()'s handling of two error cases: (1) Previously, if the first line of a script exceeded MAXSHELLCMDLEN characters, then exec_imgact_shell() silently truncated the line and passed on the truncated interpreter name or argument. Now, exec_imgact_shell() will fail and return ENOEXEC, which is the commonly used errno among Unix variants for this type of error. (2) Previously, exec_imgact_shell()'s check on the length of the interpreter's name was ineffective. In other words, exec_imgact_shell() could not possibly fail and return ENAMETOOLONG. The reason being that the length of the interpreter name had to exceed MAXSHELLCMDLEN characters in order that ENAMETOOLONG be returned. But, the search for the end of the interpreter name stops after at most MAXSHELLCMDLEN - 2 characters are scanned. (In the end, this particular error is eventually discovered outside of exec_imgact_shell() and ENAMETOOLONG is returned. So, the real effect of this second change is that the error is detected earlier, in exec_imgact_shell().) Update the definition of MAXINTERP to the actual limit on the size of the interpreter name that has been in effect since r142453 (from 2005). In collaboration with: kib	2010-09-21 16:24:51 +00:00
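The following sketches the first fix. The end of the `#!` line is searched for only within the allowed length, and the function fails with ENOEXEC instead of truncating. `SKETCH_MAXSHELLCMDLEN` is deliberately small here. Like `sketch_parse_interp()`, it is an assumption rather than the kernel's definition. Parsing of the interpreter's argument is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXSHELLCMDLEN	64	/* illustrative limit */

/*
 * Find the interpreter name on a "#!" line.  If no newline appears within
 * the limit, fail with ENOEXEC instead of silently truncating the line.
 */
static int
sketch_parse_interp(const char *hdr, size_t len, const char **interp,
    size_t *interp_len)
{
	size_t limit = len < SKETCH_MAXSHELLCMDLEN ? len : SKETCH_MAXSHELLCMDLEN;
	const char *end, *p;

	if (limit < 2 || hdr[0] != '#' || hdr[1] != '!')
		return (ENOEXEC);
	end = memchr(hdr, '\n', limit);
	if (end == NULL)
		return (ENOEXEC);		/* first line too long */
	p = hdr + 2;
	while (p < end && (*p == ' ' || *p == '\t'))
		p++;				/* skip blanks after "#!" */
	*interp = p;
	while (p < end && *p != ' ' && *p != '\t')
		p++;				/* interpreter ends at a blank */
	*interp_len = (size_t)(p - *interp);
	return (*interp_len == 0 ? ENOEXEC : 0);
}
```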
Andriy Gapon	088acbb312	kdb_backtrace: stack(9)-based code to print backtrace without any backend The idea is to add KDB and KDB_TRACE options to GENERIC kernels on stable branches, so that at least the minimal information is produced for non-specific panics like traps on page faults. The GENERICs in stable branches seem to already include STACK option. Reviewed by: attilio MFC after: 2 weeks	2010-09-21 15:07:44 +00:00
Alexander Motin	95d23438dd	Until hardclock() and respectively tc_windup() called first time, system is running on "dummy" time counter. But to function properly in one-shot mode, event timer management code requires working time counter. Slow moving "dummy" time counter delays first hardclock() call by few seconds on my systems, even though timer interrupts were correctly kicking kernel. That causes few seconds delay during boot with one-shot mode enabled. To break this loop, explicitly call tc_windup() first time during initialization process to let it switch to some real time counter.	2010-09-21 08:02:02 +00:00
Edward Tomasz Napierala	4089cc8aa1	First step at adapting FreeBSD to support PSARC/2010/029. This makes acl_is_trivial_np(3) properly recognize the new trivial ACLs. From the user point of view, that means "ls -l" no longer shows plus signs for all the files when running ZFS v28.	2010-09-20 17:10:06 +00:00
Ed Schouten	d1817ed7f3	Just make callout devices and /dev/console force CLOCAL on open(). Instead of adding custom checks to wait for DCD on open(), just modify the termios structure to set CLOCAL. This means SIGHUP is no longer generated when losing DCD as well. Reviewed by: kib@ MFC after: 1 week	2010-09-19 16:35:42 +00:00
Ed Schouten	4b5d5046ab	Ignore DCD handling on /dev/console entirely. This makes /dev/console more fail-safe and prevents a potential console lock-up during boot. Discussed on: stable@ Tested by: koitsu@ MFC after: 1 week	2010-09-19 14:21:39 +00:00
Robert Watson	adb6aa9ab9	With reworking of the socket life cycle in 7.x, the need for a "sotryfree()" was eliminated: all references to sockets are explicitly managed by sorele() and the protocols. As such, garbage collect sotryfree(), and update sofree() comments to make the new world order more clear. MFC after: 3 days Reported by: Anuranjan Shukla <anshukla at juniper dot net>	2010-09-18 11:18:42 +00:00
Andriy Gapon	19b8a6dbc1	kern.sched.topology_spec sysctl: use step of 1 for group levels numeration This is just a cosmetic change for prettier output. 'indent' variable/parameter serves two purposes: it specifies whitespace indentation level and also implies cpu group level/depth. It would have been better to split those two uses, but for now just a simple change. MFC after: 1 week	2010-09-18 11:16:43 +00:00
Alexander Motin	8e860de4bf	When global timer used at SMP system, update nextevent field on BSP before sending IPI to other CPUs. Otherwise, other CPUs will try to honor stale value, programming timer for zero interval. If timer is fast enough, it caused extra interrupt before timer correctly reprogrammed by BSP.	2010-09-18 07:18:30 +00:00
Warner Losh	5ff4999243	By popular demand, kill all the non GIANT related interrupt messages. They are confusing and add little value. Reviewed by: jhb@	2010-09-17 16:05:25 +00:00
Matthew D Fleming	4e6571599b	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)	2010-09-16 16:13:12 +00:00
Alexander Motin	9aff0c8ff7	Fix panic on NULL dereference possible after r212541.	2010-09-14 10:26:49 +00:00
Alexander Motin	0e18987383	Make kern_tc.c provide the minimum frequency of tc_ticktock() calls required to handle current timecounter wraps. Make kern_clocksource.c honor that requirement, scheduling sleeps on the first CPU for no more than the specified period. Allow other CPUs to sleep up to 1/4 second (in any case).	2010-09-14 08:48:06 +00:00
Alexander Motin	4763a8b8c1	Replace spin lock with the set of atomics. It is impractical for one tc_ticktock() call to wait for another's completion -- just skip it.	2010-09-14 04:57:30 +00:00
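Skipping instead of waiting can be sketched with a C11 `atomic_flag`. The names are hypothetical, and the counter stands in for the real timecounter update. The sketch illustrates the technique and does not reproduce kern_tc.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag ticktock_busy = ATOMIC_FLAG_INIT;
static int windup_calls;	/* counts the stand-in for the real update */

/*
 * Run the periodic update unless another caller is already inside it.
 * Returns true if this call did the work, false if it skipped.
 */
static bool
sketch_ticktock(void)
{
	if (atomic_flag_test_and_set(&ticktock_busy))
		return (false);		/* someone else is updating: just skip */
	windup_calls++;			/* placeholder for the timecounter update */
	atomic_flag_clear(&ticktock_busy);
	return (true);
}
```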
Alexander Motin	dd9595e7fa	Add some foot shooting protection by checking singlemul value correctness. Rephrase sysctls descriptions. Suggested by: edmaste	2010-09-14 04:48:04 +00:00
Matthew D Fleming	404a593e28	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn	2010-09-13 18:48:23 +00:00
Andriy Gapon	b7d28b2e0b	bus_add_child: add specialized default implementation that calls panic If a kobj method doesn't have any explicitly provided default implementation, then it is auto-assigned kobj_error_method. kobj_error_method is proper only for methods that return error code, because it just returns ENXIO. So, in the case of unimplemented bus_add_child caller would get (device_t)ENXIO as a return value, which would cause the mistake to go unnoticed, because return value is typically checked for NULL. Thus, a specialized null_add_child is added. It would have sufficed for correctness to return NULL, but this type of mistake was deemed to be rare and serious enough to call panic instead. Watch out for this kind of problem with other kobj methods. Suggested by: jhb, imp MFC after: 2 weeks	2010-09-13 08:34:20 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when CPU is idle, only the minimum set of interrupts (down to 8 interrupts per second per CPU now) needed to handle scheduled callouts is executed. This allows a significant increase in idle CPU sleep time, increasing the effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is a set of tunables, also available as writable sysctls, allowing control of the wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware is taken as the only source of time for time events. This mode is quite similar to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends on chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how many times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt independently of whether they are busy or not. By default this option is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Alexander Motin	90baf564d2	Do not print "frequency 0 Hz", when frequency is unknown.	2010-09-11 20:18:15 +00:00
Alexander Kabaev	eb262be333	Add missing pointer increment to sbuf_cat.	2010-09-11 19:42:50 +00:00
Konstantin Belousov	9a24dc0760	Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak when mount and update are executed in parallel. Encapsulate syncer vnode deallocation into the helper function vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c. Found and reviewed by: jh (previous version of the patch) Tested by: pho MFC after: 3 weeks	2010-09-11 13:06:06 +00:00
Alexander Motin	b722ad008b	Merge some SCHED_ULE features to SCHED_4BSD: - Teach SCHED_4BSD to inform cpu_idle() about high sleep/wakeup rate to choose optimized handler. In case of x86 it is MONITOR/MWAIT. Also it will be needed to bypass forthcoming idle tick skipping logic to not consume resources on events rescheduling when it won't give any benefits. - Teach SCHED_4BSD to wake up idle CPUs without using IPI. In case of x86, when MONITOR/MWAIT is active, it requires just a single memory write. This doubles performance on some heavily switching test loads.	2010-09-11 07:08:22 +00:00
Jamie Gritton	f337198db0	Don't exit kern_jail_set without freeing options when enforce_statfs has an illegal value. MFC after: 3 days	2010-09-10 21:45:42 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Alexander Motin	9f9ad565a1	Do not IPI a CPU that is already spinning for load. It doubles the effect of spinning (compared to MWAIT) on some heavily switching test loads.	2010-09-10 13:24:47 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Matthew D Fleming	dd67e2103c	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk	2010-09-09 18:33:46 +00:00
Matthew D Fleming	4351ba272c	Add drain functionality to sbufs. The drain is a function that is called when the sbuf internal buffer is filled. For kernel sbufs with a drain, the internal buffer will never be expanded. For userland sbufs with a drain, the internal buffer may still be expanded by sbuf_[v]printf(3). Sbufs now have three basic uses: 1) static string manipulation. Overflow is marked. 2) dynamic string manipulation. Overflow triggers string growth. 3) drained string manipulation. Overflow triggers draining. In all cases the manipulation is 'safe' in that overflow is detected and managed. Reviewed by: phk (the previous version)	2010-09-09 17:49:18 +00:00
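This is a userland sketch of the third mode, drained string manipulation. A fixed buffer calls a drain callback when it is full instead of growing. The callback shape is modeled on the sbuf(9) drain function: it consumes some bytes and returns the count, or returns a negative value on error. `struct drained_buf`, its helpers, and the partially consuming `collect_drain()` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Drain callback: consume up to len bytes, return count or <= 0 on error. */
typedef int sketch_drain_func(void *arg, const char *data, int len);

struct drained_buf {
	char			 buf[8];	/* deliberately tiny fixed buffer */
	int			 len;
	sketch_drain_func	*drain;
	void			*arg;
};

/* Push everything currently buffered to the drain; -1 on failure. */
static int
drained_buf_flush(struct drained_buf *b)
{
	while (b->len > 0) {
		int n = b->drain(b->arg, b->buf, b->len);

		if (n <= 0 || n > b->len)
			return (-1);	/* drain failed or misbehaved */
		/* Keep any bytes the drain did not consume, in order. */
		memmove(b->buf, b->buf + n, (size_t)(b->len - n));
		b->len -= n;
	}
	return (0);
}

/* Append one byte; a full buffer is drained instead of being grown. */
static int
drained_buf_putc(struct drained_buf *b, char c)
{
	if (b->len == (int)sizeof(b->buf) && drained_buf_flush(b) != 0)
		return (-1);
	b->buf[b->len++] = c;
	return (0);
}

static int
drained_buf_puts(struct drained_buf *b, const char *s)
{
	for (; *s != '\0'; s++)
		if (drained_buf_putc(b, *s) != 0)
			return (-1);
	return (0);
}

/* Example drain: append to a bounded collector, at most 3 bytes per call. */
struct collector {
	char	out[64];
	int	len;
};

static int
collect_drain(void *arg, const char *data, int len)
{
	struct collector *c = arg;

	if (len > 3)
		len = 3;		/* simulate a partially consuming drain */
	if (c->len + len > (int)sizeof(c->out))
		return (-1);
	memcpy(c->out + c->len, data, (size_t)len);
	c->len += len;
	return (len);
}
```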
Matthew D Fleming	01f6f5fcd4	Refactor sbuf code so that most uses of sbuf_extend() are in a new sbuf_put_byte(). This makes it easier to add drain functionality when a buffer would overflow as there are fewer code points. Reviewed by: phk	2010-09-09 16:51:52 +00:00
Rui Paulo	d3555b6fc2	Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes. Sponsored by: The FreeBSD Foundation	2010-09-09 09:58:05 +00:00
Pawel Jakub Dawidek	4946fa6791	Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.	2010-09-09 07:55:13 +00:00
Pawel Jakub Dawidek	7443b79b81	Doing first mount and updating mount points are both handled by the same syscall and the same function, but are very different and share almost no code. To make it easier to read and analyze, split vfs_domount() into vfs_domount_first() and vfs_domount_update(). Reviewed by: kib	2010-09-08 21:00:53 +00:00
Pawel Jakub Dawidek	a34512e3f0	- Log all the problems in devfs_fixup(). - Correct error paths. The system will be useless on devfs_fixup() failure, so why bother? Maybe for the same reason why a dead body is washed and dressed in a nice suit before it is put into a coffin? Maybe system's last will is to panic without any locks held? Reviewed by: kib	2010-09-08 20:56:18 +00:00
Andriy Gapon	3b0620e06c	subr_bus: use hexadecimal representation for bit flags It seems that this format is more customary in our code, and it is more convenient too. Suggested by: jhb No objection: imp MFC after: 1 week	2010-09-08 17:35:06 +00:00
Michael Tuexen	049640c1f0	Implement correct handling of address parameter and sendinfo for SCTP send calls. MFC after: 4 weeks.	2010-09-05 20:13:07 +00:00
Alexander Motin	d89be9509f	Initialize buffer for case of empty string. Happens only on non-refactored platforms.	2010-09-05 06:16:04 +00:00
Andriy Gapon	ef3b7ba04f	struct device: widen type of flags and order fields to u_int Also change int -> u_int for order parameter in device_add_child_ordered. There should not be any ABI change as struct device is private to subr_bus.c and the API change should be compatible. To do: change int -> u_int for order parameter of bus_add_child method and its implementations. The change should also be API compatible, but is a bit more churn. Suggested by: imp, jhb MFC after: 1 week	2010-09-04 17:28:29 +00:00
Matthew D Fleming	181ff3d503	Use a better #if guard. Suggested by pluknet <pluknet at gmail dot com>.	2010-09-03 17:42:17 +00:00
Matthew D Fleming	c05dbe7a54	Style(9) fixes and eliminate the use of min().	2010-09-03 17:42:12 +00:00
Matthew D Fleming	969292fb1b	Fix user-space libsbuf build. Why isn't CTASSERT available to user-space?	2010-09-03 17:23:26 +00:00
Matthew D Fleming	f5a5dc5da8	Fix brain fart when converting an if statement into a KASSERT.	2010-09-03 16:12:39 +00:00
Matthew D Fleming	f4bafab8da	Use math rather than iteration when the desired sbuf size is larger than SBUF_MAXEXTENDSIZE.	2010-09-03 16:09:17 +00:00
Justin T. Gibbs	f03f7a0ca3	Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to its next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation MFC after: 1 month	2010-09-02 19:40:28 +00:00
Matthew D Fleming	ba4932b5a2	Fix UP build. MFC after: 2 weeks	2010-09-02 16:23:05 +00:00
Matthew D Fleming	0f7a0ebd59	Fix a bug with sched_affinity() where it checks td_pinned of another thread in a racy manner, which can lead to attempting to migrate a thread that is pinned to a CPU. Instead, have sched_switch() determine which CPU a thread should run on if the current one is not allowed. KASSERT in sched_bind() that the thread is not yet pinned to a CPU. KASSERT in sched_switch() that only migratable threads or those moving due to a sched_bind() are changing CPUs. sched_affinity code came from jhb@. MFC after: 2 weeks	2010-09-01 20:32:47 +00:00
Max Laier	36058c09e4	rmlock(9) two additions and one change/fix: - add rm_try_rlock(). - add RM_SLEEPABLE to use sx(9) as the back-end lock in order to sleep while holding the write lock. - change rm_noreadtoken to a cpu bitmask to indicate which CPUs need to go through the lock/unlock in order to synchronize. As a side effect, this also avoids IPI to CPUs without any readers during rm_wlock. Discussed with: ups@, rwatson@ on arch@ Sponsored by: Isilon Systems, Inc.	2010-09-01 19:50:03 +00:00
Ed Maste	e5ddf11581	As long as we are going to panic anyway, there's no need to hide additional information behind DIAGNOSTIC.	2010-09-01 13:47:11 +00:00
David Xu	137cf33d5e	rescue comments from RELENG_4.	2010-09-01 01:26:07 +00:00
Matthew D Fleming	6d3ed393d6	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks	2010-08-31 16:57:58 +00:00
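The essence of the fix can be sketched with malloc(3). When reallocating by allocate-copy-free, copy the smaller of the old and new sizes. `sketch_realloc()` is a hypothetical helper. It does not model memguard's guard pages or its size bookkeeping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate-copy-free reallocation.  Copy only as many bytes as both the
 * old and the new allocation can hold; copying oldsize bytes into a
 * smaller new buffer would overrun it.  On failure, old is left intact.
 */
static void *
sketch_realloc(void *old, size_t oldsize, size_t newsize)
{
	void *p = malloc(newsize);

	if (p == NULL)
		return (NULL);
	if (old != NULL) {
		memcpy(p, old, oldsize < newsize ? oldsize : newsize);
		free(old);
	}
	return (p);
}
```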
David Xu	83b718eb07	If a process is being debugged, skips job control caused by SIGSTOP/SIGCONT signals, because it is managed by debugger, however a normal signal sent to a interruptibly sleeping thread wakes up the thread so it will handle the signal when the process leaves the stopped state. PR: 150138 MFC after: 1 week	2010-08-31 07:15:50 +00:00
Jaakko Heinonen	de478dd4b4	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for an user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) to better agree with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
Konstantin Belousov	e7fb66340e	Regen	2010-08-30 14:26:02 +00:00
Konstantin Belousov	8d19559bde	Make the syscalls reserved for AFS usable by OpenAFS port. Submitted by: Benjamin Kaduk <kaduk mit edu> MFC after: 2 weeks	2010-08-30 14:24:44 +00:00
Konstantin Belousov	6d8fedda2c	For some file types, select code registers two selfd structures. E.g., for socket, when specified POLLIN|POLLOUT in events, you would have one selfd registered for receiving socket buffer, and one for sending. Now, if both events are not ready to fire at the time of the initial scan, but are simultaneously ready after the sleep, pollrescan() would iterate over the pollfd struct twice. Since both times revents is not zero, returned value would be off by one. Fix this by recalculating the return value in pollout(). PR: kern/143029 MFC after: 2 weeks	2010-08-28 17:42:08 +00:00
Pawel Jakub Dawidek	c87f1ad43c	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn free us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month	2010-08-28 08:57:15 +00:00
Pawel Jakub Dawidek	646c3b21ae	Run all tasks from a proper context, with proper priority, etc. Reviewed by: jhb MFC after: 1 month	2010-08-28 08:38:03 +00:00
Konstantin Belousov	13561ed4ed	Fix typo. Submitted by: Ben Kaduk <minimarmot gmail com>	2010-08-26 11:20:57 +00:00
Brian Somers	c2d844d814	If we read zero bytes from the directory, early out with ENOENT rather than forging ahead and interpreting garbage buffer content and dirent structures. This change backs out r211684 which was essentially a no-op. MFC after: 1 week	2010-08-25 18:09:51 +00:00
David Xu	df7442533c	If a thread is removed from umtxq while sleeping, reset error code to zero, this gives userland a better indication that a thread needn't to be cancelled.	2010-08-25 03:14:32 +00:00
David Xu	2961a78226	Optimize thr_suspend, if timeout is zero, don't call msleep, just return immediately.	2010-08-24 07:29:55 +00:00
David Xu	baf28b69f4	- According to specification, SI_USER code should only be generated by standard kill(). On other systems, SI_LWP is generated by lwp_kill(). This will allow conforming applications to differentiate between signals generated by standard events and those generated by other implementation events in a manner compatible with existing practice. - Bump __FreeBSD_version	2010-08-24 07:22:24 +00:00
Warner Losh	b3cdb67393	This should really be MACHINE not MACHINE_ARCH, and is this Makefile even used?	2010-08-23 06:22:35 +00:00
Brian Somers	90db41b62b	uio_resid isn't updated by VOP_READDIR for nfs filesystems. Use the uio_offset adjustment instead to calculate a correct *len. Without this change, we run off the end of the directory data we're reading and panic horribly for nfs filesystems. MFC after: 1 week	2010-08-23 05:33:31 +00:00
Rui Paulo	b3d354c9ce	Call the systrace_probe_func() when the error value. Sponsored by: The FreeBSD Foundation	2010-08-22 11:30:49 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to get use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00

12141 Commits