freebsd-dev

Author	SHA1	Message	Date
Ed Schouten	907b48bc05	Fix indentation.	2009-12-20 22:55:27 +00:00
Ed Schouten	8dc9b4cf04	Let access overriding to TTYs depend on the cdev_priv, not the vnode. Basically this commit changes two things, which improves access to TTYs in exceptional conditions. Basically the problem was that when you ran jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if you want to attach to screens quickly, use ssh(1), etc. The fixes: - Cache the cdev_priv of the controlling TTY in struct session. Change devfs_access() to compare against the cdev_priv instead of the vnode. This allows you to bypass UNIX permissions, even across different mounts of devfs. - Extend devfs_prison_check() to unconditionally expose the device node of the controlling TTY, even if normal prison nesting rules normally don't allow this. This actually allows you to interact with this device node. To be honest, I'm not really happy with this solution. We now have to store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp). In an ideal world, we should just get rid of the latter two and only use s_ttyp, but this makes certian pieces of code very impractical (e.g. devfs, kern_exit.c). Reported by: Many people	2009-12-19 18:42:12 +00:00
Edward Tomasz Napierala	28d3fd007e	Interpret VAPPEND correctly in vaccess_acl_nfs4(9).	2009-12-19 11:41:52 +00:00
Ed Schouten	e6d84d057a	Make the wchan names of pts(4) fit in top(1). Just like a similar change we made to the TTY code about half a year ago, make these strings look similar. Suggested by: Jille Timmermans <jille@quis.cx>	2009-12-18 20:11:29 +00:00
Andrew Thompson	2b54315009	If the runcount is non-zero in eventhandler_deregister() then one or more threads are executing the eventhandler, sleep in this case to make it safe for module unload. If the runcount was up then an entry would have been marked EHE_DEAD_PRIORITY so use this as a trigger to do the wakeup in eventhandler_prune_list(). Reviewed by: jhb	2009-12-17 21:17:13 +00:00
Matt Jacob	e7d829a46c	Fix argument order in a call to mtx_init. MFC after: 1 week	2009-12-17 00:22:56 +00:00
Luigi Rizzo	20c510f826	Properly fix callout handling by putting all the per-cpu info in struct callout_cpu. From the comment in the file: + * There is one struct callout_cpu per cpu, holding all relevant + * state for the callout processing thread on the individual CPU. + * In particular: + * cc_ticks is incremented once per tick in callout_cpu(). + * It tracks the global 'ticks' but in a way that the individual + * threads should not worry about races in the order in which + * hardclock() and hardclock_cpu() run on the various CPUs. + * cc_softclock is advanced in callout_cpu() to point to the + * first entry in cc_callwheel that may need handling. In turn, + * a softclock() is scheduled so it can serve the various entries i + * such that cc_softclock <= i <= cc_ticks . Together with a smaller patch committed in september, this fixes a bug that affects 8.0 with apps that rely on callouts to fire exactly in the number of ticks specified (qemu among them). Right now, callouts in 8.0 fire one tick late. This was discussed in september with JeffR and jhb MFC after: 3 days	2009-12-14 12:23:46 +00:00
Bjoern A. Zeeb	de0bd6f76b	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days	2009-12-13 13:57:32 +00:00
Attilio Rao	2028867def	In current code, threads performing an interruptible sleep (on both sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will leave the waiters flag on forcing the owner to do a wakeup even when if the waiter queue is empty. That operation may lead to a deadlock in the case of doing a fake wakeup on the "preferred" (based on the wakeup algorithm) queue while the other queue has real waiters on it, because nobody is going to wakeup the 2nd queue waiters and they will sleep indefinitively. A similar bug, is present, for lockmgr in the case the waiters are sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue is not empty, the waiters won't progress after being awake but they will just fail, still not taking care of the 2nd queue waiters (as instead the lock owned doing the wakeup would expect). In order to fix this bug in a cheap way (without adding too much locking and complicating too much the semantic) add a sleepqueue interface which does report the actual number of waiters on a specified queue of a waitchannel (sleepq_sleepcnt()) and use it in order to determine if the exclusive waiters (or shared waiters) are actually present on the lockmgr (or sx) before to give them precedence in the wakeup algorithm. This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters a lockmgr has and if all the waiters on the exclusive waiters queue are LK_SLEEPFAIL just wake both queues. The sleepq_sleepcnt() introduction and ABI breakage require __FreeBSD_version bumping. Reported by: avg, kib, pho Reviewed by: kib Tested by: pho	2009-12-12 21:31:07 +00:00
John Baldwin	42a346fa63	For some buses, devices may have active resources assigned even though they are not allocated by the device driver. These resources should still appear allocated from the system's perspective so that their assigned ranges are not reused by other resource requests. The PCI bus driver has used a hack to effect this for a while now where it uses rman_set_device() to assign devices to the PCI bus when they are first encountered and later assigns them to the actual device when a driver allocates a BAR. A few downsides of this approach is that it results in somewhat confusing devinfo -r output as well as not being very easily portable to other bus drivers. This commit adds generic support for "reserved" resources to the resource list API used by many bus drivers to manage the resources of child devices. A resource may be reserved via resource_list_reserve(). This will allocate the resource from the bus' parent without activating it. resource_list_alloc() recognizes an attempt to allocate a reserved resource. When this happens it activates the resource (if requested) and then returns the reserved resource. Similarly, when a reserved resource is released via resource_list_release(), it is deactivated (if it is active) and the resource is then marked reserved again, but is left allocated from the bus' parent. To completely remove a reserved resource, a bus driver may use resource_list_unreserve(). A bus driver may use resource_list_busy() to determine if a reserved resource is allocated by a child device or if it can be unreserved. The PCI bus driver has been changed to use this framework instead of abusing rman_set_device() to keep track of reserved vs allocated resources. Submitted by: imp (an older version many moons ago) MFC after: 1 month	2009-12-09 21:52:53 +00:00
Edward Tomasz Napierala	9d7031a6d6	Don't add VAPPEND if the file is not being opened for writing. Note that this only affects cases where open(2) is being used improperly - i.e. when the user specifies O_APPEND without O_WRONLY or O_RDWR. Reviewed by: rwatson	2009-12-08 20:47:10 +00:00
Konstantin Belousov	4f17d481ed	Remove wrong assertion. Debugee is allowed to lose a signal. Reported and tested by: jh MFC after: 2 weeks	2009-12-03 20:16:59 +00:00
Edward Tomasz Napierala	6bb58cdd0f	Add change that was somehow missed in r192586. It could manifest by incorrectly returning EINVAL from acl_valid(3) for applications linked against pre-8.0 libc.	2009-12-03 13:29:24 +00:00
Ed Schouten	6eaf04022b	Don't allocate an input buffer for a TTY when the receiver is turned off. When the termios CREAD flag is not set, it makes little sense to allocate an input buffer. Just set the size to 0 in this case to reduce memory footprint. Disallow CREAD to be disabled for pseudo-devices to prevent foot-shooting.	2009-12-01 19:14:57 +00:00
Alan Cox	a6d42a0d62	Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected but they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy. However, here is where things become confused. It is the debugger, not the process being debugged that requires write access to the copied page. Nonetheless, the copied page is being mapped into the process with write access enabled. In other words, once the debugger sets a breakpoint within a text page, the program can write to its private copy of that text page. Whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access. Reviewed by: kib MFC after: 4 weeks	2009-11-26 05:16:07 +00:00
Ivan Voras	cbc4ea28e2	Make ULE process usage (%CPU) accounting usable again by keeping track of the last tick we incremented on. Submitted by: matthew.fleming/at/isilon.com, is/at/rambler-co.ru Reviewed by: jeff (who thinks there should be a better way in the future) Approved by: gnn (mentor) MFC after: 3 weeks	2009-11-24 19:57:41 +00:00
Konstantin Belousov	080136212f	On the return path from F_RDAHEAD and F_READAHEAD fcntls, do not unlock Giant twice. While there, bring conditions in the do/while loops closer to style, that also makes the lines fit into 80 columns. Reported and tested by: dougb	2009-11-20 22:22:53 +00:00
Jaakko Heinonen	10d843a446	Extend ddb(4) "show mount" command to print active string mount options. Note that only option names are printed, not values. Reviewed by: pjd Approved by: trasz (mentor) MFC after: 2 weeks	2009-11-19 14:33:03 +00:00
Oleksandr Tymoshenko	3ea6157e6b	- Unbreak build with KLD_DEBUG defined - Add debug.kld_debug sysctl to control KLD debugging level - Print information about KLD dependencies with debug enabled	2009-11-17 21:56:12 +00:00
Konstantin Belousov	a3de221dbe	Among signal generation syscalls, only sigqueue(2) is allowed by POSIX to fail due to lack of resources to queue siginfo. Add KSI_SIGQ flag that allows sigqueue_add() to fail while trying to allocate memory for new siginfo. When the flag is not set, behaviour is the same as for KSI_TRAP: if memory cannot be allocated, set bit in sq_kill. KSI_TRAP is kept to preserve KBI. Add SI_KERNEL si_code, to be used in siginfo.si_code when signal is generated by kernel. Deliver siginfo when signal is generated by kill(2) family of syscalls (SI_USER with properly filled si_uid and si_pid), or by kernel (SI_KERNEL, mostly job control or SIGIO). Since KSI_SIGQ flag is not set for the ksi, low memory condition cause old behaviour. Keep psignal(9) KBI intact, but modify it to generate SI_KERNEL si_code. Pgsignal(9) and gsignal(9) now take ksi explicitely. Add pksignal(9) that behaves like psignal but takes ksi, and ddb kill command implemented as pksignal(..., ksi = NULL) to not do allocation while in debugger. While there, remove some register specifiers and use ANSI C prototypes. Reviewed by: davidxu MFC after: 1 month	2009-11-17 11:39:15 +00:00
Xin LI	1a9d4dda9b	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.	2009-11-12 19:02:10 +00:00
Attilio Rao	d113304956	Add the possibility for vfs.root.mountfrom tunable to accept a list of items rather than a single one. The list is a space separated collection of items defined as the current one accepted. While there fix also a nit in a comment. Obtained from: Sandvine Incorporated Reviewed by: emaste Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Sponsored by: Sandvine Incorporated MFC: 2 weeks	2009-11-12 15:59:05 +00:00
Attilio Rao	023c800576	The building the dev nameunit string, in devclass_add_device() is based on the assumption that the unit linked with the device is invariant but that can change when calling devclass_alloc_unit() (because -1 is passed or, more simply, because the unit choosen is beyond the table limits). This results in a completely bogus string building. Fix this by reserving the necessary room for all the possible characters printable by a positive integer (we do not allow for negative unit number). Reported by: Sandvine Incorporated Reviewed by: emaste Sponsored by: Sandvine Incorporated MFC: 1 week	2009-11-12 00:52:14 +00:00
Xin LI	41c8c6e876	Add interface description capability as inspired by OpenBSD. MFC after: 3 months	2009-11-11 21:30:58 +00:00
Edward Tomasz Napierala	23e62e7654	Revert r198873. Having different VAPPEND semantics for VOP_ACCESS(9) and VOP_ACCESSX(9) is not a good idea.	2009-11-11 13:49:22 +00:00
Konstantin Belousov	88e6f61a19	When rename("a", "b/.") is performed, target namei() call returns dvp == vp. Rename syscall does not check for the case, and at least ufs_rename() cannot deal with it. POSIX explicitely requires that both rename(2) and rmdir(2) return EINVAL when any of the pathes end in "/.". Detect the slashdot lookup for RENAME or REMOVE in lookup(), and return EINVAL. Reported by: Jim Meyering <jim meyering net> Tested by: simon, pho MFC after: 1 week	2009-11-10 11:50:37 +00:00
Konstantin Belousov	75c586a4c8	In r198506, kern_sigsuspend() started doing cursig/postsig loop to make sure that a signal was delivered to the thread before returning from syscall. Signal delivery puts new return frame on the user stack, and modifies trap frame to enter signal handler. As a consequence, syscall return code sets EINTR as error return for signal frame, instead of the syscall return. Also, for ia64, due to different registers layout for those two kind of frames, usermode sigsegfaulted when returned from signal handler. Use newly-introduced cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying this frame [1]. Another issue is that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any catched signal [2]. Modify postsig() to return 1 if signal was posted, and 0 otherwise, and use this in the kern_sigsuspend loop. Proposed by: marcel [1] Noted by: davidxu [2] Reviewed by: marcel, davidxu MFC after: 1 month	2009-11-10 11:46:53 +00:00
Edward Tomasz Napierala	7ba889b8f4	Add suggestion for zfs root.	2009-11-08 09:54:25 +00:00
Attilio Rao	337c5ff4fc	Save the sack when doing a lockmgr_disown() call. Requested by: kib MFC: 3 days	2009-11-06 22:33:03 +00:00
Edward Tomasz Napierala	7a172809b5	Fix build. Submitted by: Andrius Morkūnas <hinokind at gmail.com>	2009-11-04 08:25:58 +00:00
Edward Tomasz Napierala	3b1b09809f	Revert r198874, pending further discussion.	2009-11-04 07:14:16 +00:00
Edward Tomasz Napierala	da9ce28ecb	Style fixes.	2009-11-04 07:04:15 +00:00
Edward Tomasz Napierala	8fafa5cecf	Make sure we don't end up with VAPPEND without VWRITE, if someone calls open(2) like this: open(..., O_APPEND).	2009-11-04 06:48:34 +00:00
Edward Tomasz Napierala	597954c813	While VAPPEND without VWRITE makes sense for VOP_ACCESSX(9) (e.g. to check for the permission to create subdirectory (ACE4_ADD_SUBDIRECTORY)), it doesn't really make sense for VOP_ACCESS(9). Also, many VOP_ACCESS(9) implementations don't expect that. Make sure we don't confuse them.	2009-11-04 06:47:14 +00:00
Ed Schouten	ca1d2f657a	Make /dev/klog and kern.msgbuf* MPSAFE. Normally msgbufp is locked using Giant. Switch it to use the msgbuf_lock. Instead of changing the tsleep() calls to msleep(), just convert it to condvar(9). In my opinion the locking around msgbuf_peekbytes() still remains questionable. It looks like locks are dropped while performing copies of multiple blocks to userspace, which may cause the msgbuf to be reset in the mean time. At least getting it underneath from Giant should make it a little easier for us to figure out how to solve that. Reminded by: rdivacky	2009-11-03 21:06:19 +00:00
Attilio Rao	1b9d701fee	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>	2009-11-03 16:46:52 +00:00
Konstantin Belousov	1c89fc757a	If socket buffer space appears to be lower then sum of count of already prepared bytes and next portion of transfer, inner loop of kern_sendfile() aborts, not preparing next mbuf for socket buffer, and not modifying any outer loop invariants. The thread loops in the outer loop forever. Instead of breaking from inner loop, prepare only bytes that fit into the socket buffer space. In collaboration with: pho Reviewed by: bz PR: kern/138999 MFC after: 2 weeks	2009-11-03 12:52:35 +00:00
Konstantin Belousov	80a8b0f3bf	Trapsignal() and postsig() call kern_sigprocmask() with both process lock and curproc->p_sigacts->ps_mtx. Reschedule_signals may need to have ps_mtx locked to decide and wakeup a thread, causing recursion on the mutex. Inform kern_sigprocmask() and reschedule_signals() about lock state of the ps_mtx by new flag SIGPROCMASK_PS_LOCKED to avoid recursion. Reported and tested by: keramida MFC after: 1 month	2009-10-30 10:10:39 +00:00
Konstantin Belousov	8084540253	Trapsignal() calls kern_sigprocmask() when delivering catched signal with proc lock held. Reported and tested by: Mykola Dzham freebsd at levsha org ua MFC after: 1 month	2009-10-29 14:34:24 +00:00
Konstantin Belousov	7415a41f4a	Fix style issue.	2009-10-29 10:03:08 +00:00
Konstantin Belousov	17c974499c	Regenerate	2009-10-27 11:01:15 +00:00
Konstantin Belousov	066d836b02	Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:55:34 +00:00
Konstantin Belousov	d6e029adbe	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:47:58 +00:00
Konstantin Belousov	84440afb54	In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Convertion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equial to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month	2009-10-27 10:42:24 +00:00
John Baldwin	86855bf549	Another nit that both I and ispell missed. Submitted by: Ben Kaduk minimarmot of gmail	2009-10-26 18:32:06 +00:00
John Baldwin	9390262576	Fix some spelling nits.	2009-10-26 17:42:03 +00:00
Joseph Koshy	16d95d4f92	Inform hwpmc(4) of a thread's impending demise prior to invoking sched_throw(). Debugging help: fabient Review and testing by: fabient	2009-10-25 04:34:47 +00:00
Alan Cox	a0c703bf21	Update a comment to reflect the previous change.	2009-10-25 02:48:29 +00:00
Ruslan Ermilov	4d9d1e823c	- Rename tunable kern.ipc.shmmaxpgs to kern.ipc.shmall. - Explain the fuss when initializing shmmax. PR: 75542 (mistakenly closed instead of PR 75541)	2009-10-24 19:00:58 +00:00
John Baldwin	5ca4819ddf	- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that created shadow copies of these arrays were just using MAXCOMLEN. - Prefer using sizeof() of an array type to explicit constants for the array length in a few places. - Ensure that all of p_comm[] and td_name[] is always zero'd during execve() to guard against any possible information leaks. Previously trailing garbage in p_comm[] could be leaked to userland in ktrace record headers via td_name[]. Reviewed by: bde	2009-10-23 15:14:54 +00:00
John Baldwin	4f9d48e478	Don't bother copying the name of a kproc or kthread out into a temporary array just to pass that array to printf(). kproc and kthread names are NUL-terminated and can be printed using printf() directly. Reviewed by: bde	2009-10-23 15:09:51 +00:00
John Baldwin	3eec6f034a	Set the devclass_t pointer specified in the DRIVER_MODULE() macro sooner so it is always valid when a driver's identify routine is called. Previously, new-bus would attempt to create the devclass for a newly loaded driver in two separate places, once in devclass_add_driver(), and again after devclass_add_driver() returned in driver_module_handler(). Only the second lookup attempted to set a device class' parent and set the devclass_t pointer specified in the DRIVER_MODULE() macro. However, by the time it was executed, the driver was already added to existing instances of the parent driver at which point in time the new driver's identify routine would have been invoked. The fix is to merge the two attempts and only create the devclass once in devclass_add_driver() including setting the devclass_t pointer passed to DRIVER_MODULE() before the driver is added to any existing bus devices. Reported by: avg Reviewed by: imp MFC after: 2 weeks	2009-10-22 14:53:44 +00:00
Marcel Moolenaar	1a4fcaebe3	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.	2009-10-21 18:38:02 +00:00
Ruslan Ermilov	e64585bdc2	Random number generator initialization cleanup: - Introduce new SI_SUB_RANDOM point in boot sequence to make it clear from where one may start using random(9). It should be as early as possible, so place it just after SI_SUB_CPU where we have some randomness on most platforms via get_cyclecount(). - Move stack protector initialization to be after SI_SUB_RANDOM as before this point we have no randomness at all. This fixes stack protector to actually protect stack with some random guard value instead of a well-known one. Note that this patch doesn't try to address arc4random(9) issues. With current code, it will be implicitly seeded by stack protector and hence will get the same entropy as random(9). It will be securely reseeded once /dev/random is feeded by some entropy from userland. Submitted by: Maxim Dounin <mdounin@mdounin.ru> MFC after: 3 days	2009-10-20 16:36:51 +00:00
Ed Schouten	6015f6f35a	Properly set the low watermarks when reducing the baud rate. Now that buffers are deallocated lazily, we should not use tty*q_getsize() to obtain the buffer size to calculate the low watermarks. Doing this may cause the watermark to be placed outside the typical buffer size. This caused some regressions after my previous commit to the TTY code, which allows pseudo-devices to resize the buffers as well. Reported by: yongari, dougb MFC after: 1 week	2009-10-19 07:17:37 +00:00
Ed Schouten	5ed8d12443	Allow the buffer size to be configured for pseudo-like TTY devices. Devices that don't implement param() (which means they don't support hardware parameters such as flow control, baud rate) hardcode the baud rate to TTYDEF_SPEED. This means the buffer size cannot be configured, which is a little inconvenient when using canonical mode with big lines of input, etc. Make it adjustable, but do clamp it between B50 and B115200 to prevent awkward buffer sizes. Remove the baud rate assignment from /etc/gettytab. Trust the kernel to fill in a proper value. Reported by: Mikolaj Golub <to my trociny gmail com> MFC after: 1 month	2009-10-18 19:48:53 +00:00
Ed Schouten	99087885be	Make lock devices work properly. It turned out I did add the code to use the init state devices to set the termios structure when opening the device, but it seems I totally forgot to add the bits required to force the actual locking of flags through the lock state devices. Reported by: ru MFC after: 1 week (to be discussed)	2009-10-18 19:45:44 +00:00
Konstantin Belousov	7564c4ad9a	If ET_DYN binary has non-zero base address for some reason, honour it and do not relocate the binary to ET_DYN_LOAD_ADDR. This allows for the binary author to influence address map of the process. In particular, when the binary is actually an interpeter, this allows to have almost usual process address map. Communicate the relocation bias of the mapping for interpeter-less ET_DYN binary, that is interperter itself, in AT_BASE aux entry. This way, rtld is able to find its dynamic structure and relocate itself. Note that mapbase in the rtld is still wrong and requires further fixing. Reported and tested by: rwatson Discussed with: kan MFC after: 3 days	2009-10-18 12:57:48 +00:00
Ed Schouten	39410373b3	Print backspaces after echoing an EOF. Applications like shells expect EOF to give no graphical output, while our implementation prints ^D by default (tunable with stty echoctl). Make the new implementation behave like the old TTY code. Print two backspaces afterwards. Reported by: koitsu MFC after: 1 month	2009-10-17 08:59:41 +00:00
John Baldwin	d0c9a29169	Use language more closely resembling English in a panic message. Pointy hat to: jhb Submitted by: pluknet	2009-10-15 18:51:19 +00:00
John Baldwin	37b8ef16cd	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks	2009-10-15 14:54:35 +00:00
John Baldwin	a0f1535205	Fix a sign bug in the handling of nice priorities when computing the interactive score for a thread. Submitted by: Taku YAMAMOTO taku of tackymt.homeip.net Reviewed by: jeff MFC after: 3 days	2009-10-15 11:41:12 +00:00
Joseph Koshy	8b5adf9dd9	Improve the description of sysctl "kern.sugid_coredump". Submitted by: Mel Flynn <mel.flynn+fbsd.hackers at mailing.thruhere.net> on -hackers	2009-10-12 15:49:48 +00:00
Konstantin Belousov	7df8f6ab6f	Fix typo. Submitted by: rdivacky MFC after: 1 month	2009-10-12 10:09:48 +00:00
Konstantin Belousov	6b286ee8b5	Currently, when signal is delivered to the process and there is a thread not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month	2009-10-11 16:49:30 +00:00
Konstantin Belousov	6728dc169d	Refine r195509, instead of checking that vnode type is VBAD, that is set quite late in the revocation path, properly verify that vnode is not doomed before calling VOP. Reported and tested by: Harald Schmalzbauer <h.schmalzbauer omnilan de> MFC after: 3 days	2009-10-10 21:17:30 +00:00
Konstantin Belousov	ab02d85f0d	Map PIE binaries at non-zero base address. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:33:01 +00:00
Konstantin Belousov	5b33842a9b	Do not map segments of zero length. Discussed with: bz Reviewed by: kan Tested by: bz (i386, amd64), bsam (linux) MFC after: some time	2009-10-10 15:28:52 +00:00
Konstantin Belousov	47d81c1b70	Postpone dropping fp till both kq_global and kqueue mutexes are unlocked. fdrop() closes file descriptor when reference count goes to zero. Close method for vnodes locks the vnode, resulting in "sleepable after non-sleepable". For pipes, pipe mutex is before kqueue lock, causing LOR. Reported and tested by: pho MFC after: 2 weeks	2009-10-10 14:56:34 +00:00
Robert Watson	604f19c91e	Fix build on amd64, where sysctl arg1 is a pointer. Reported by: Mr Tinderbox MFC after: 3 months	2009-10-05 22:23:12 +00:00
Edward Tomasz Napierala	d5e0b21541	Fix NFSv4 ACLs on sparc64. Turns out that fuword(9) fetches 64 bits instead of sizeof(int), and on sparc64 that resulted in fetching wrong value for acl_maxcnt, which in turn caused __acl_get_link(2) to fail with EINVAL. PR: sparc64/139304 Submitted by: Dmitry Afanasiev <KOT at MATPOCKuH.Ru>	2009-10-05 19:56:56 +00:00
Robert Watson	84d61770bc	First cut at implementing SOCK_SEQPACKET support for UNIX (local) domain sockets. This allows for reliable bi-directional datagram communication over UNIX domain sockets, in contrast to SOCK_DGRAM (M:N, unreliable) or SOCK_STERAM (bi-directional bytestream). Largely, this reuses existing UNIX domain socket code. This allows applications requiring record- oriented semantics to do so reliably via local IPC. Some implementation notes (also present in XXX comments): - Currently we lack an sbappend variant able to do datagrams and control data without doing addresses, so we mark SOCK_SEQPACKET as PR_ADDR. Adding a new variant will solve this problem. - UNIX domain sockets on FreeBSD provide back-pressure/flow control notification for stream sockets by manipulating the send socket buffer's size during pru_send and pru_rcvd. This trick works less well for SOCK_SEQPACKET as sosend_generic() uses sb_hiwat not just to manage blocking, but also to determine maximum datagram size. Fixing this requires rethinking how back-pressure is done for SOCK_SEQPACKET; in the mean time, it's possible to get EMSGSIZE when buffers fill, instead of blocking. Discussed with: benl Reviewed by: bz, rpaulo MFC after: 3 months Sponsored by: Google	2009-10-05 14:49:16 +00:00
Attilio Rao	7f9f80ce03	When releasing a lockmgr held in shared way we need to use a write memory barrier in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio	2009-10-03 15:02:55 +00:00
Bjoern A. Zeeb	925c8b5b04	Print a warning in case we cannot add more brandinfo because we would overflow the MAX_BRANDS sized array. Reviewed by: kib MFC After: 1 month	2009-10-03 10:50:00 +00:00
Robert Watson	afd8e45b45	Don't comment on stream socket handling in sosend_dgram, since that's not handled. MFC after: 3 weeks	2009-10-02 21:31:15 +00:00
Bjoern A. Zeeb	878adb8517	Add a mitigation feature that will prevent user mappings at virtual address 0, limiting the ability to convert a kernel NULL pointer dereference into a privilege escalation attack. If the sysctl is set to 0 a newly started process will not be able to map anything in the address range of the first page (0 to PAGE_SIZE). This is the default. Already running processes are not affected by this. You can either change the sysctl or the tunable from loader in case you need to map at a virtual address of 0, for example when running any of the extinct species of a set of a.out binaries, vm86 emulation, .. In that case set security.bsd.map_at_zero="1". Superseeds: r197537 In collaboration with: jhb, kib, alc	2009-10-02 17:48:51 +00:00
Ed Maste	e380cc73ed	In fill_kinfo_thread, copy the thread's name into struct kinfo_proc even if it is empty. Otherwise the previous thread's name would remain in the struct and then be reported for this thread. Submitted by: Ryan Stone MFC after: 1 week	2009-10-01 21:44:30 +00:00
Edward Tomasz Napierala	2c29cfa083	Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib	2009-10-01 17:22:03 +00:00
Konstantin Belousov	75ffdc4049	Do not dereference vp->v_mount without holding vnode lock and checking that the vnode is not reclaimed. Noted by: Igor Sysoev <is rambler-co ru> MFC after: 1 week	2009-10-01 12:50:26 +00:00
Konstantin Belousov	15b7a831df	Fix typo. MFC after: 3 days	2009-10-01 12:46:58 +00:00
Andriy Gapon	7e936cef34	print machine in kernel boot version string Discussed with: gavin, kib, jhb PR: kern/126926 MFC after: 2 weeks	2009-10-01 10:53:12 +00:00
Attilio Rao	ddce63ca73	When releasing a read/shared lock we need to use a write memory barrier in order to avoid, on architectures which doesn't have strong ordered writes, CPU instructions reordering. Diagnosed by: fabio Reviewed by: jhb Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-30 13:26:31 +00:00
Andriy Gapon	78edb09e67	print_caddr_t: drop incorrect __unused attribute from parameter seems like a purely cosmetic change Reviewed by: jhb, kib MFC after: 1 week	2009-09-30 11:14:13 +00:00
Robert Watson	718dbcfeaa	Regenerate system call files following r197636.	2009-09-30 08:48:59 +00:00
Robert Watson	72c35fc69c	Reserve system call numbers for Capsicum security framework capabilities, capability mode, and process descriptors: cap_new, cap_getrights, cap_enter, cap_getmode, pdfork, pdkill, pdgetpid, and pdwait. Obtained from: TrustedBSD Project Sponsored by: Google MFC after: 3 weeks	2009-09-30 08:46:01 +00:00
Jamie Gritton	d446857747	Set the prison in NFS anon and GSS SVC creds. Reviewed by: marcel MFC after: 3 days	2009-09-28 18:07:16 +00:00
Xin LI	82aebf697c	Add two new fcntls to enable/disable read-ahead: - F_READAHEAD: specify the amount for sequential access. The amount is specified in bytes and is rounded up to nearest block size. - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential access size. A third argument of zero disables the read-ahead behavior. Please note that the read-ahead amount is also constrainted by sysctl variable, vfs.read_max, which may need to be raised in order to better utilize this feature. Thanks Igor Sysoev for proposing the feature and submitting the original version, and kib@ for his valuable comments. Submitted by: Igor Sysoev <is rambler-co ru> Reviewed by: kib@ MFC after: 1 month	2009-09-28 16:59:47 +00:00
Xin LI	13dcbd75c1	Use correct sizeof() object for klist 'list'. Currently, struct klist contained only SLIST_HEAD as its member, thus sizeof(struct klist) would equal to sizeof(struct klist *), so this change makes the code more correct in terms of semantics, but should be a no-op to compiler at this time. Reported by: MQ <antinvidia at gmail com>	2009-09-28 10:22:46 +00:00
David Xu	b101b127f3	In function do_rw_wrlock, when a writer got an error and before returning, check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag, and resume the readers. The error must be EAGAIN, otherwise there must have memory problem, and nobody can rescue the buggy application. The revision 197445 might be reverted.	2009-09-25 00:03:13 +00:00
Alexander Motin	6090db7d38	Do not call BUS_DRIVER_ADDED() for detached buses (attach failed) on driver load. This fixes crash on atapicam module load on systems, where some ata channels (usually ata1) was probed, but failed to attach. Reviewed by: jhb, imp Tested by: many	2009-09-24 17:03:32 +00:00
Roman Divacky	1c2825bd80	Change unsigned foo to u_foo as required by style(9). Requested by: bde Approved by: ed (mentor)	2009-09-22 16:16:02 +00:00
Edward Tomasz Napierala	a9315dded6	Add pieces of infrastructure required for NFSv4 ACL support in UFS. Reviewed by: rwatson	2009-09-22 15:15:03 +00:00
Konstantin Belousov	51a6ef34fb	Remove forward_roundrobin(), it is unused for quite some time. Reviewed by: jhb MFC after: 1 week	2009-09-21 13:09:56 +00:00
Michael Tuexen	8518270e20	Get SCTP working in combination with VIMAGE. Contains code from bz. Approved by: rrs (mentor) MFC after: 1 month.	2009-09-19 14:02:16 +00:00
Alan Cox	fe105d45a2	Add a new sysctl for reporting all of the supported page sizes. Reviewed by: jhb MFC after: 3 weeks	2009-09-18 17:04:57 +00:00
Attilio Rao	39df6da8cc	Don't allocate new unnecessary pages when devstat_alloc() looses the run for re-acuiring the lock, but recheck if new pages are allocatable from the pool and free the previously allocated ones. Tested by: pho, Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-18 13:48:38 +00:00
Roman Divacky	6413e27b3a	Fix the style of the previous commit. Approved by: ed (mentor, implicit)	2009-09-17 17:48:13 +00:00
Roman Divacky	abc8594d70	Make these argument/variable unsigned as the defines for them don't fit into signed 32bit integer. Approved by: ed (mentor, implicit) Approved by: sson	2009-09-17 17:41:28 +00:00
Stacey Son	fdc1a1131e	Add EV_RECEIPT to kevents. EV_RECEIPT is useful to disambiguating error conditions when multiple events structures are passed to kevent(2). The error code is returned in the data field and EV_ERROR is set. Approved by: rwatson (co-mentor)	2009-09-16 03:49:54 +00:00
Stacey Son	1a921c410a	Add the EV_DISPATCH flag to kevents. When the EV_DISPATCH flag is used the event source will be disabled immediately after the delivery of an event. This is similar to the EV_ONESHOT flag but it doesn't delete the event. Approved by: rwatson (co-mentor)	2009-09-16 03:37:39 +00:00
Stacey Son	2c2e449905	Add EVFILT_USER to kevents. Add user events support to kernel events which are not associated with any kernel mechanism but are triggered by user level code. This is useful for adding user level events to an event handler that may also be monitoring kernel events. Approved by: rwatson (co-mentor)	2009-09-16 03:30:12 +00:00
Stacey Son	95128e983f	Add optional touch event filter hooks to kevents. The touch event filter is called when a kernel event data is possibly updated. There are two hook points. First, during a kevent() system call. Second, when an event has been triggered. Approved by: rwatson (co-mentor)	2009-09-16 03:15:57 +00:00
Andre Oppermann	11c99a6d7b	-Put the optimized soreceive_stream() under a compile time option called TCP_SORECEIVE_STREAM for the time being. Requested by: brooks Once compiled in make it easily switchable for testers by using a tuneable net.inet.tcp.soreceive_stream and a corresponding read-only sysctl to report the current state. Suggested by: rwatson MFC after: 2 days	2009-09-15 22:23:45 +00:00
Attilio Rao	435068aab7	Fix sched_switch_migrate(): - In 8.x and above the run-queue locks are nomore shared even in the HTT case, so remove the special case. - The deadlock explained in the removed comment here is still possible even with different locks, with the contribution of tdq_lock_pair(). An explanation is here: (hypotesis: a thread needs to migrate on another CPU, thread1 is doing sched_switch_migrate() and thread2 is the one handling the sched_switch() request or in other words, thread1 is the thread that needs to migrate and thread2 is a thread that is going to be preempted, most likely an idle thread. Also, 'old' is referred to the context (in terms of run-queue and CPU) thread1 is leaving and 'new' is referred to the context thread1 is going into. Finally, thread3 is doing tdq_idletd() or sched_balance() and definitively doing tdq_lock_pair()) * thread1 blocks its td_lock. Now td_lock is 'blocked' * thread1 drops its old runqueue lock * thread1 acquires the new runqueue lock * thread1 adds itself to the new runqueue and sends an IPI_PREEMPT through tdq_notify() to the new CPU * thread1 drops the new lock * thread3, scanning the runqueues, locks the old lock * thread2 received the IPI_PREEMPT and does thread_lock() with td_lock pointing to the new runqueue * thread3 wants to acquire the new runqueue lock, but it can't because it is held by thread2 so it spins * thread1 wants to acquire old lock, but as long as it is held by thread3 it can't * thread2 going further, at some point wants to switchin in thread1, but it will wait forever because thread1->td_lock is in blocked state This deadlock has been manifested mostly on 7.x and reported several time on mailing lists under the voice 'spinlock held too long'. Many thanks to des@ for having worked hard on producing suitable textdumps and Jeff for help on the comment wording. Reviewed by: jeff Reported by: des, others Tested by: des, Giovanni Trematerra <giovanni dot trematerra at gmail dot com> (STABLE_7 based version)	2009-09-15 16:56:17 +00:00
Attilio Rao	4c68dee0fb	Revert r196779 in order to implement a different scheme for newbus locking methodology. Requested by: imp	2009-09-13 15:08:19 +00:00
Luigi Rizzo	446e861708	Make sure callouts are not processed one tick late. The problem was introduced in SVN 180608/ rev 1.114 and affects all users of callout_reset() (including select, usleep, setitimer). A better fix probably involves replicating 'ticks' in the struct callout_cpu; this commit is just a temporary thing so that we can MFC it after a suitable test time and RE approval. MFC after: 3 days	2009-09-12 21:44:34 +00:00
Robert Watson	e76d823b81	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks	2009-09-12 20:03:45 +00:00
Nick Hibma	b22692bd0a	Add a comment on the consequences of reducing the poweroff delay	2009-09-10 18:24:59 +00:00
Konstantin Belousov	b55ef216fe	kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in or out 4 bytes more then the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week	2009-09-09 20:59:01 +00:00
Konstantin Belousov	1ef6ea9b60	Unlock the image vnode around the call of pmc PMC_FN_PROCESS_EXEC hook. The hook calls vn_fullpath(9), that should not be executed with a vnode lock held. Reported by: Bruce Cran <bruce cran org uk> Tested by: pho MFC after: 3 days	2009-09-09 10:52:36 +00:00
Konstantin Belousov	427992ecdb	In vfs_mark_atime(9), be resistent against reclaimed vnodes. Assert that neccessary locks are taken, since vop might not be called. Tested by: pho MFC after: 3 days	2009-09-09 10:51:50 +00:00
Poul-Henning Kamp	6778431478	Revert previous commit and add myself to the list of people who should know better than to commit with a cat in the area.	2009-09-08 13:19:05 +00:00
Poul-Henning Kamp	b34421bf9c	Add necessary include.	2009-09-08 13:16:55 +00:00
Antoine Brodin	bf3f1fe043	Change w_notrunning and w_stillcold from pointer to array so that sizeof returns what is expected. PR: kern/138557 Discussed with: brucec@ MFC after: 1 month	2009-09-06 13:31:05 +00:00
Konstantin Belousov	db17314ea4	In fhopen, vfs_ref() the mount point while vnode is unlocked, to prevent vn_start_write(NULL, &mp) from operating on potentially freed or reused struct mount *. Remove unmatched vfs_rel() in cleanup. Noted and reviewed by: tegge Tested by: pho MFC after: 3 days	2009-09-06 11:44:46 +00:00
Ed Schouten	4d3b1aacfc	Move ptmx into pty(4). Now that pty(4) is a loadable kernel module, I'd better move /dev/ptmx in there as well. This means that pty(4) now provides almost all pseudo-terminal compatibility code. This means it's very easy to test whether applications use the proper library interfaces when allocating pseudo-terminals (namely posix_openpt and openpty).	2009-09-06 10:27:45 +00:00
Jamie Gritton	babbbb9c57	Allow a jail's name to be the same as its jid (which is the default if no name is specified), but still disallow other numeric names. Reviewed by: zec Approved by: bz (mentor) MFC after: 3 days	2009-09-04 19:00:48 +00:00
Attilio Rao	80002a63db	Add intermediate states for attaching and detaching that will be reused by the enhached newbus locking once it is checked in. This change can be easilly MFCed to STABLE_8 at the appropriate moment. Reviewed by: jhb, scottl Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-03 13:40:41 +00:00
Attilio Rao	8d3635c4db	Fix some bugs related to adaptive spinning: In the lockmgr support: - GIANT_RESTORE() is just called when the sleep finishes, so the current code can ends up into a giant unlock problem. Fix it by appropriately call GIANT_RESTORE() when needed. Note that this is not exactly ideal because for any interation of the adaptive spinning we drop and restore Giant, but the overhead should be not a factor. - In the lock held in exclusive mode case, after the adaptive spinning is brought to completition, we should just retry to acquire the lock instead to fallthrough. Fix that. - Fix a style nit In the sx support: - Call GIANT_SAVE() before than looping. This saves some overhead because in the current code GIANT_SAVE() is called several times. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>	2009-09-02 17:33:51 +00:00
Konstantin Belousov	579b976090	Fix mount reference leak when V_XSLEEP is specified to vn_start_write(). Submitted by: tegge	2009-09-01 12:05:39 +00:00
Konstantin Belousov	8a945d109c	Reintroduce the r196640, after fixing the problem with my testing. Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho (and retested according to new test scenarious) MFC after: 1 week	2009-09-01 11:41:51 +00:00
Konstantin Belousov	a505c2c72c	Make the mnt_writeopcount and mnt_secondary_writes counters, used by the suspension code, not greater then mnt_ref reference counter value. Increment mnt_ref together with write counter in vn_start_write()/ vn_start_secondary_write(), releasing in vn_finished_write/vn_finished_secondary_write(). Since r186197, unmount code requires that no writers occured after all references are expired. We still could get write counter incremented for freed or reused struct mount, but it seems to be innocent, since corresponding vnode should be referenced and reclaimed then. Reported by: pho (last half a year), erwin Reviewed by: attilio Tested by: pho, erwin MFC after: 1 week	2009-08-31 10:20:52 +00:00
Bjoern A. Zeeb	ecc2fda872	Make sure FreeBSD binaries without .note.ABI-tag section work correctly and do not match a colliding Debian GNU/kFreeBSD brandinfo statements. For this mark the Debian GNU/kFreeBSD brandinfo that it must have an .note.ABI-tag section and ignore the old EI_OSABI brandinfo when comparing a possibly colliding set of options. Due to SYSINIT we add the brandinfo in a non-deterministic order, so native FreeBSD is not always first. We may want to consider to force native FreeBSD to come first as well. The only way a problem could currently be noticed is when running an i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD brandinfo was matched first, as the fallback to ld-elf32.so.1 does not exist in that case. Reported and tested by: ticso In collaboration with: kib MFC after: 3 days	2009-08-30 14:38:17 +00:00
Konstantin Belousov	f25fa6abb2	Reverse r196640 and r196644 for now.	2009-08-29 21:53:08 +00:00
Konstantin Belousov	b6b2d1bf88	Dispose the kernel stack of the proper thread. Submitted by: alc MFC after: 1 week	2009-08-29 18:01:02 +00:00
Konstantin Belousov	c3cf0b476f	Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equial to the actual, and reallocate the stack if sizes differ [1]. This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size. Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, system with reasonable amount of the threads get lower latency of thread creation, while still not exhausting significant portion of KVA for unused kstacks. Submitted by: peter [1] Discussed with: jhb, julian, peter Reviewed by: jhb Tested by: pho MFC after: 1 week	2009-08-29 13:28:02 +00:00
John Baldwin	2fa8c8d21e	Extend the device pager to support different memory attributes on different pages in an object. - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page. - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8. Submitted by: alc MFC after: 1 month	2009-08-28 14:06:55 +00:00
Jamie Gritton	c4884ffa6f	Fix a LOR between allprison_lock and vnode locks by releasing allprison_lock before releasing a prison's root vnode. PR: kern/138004 Reviewed by: kib Approved by: bz (mentor) MFC after: 3 days	2009-08-27 16:15:51 +00:00
Marius Strobl	5486ffc898	Add a temporary workaround which just lets init die instead of causing a panic if it is killed due to a unsolved stack overflow seen very late during shutdown on sparc64 when the gmirror worker process exists, which is a regression introduced in 8.0. Reviewed by: kib MFC after: 3 days	2009-08-26 21:10:47 +00:00
Konstantin Belousov	4f4946d337	Honor the vfs.timestamp_precision sysctl settings for utimes(path, NULL) and similar calls. Obtained from: Petr Salinger, Debian GNU/kFreeBSD, Debian bug #489894 MFC after: 3 days	2009-08-26 14:32:37 +00:00
Jilles Tjoelker	74d1c4927a	Fix poll() on half-closed sockets, while retaining POLLHUP for fifos. This reverts part of r196460, so that sockets only return POLLHUP if both directions are closed/error. Fifos get POLLHUP by closing the unused direction immediately after creating the sockets. The tools/regression/poll/*poll.c tests now pass except for two other things: - if POLLHUP is returned, POLLIN is always returned as well instead of only when there is data left in the buffer to be read - fifo old/new reader distinction does not work the way POSIX specs it Reviewed by: kib, bde	2009-08-25 21:44:14 +00:00
Warner Losh	58a745889f	Rather than havnig enabled/disabled, implement a max queue depth. While usually not an issue, this firewalls bugs in the code that may run us out of memory. Fix a memory exhaustion in the case where devctl was disabled, but the link was bouncing. The check to queue was in the wrong place. Implement a new sysctl hw.bus.devctl_queue to control the depth. Make compatibility hacks for hw.bus.devctl_disable to ease transition. Reviewed by: emaste@ Approved by: re@ (kib) MFC after: asap	2009-08-25 06:25:59 +00:00
Bjoern A. Zeeb	89ffc202d6	Fix handling of .note.ABI-tag section for GNU systems [1]. Handle GNU/Linux according to LSB Core Specification 4.0, Chapter 11. Object Format, 11.8. ABI note tag. Also check the first word of desc, not only name, according to glibc abi-tags specification to distinguish between Linux and kFreeBSD. Add explicit handling for Debian GNU/kFreeBSD, which runs on our kernels as well [2]. In {amd64,i386}/trap.c, when checking osrel of the current process, also check the ABI to not change the signal behaviour for Linux binary processes, now that we save an osrel version for all three from the lists above in struct proc [2]. These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD and Linux binaries on the same machine again for at least i386 and amd64, and no longer break kFreeBSD which was detected as GNU(/Linux). PR: kern/135468 Submitted by: dchagin [1] (initial patch) Suggested by: kib [2] Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD Reviewed by: kib MFC after: 3 days	2009-08-24 16:19:47 +00:00
Ed Schouten	2992abe047	Allow multiple console devices per driver without insane code duplication. Say, a driver wants to have multiple console devices to pick from, you would normally write down something like this: CONSOLE_DRIVER(dev1); CONSOLE_DRIVER(dev2); Unfortunately, this means that you have to declare 10 cn routines, instead of 5. It also isn't possible to initialize cn_arg on beforehand. I noticed this restriction when I was implementing some of the console bits for my vt(4) driver in my newcons branch. I have a single set of cn routines (termcn_*) which are shared by all vt(4) console devices. In order to solve this, I'm adding a separate consdev_ops structure, which contains all the function pointers. This structure is referenced through consdev's cn_ops field. While there, I'm removing CONS_DRIVER() and cn_checkc, which have been deprecated for years. They weren't used throughout the source, until the Xen console driver showed up. CONSOLE_DRIVER() has been changed to do the right thing. It now declares both the consdev and consdev_ops structure and ties them together. In other words: this change doesn't change the KPI for drivers that used the regular way of declaring console devices. If drivers want to use multiple console devices, they can do this as follows: static const struct consdev_ops mydriver_cnops = { .cn_probe = mydriver_cnprobe, ... }; static struct mydriver_softc cons0_softc = { ... }; CONSOLE_DEVICE(cons0, mydriver_cnops, &cons0_softc); static struct mydriver_softc cons1_softc = { ... }; CONSOLE_DEVICE(cons1, mydriver_cnops, &cons1_softc); Obtained from: //depot/user/ed/newcons/...	2009-08-24 10:53:30 +00:00
Marko Zec	0cb8b6a9b7	When "jail -c vnet" request fails, the current code actually creates and leaves behind an orphaned vnet. This change ensures that such vnets get released. This change affects only options VIMAGE builds. Submitted by: jamie Discussed with: bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:16:19 +00:00
Marko Zec	d57425ab65	When registering a protocol to an existing protocol domain via pf_proto_register(), iterate over all existing vnets to call protosw_init() and thus the appropriate .pr_init() handler in the context of each vnet. NB in the future we probably want to separate pr_init() handlers into two, i.e. per-vnet and global, functions. This change has no impact on nooptions VIMAGE builds. Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:03:41 +00:00
Robert Watson	77dfcdc445	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days	2009-08-23 20:40:19 +00:00
Ed Schouten	bfdaa52382	Allow pty(4) to be loaded as a kld. Unfortunately, the wrappers that are present in pts(4) don't have the mechanics to allow pty(4) to be unloaded safely, so I'm forcing this kld to return EBUSY. This also means we have to enable some extra code in pts(4) unconditionally. Proposed by: rwatson	2009-08-23 20:26:09 +00:00
Konstantin Belousov	f2159cc790	Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to that of the Solaris and Linux. Also, improve the POSIX conformance by explicitely clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global. Submitted by: bde Reviewed by: rwatson MFC after: 1 week	2009-08-23 12:44:15 +00:00
Rui Paulo	88579aa572	Constify prime numbers.	2009-08-23 09:55:06 +00:00
Ed Schouten	5c67885a26	Add ttydisc_rint_simple(). I noticed several drivers in our tree don't actually care about parity and framing, such as pts(4), snp(4) (and my partially finished console driver). Instead of duplicating a lot of code, I think we'd better add a utility function for those drivers to quickly process a buffer of input. Also change pts(4) and snp(4) to use this function.	2009-08-23 08:04:40 +00:00
John Baldwin	eb5a1e8f38	This patch fixes two bugs in sglist(9) and improves robustness of the API via better semantics if a request to append an address range to an existing list fails. - When cloning an sglist, properly set the length in the new sglist instead of leaving the new list empty. - Properly compute the amount of data added to an sglist via _sglist_append_buf(). This allows sglist_consume_uio() to properly update uio_resid. - When a request to append an address range to a scatter/gather list fails, restore the sglist to the state it had at the start of the function call instead of resetting it to an empty list. Requested by: np (3) Approved by: re (kib)	2009-08-21 02:59:07 +00:00
John Baldwin	0cef25aeb2	Change the 'resid' parameter to sglist_consume_uio() from an int to a size_t to match the recent type change of the uio_resid member of struct uio. Approved by: re (kib)	2009-08-20 19:23:58 +00:00
John Baldwin	a56fe095f0	Temporarily revert the new-bus locking for 8.0 release. It will be reintroduced after HEAD is reopened for commits by re@. Approved by: re (kib), attilio	2009-08-20 19:17:53 +00:00
Ed Schouten	6c813c1ccc	Small changes to the warning message generated by pty(4): - Only print the warning once, instead of filling up the screen. - Use the word "legacy" for the pty_warningcnt description, to prevent confusion. - Use log() instead of printf(). Discussed with: rwatson, jhb Approved by: re (kib)	2009-08-19 14:30:46 +00:00
Pawel Jakub Dawidek	e477e4fe8e	Remove unused taskqueue_find() function. Reviewed by: dfr Approved by: re (kib)	2009-08-18 13:55:48 +00:00
Attilio Rao	353998acc3	* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to a pointer-fetching specific operation check. Consequently, rename the operation ASSERT_ATOMIC_LOAD_PTR(). * Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking directly alignment on the word boundry, for all the given specific architectures. That's a bit too strict for some common case, but it assures safety. * Add a comment explaining the scope of the macro * Add a new stub in the lockmgr specific implementation Tested by: marcel (initial version), marius Reviewed by: rwatson, jhb (comment specific review) Approved by: re (kib)	2009-08-17 16:17:21 +00:00
Pawel Jakub Dawidek	159ef108e1	Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue. Approved by: re (kib)	2009-08-17 09:01:20 +00:00
Pawel Jakub Dawidek	6a3b289388	Because taskqueue_run() can drop tq_mutex, we need to check if the TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a wakeup. Approved by: re (kib)	2009-08-17 08:42:34 +00:00
Ed Schouten	f6b3749db3	Fix small style regression introduced by the MPSAFE newbus code. Approved by: re (rwatson)	2009-08-16 19:55:53 +00:00

1 2 3 4 5 ...

11564 Commits