freebsd-nq

Author	SHA1	Message	Date
John Baldwin	c957c14d05	Revert most of 1.109. Although it improved the situation on one particular motherboard, in practice the changes resulted in many false positives for heavy network loads, etc. resulting in poor performance. Also, the motherboard referenced in the 1.109 log has other problems and simply does not seem to work with the APIC enabled even with the changes in 1.109. The correct fix for that board seems to be to not use the APIC at all. One thing kept from 1.109 is that throttled interrupts are now effectively polled on every clock tick rather than just 10 times per second. MFC after: 1 month Tested by: Shunsuke SHINOMIYA shino at fornext dot org	2004-11-03 22:11:20 +00:00
Poul-Henning Kamp	e0b687d33b	Always initialize bo_private along with bo_ops in getnewvnode(). Spotted by: tegge	2004-11-03 21:09:23 +00:00
Alan Cox	d19ef81437	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.	2004-11-03 20:17:31 +00:00
Poul-Henning Kamp	51f83da622	Restore TTYDEF_LFLAG to set echo bits.	2004-11-03 19:16:55 +00:00
Poul-Henning Kamp	f71a47143f	Don't print the singularly unhelpful message: unknown: not probled (disabled) During verbose boot.	2004-11-03 09:06:45 +00:00
Robert Watson	aae2782bff	Acquire the accept mutex in soabort() before calling sotryfree(), as that is now required. RELENG_5_3 candidate. Foot provided by: Dikshie <dikshie at ppk dot itb dot ac dot id>	2004-11-02 17:15:13 +00:00
John Baldwin	d39d4a6e64	- Change the ddb paging "support" to use a variable (db_lines_per_page) to control the number of lines per page rather than a constant. The variable can be examined and changed in ddb as '$lines'. Setting the variable to 0 will effectively turn off paging. - Change db_putchar() to force out pending whitespace before outputting newlines and carriage returns so that one can rub out content on the current line via '\r \r' type strings. - Change the simple pager to rub out the --More-- prompt explicitly when the routine exits. - Add some aliases to the simple pager to make it more compatible with more(1): 'e' and 'j' do a single line. 'd' does half a page, and 'f' does a full page. MFC after: 1 month Inspired by: kris	2004-11-01 22:15:15 +00:00
Dag-Erling Smørgrav	b0e1e474f7	Add TUNABLE_LONG and TUNABLE_ULONG, and use the latter for the hw.pci.host_mem_start tunable. Add comments to TUNABLE_INT and TUNABLE_QUAD recommending against their use. MFC after: 3 weeks	2004-10-31 15:50:33 +00:00
Pawel Jakub Dawidek	7579614b6d	Don't treat # as a comment in interpreter specification line. This is magic and no other operating system do so (i.e. Solaris, Tru64, Linux, AIX, HP-UX, Irix, MacOS X, NetBSD). Discussed on: current@ Reported by: S³awek ¯ak <zaks@prioris.mini.pw.edu.pl>	2004-10-31 11:12:59 +00:00
Robert Watson	1e4cadcb14	Disable use of synchronization early in the boot by the MAC Framework; for modules linked into the kernel or loaded very early, panics will result otherwise, as the CV code it calls will panic due to its use of a mutex before it is initialized.	2004-10-30 14:20:59 +00:00
Jeff Roberson	0516c8dd4a	- When choosing a thread on the run queue, check to see if its nice is outside of the nice threshold due to a recently awoken thread with a lower nice value. This further reduces the amount of time a positively niced thread gets while running in conjunction with a workload that has many short sleeps (ie buildworld).	2004-10-30 12:19:15 +00:00
Jeff Roberson	6bd0c7fd53	- In sched_prio() check to see if the kse is assigned to a runq as the check for TD_ON_RUNQ() no longer means the thread is really on a run- queue. I suspect this state should be re-evaluated as it must mean something else now. This fixes ULE+KSE+PREEMPTION on UP x86.	2004-10-30 07:35:53 +00:00
Alfred Perlstein	90d75f785a	Allow kill -9 to kill processes stuck in procfs STOPEVENTs.	2004-10-30 02:56:22 +00:00
Poul-Henning Kamp	996b2c82ca	Loose vfs_mountedon()	2004-10-29 11:15:08 +00:00
Poul-Henning Kamp	c108bb741c	Remove VOP_SPECSTRATEGY() from the system.	2004-10-29 10:59:28 +00:00
Poul-Henning Kamp	0cbda9dfd5	Remove the last call in the system to VOP_SPECSTRATEGY(): We can no longer come through the VNODE layer to the disks since all the filesystems now go via geom_vfs to GEOM.	2004-10-29 10:52:31 +00:00
Poul-Henning Kamp	e1f355fe4e	Give the bufobj a private __bo_vnode for now to keep the syncer floating [1] At some point later the syncer will unlearn about vnodes and the filesystems method called by the syncer will know enough about what's in bo_private to do the right thing. [1] Ok, I know, but I couldn't resist the pun.	2004-10-29 09:33:32 +00:00
Alfred Perlstein	cd71c41476	Backout 1.291. re doesn't seem to think this fixes: Desired features for 5.3-RELEASE "More truss problems"	2004-10-29 08:24:41 +00:00
Poul-Henning Kamp	6afb3b1c37	Give dev_strategy() an explict cdev argument in preparation for removing buf->b-dev. Put a bio between the buf passed to dev_strategy() and the device driver strategy routine in order to not clobber fields in the buf. Assert copyright on vfs_bio.c and update copyright message to canonical text. There is no legal difference between John Dysons two-clause abbreviated BSD license and the canonical text.	2004-10-29 07:16:37 +00:00
Poul-Henning Kamp	c5995e45eb	Lock bp->b_bufobj->b_object instead of bp->b_object	2004-10-28 08:38:46 +00:00
Robert Watson	df970488b3	Move the 'debug' sysctl tree under options SYSCTL_DEBUG. It generates an inordinate amount of synchronous console output that is fairly undesirable on slower serial console. It's easily hit by accident when frobbing other sysctls late at night.	2004-10-27 19:26:01 +00:00
Poul-Henning Kamp	20eba72f53	Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still think it knows that everything is a vnode.	2004-10-27 08:05:02 +00:00
Poul-Henning Kamp	5b285effb0	Eliminate unnecessary KASSERT. Eliminate a printf which would never tell us anything anyway because the KASSERT would have triggered.	2004-10-27 06:47:00 +00:00
Poul-Henning Kamp	f6b855f60b	Avoid using bp->b_vp when we already have the vnode by other means.	2004-10-27 06:45:52 +00:00
Maxim Konovalov	2899faf96b	Fix a typo in a comparison appeared in rev. 1.125. Submitted by: JINMEI Tatuya	2004-10-27 05:37:58 +00:00
Alan Cox	df9f17c3df	Synchronize access to the vm page's PG_BUSY flag using the containing vm object's lock. In the same place, eliminate unnecessary checks for a NULL vm object pointer.	2004-10-27 02:05:00 +00:00
Poul-Henning Kamp	6e77a04170	The island council met and voted buf_prewrite() home. Give ffs it's own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.	2004-10-26 10:44:10 +00:00
Poul-Henning Kamp	5d9d81e7ea	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.	2004-10-26 07:39:12 +00:00
Alan Cox	cd9c0da805	Hold the lock on the containing vm object when calling vm_page_sleep_if_busy().	2004-10-26 06:58:26 +00:00
Poul-Henning Kamp	9b7cc97f6c	Remove unused si_bsize_best field from struct cdev.	2004-10-26 06:53:00 +00:00
Poul-Henning Kamp	1a1b280063	Get rid of the magic "stash" of cdev structures, we no longer call make_dev() before malloc works.	2004-10-25 13:12:06 +00:00
Poul-Henning Kamp	e4fea39e9e	Add delete_unrhdr() function. It will fail fatally if all allocated numbers have not been returned first.	2004-10-25 12:27:03 +00:00
Poul-Henning Kamp	156cb26583	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().	2004-10-25 09:14:03 +00:00
Poul-Henning Kamp	ee1d0eb330	Remove vnode->v_bsize. This was a dead-end.	2004-10-25 07:50:59 +00:00
Alan Cox	a50b705403	Use VM_ALLOC_NOBUSY to eliminate vm_page_wakeup() calls and the acquisition and release of the global page queues lock required to make the call. Remove GIANT_REQUIRED from vm_hold_free_pages(). All of its VM operations are properly synchronized.	2004-10-25 06:34:14 +00:00
Poul-Henning Kamp	4dcd0ac4cf	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.	2004-10-25 06:02:57 +00:00
Robert Watson	ae1e5c5d8d	Move from using the socket reference count to the file reference count to prevent sockets from being garbage collected during socket-specific system calls. This is the same approach used in most VFS-specific system calls, as well as generic file descriptor system calls such as read() and write(). To do this, add a utility function getsock(), which is logically identical to getvnode() used for the same purpose in VFS. Unlike fgetsock(), it returns with the file reference count elevated, but no bump of the socket reference count. Replace matching calls to fputsock() with fdrop(). This change is made to all socket system calls other than sendfile() and accept(), but the approach should be applicable to those system calls also. This shaves about four mutex operations off of each of these system calls, including send() and recv() variants, adding about 1% to pps on minimal UDP packets for UP using netblast, and 4% on SMP. Reviewed by: pjd	2004-10-24 23:45:01 +00:00
Alan Cox	01ad40dac5	Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().	2004-10-24 20:09:59 +00:00
Poul-Henning Kamp	b792bebeea	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().	2004-10-24 20:03:41 +00:00
Poul-Henning Kamp	9197ce2ee5	Add a new per-thread private flag: TDP_GEOM. This flag gets set whenever the thread posts an event on the GEOM event queue, and if the flag is set when the thread is prepared to return to userland from the kernel, g_waitidle() will be called to make sure that the posted events have completed. This can replace an insufficient number of g_waitidle() calls in various other places, and has the advantage of being failsafe: Any system call which does a VOP_OPEN()/VOP_CLOSE will now correctly wait for any geom events it posted as part of spoils or tastes. Assert that topology and Giant is not held in g_waitidle().	2004-10-23 20:49:17 +00:00
Poul-Henning Kamp	186e51cb30	Drop Giant around the call to g_waitidle(). This is necessary to allow any geom events which need it to pick up Giant.	2004-10-23 20:21:05 +00:00
Robert Watson	299b4e7fa6	Rebuild from syscalls.master:1.178.	2004-10-23 20:01:32 +00:00
Robert Watson	3e8c244949	Add system call place-holders for the following system calls implementing Sun's BSM Audit API on FreeBSD: audit() auditon() getauid() setauid() getaudit() setaudit() getaudit_addr() setaudit_addr() auditctl() Submitted by: Wayne Salamon <wsalamon at computer dot org> Obtained from: TrustedBSD Project	2004-10-23 20:00:43 +00:00
Andre Oppermann	3a82a5451c	socreate() does an early abort if either the protocol cannot be found, or pru_attach is NULL. With loadable protocols the SPACER dummy protocols have valid function pointers for all methods to functions returning just EOPNOTSUPP. Thus the early abort check would not detect immediately that attach is not supported for this protocol. Instead it would correctly get the EOPNOTSUPP error later on when it calls the protocol specific attach function. Add testing against the pru_attach_notsupp() function pointer to the early abort check as well.	2004-10-23 19:06:43 +00:00
Andre Oppermann	480fa3f985	Aquire GIANT in pf_proto_[un]register() before manipulating the protosw.	2004-10-23 18:52:06 +00:00
David Xu	c283653201	Remove P_STOPPED_TRACE bit if debugger dies without a chance to detach debugged process.	2004-10-23 11:20:26 +00:00
Robert Watson	857a600580	Add an annotation to the comment for sysv_ipc.c to indicate that the MAC Framework doesn't require checks in ipcperm() because checks relating to System V IPC will be performed in individual IPC implementations.	2004-10-22 12:12:40 +00:00
Robert Watson	397b3428eb	In osethostname(), don't need to call suser() directly as userland_sysctl() will perform all necessary privilege checks for the caller.	2004-10-22 12:10:50 +00:00
Robert Watson	b0e86f6ac2	When MAC is enabled, warn if getnewvnode() is asked to produce a vnode without a mountpoint. In this scenario, there's no useful source for a label on the vnode, since we can't query the mountpoint for the labeling strategy or default label.	2004-10-22 11:04:58 +00:00
Poul-Henning Kamp	ff7c5a4880	Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.	2004-10-22 09:59:37 +00:00
Poul-Henning Kamp	494eb176e7	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.	2004-10-22 08:47:20 +00:00
Poul-Henning Kamp	a76d8f4ec9	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.	2004-10-21 15:53:54 +00:00
Poul-Henning Kamp	1bca607b9f	Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode i getnewvnode() For now we point to the vnodes interlock mutex, that retains the exact same locking sematics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.	2004-10-21 14:42:31 +00:00
Poul-Henning Kamp	67647b2312	Polish vtruncbuf() to improve readability and style a bit.	2004-10-21 14:13:54 +00:00
Poul-Henning Kamp	e163395619	Simplify buf_vlist_remove(). Now that we have encapsulated the splaytree related information into a structure we can eliminate the half of this function.	2004-10-21 13:48:50 +00:00
Stephan Uphoff	f742a1edcd	Zero terminate empty sting in kdb_sysctl_available. Approved by: sam (mentor) MFC after: 1 week	2004-10-21 01:11:25 +00:00
Alan Cox	0f777d7d9b	Modify the vm object locking in do_sendfile() so that the containing object is locked when vm_page_io_finish() is called on a page. This is to satisfy a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the process of transitioning the responsibility for synchronizing access to various fields/flags on the page from the global page queues lock to the per-object lock.) Tripped over by: obrien@	2004-10-20 17:44:40 +00:00
Andre Oppermann	312c75c362	Support for dynamically loadable and unloadable protocols within existing protocol families. The protosw[] array of any particular protocol family ("domain") is of fixed size defined at compile time. This made it impossible to dynamically add or remove any protocols to or from it. We work around this by introducing so called SPACER's which are embedded into the protosw[] array at compile time. The SPACER's have a special protocol number (32767) to indicate the fact that they are SPACER's but are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's are provided in the protosw[] structure. The pr_usrreqs structure is treated more special and contains pointers to dummy functions only returning EOPNOTSUPP. This is needed because the use of those functions pointers is usually not checked within the kernel because until now it was assumed to be a valid function pointer. Instead of fixing all potential callers we just return a proper error code. Two new functions provide a clean API to register and unregister a protocol. The register function expects a pointer to a valid and complete struct protosw including a pointer to struct pru_usrreqs provided by the caller. Upon successful registration the pr_init() function will be called to finish initialization of the protocol. The unregister function restores the SPACER in place of the protocol again. It is the responseability of the caller to ensure proper closing of all sockets and freeing of memory allocation by the unloading protocol. sys/protosw.h o Define generic PROTO_SPACER to be 32767 o Prototypes for all pru__notsupp() functions o Prototypes for pf_proto_[un]register() functions kern/uipc_domain.c o Global struct pr_usrreqs nousrreqs containing valid pointers to the pru__notsupp() functions o New functions pf_proto_[un]register() kern/uipc_socket2.c o New functions bodies for all pru_*_notsupp() functions	2004-10-19 15:13:30 +00:00
Robert Watson	81158452be	Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-18 22:19:43 +00:00
Poul-Henning Kamp	95bc568977	Add new function ttyinitmode() which sets our systemwide default modes on a tty structure. Both the ".init" and the current settings are initialized allowing the function to be used both at attach and open time. The function takes an argument to decide if echoing should be enabled. Echoing should not be enabled for regular physical serial ports unless they are consoles, in which case they should be configured by ttyconsolemode() instead. Use the new function throughout.	2004-10-18 21:51:27 +00:00
Scott Long	b96741f410	If a process needs to be swapped in, wakeup the swapper from within critical_exit as the process is getting scheduled to run. This is subotimal but for now avoid the LOR between the scheduler and the sleepq systems. This is a 5.3 candidate. Submitted by: davidxu MFC After: 3 days	2004-10-16 06:38:22 +00:00
Poul-Henning Kamp	33da4e5bd8	Make pty's always come up in echo mode.	2004-10-15 09:03:07 +00:00
Poul-Henning Kamp	fffc55152b	Add missing chunk of code to enforce the lock-bits of termios. This solves the problem where serial consoles suddenly required DCD to be asserted. Reported by: Randy Bush <randy@psg.com>	2004-10-14 18:30:24 +00:00
Nate Lawson	66ae9f6384	Update flags patch for the !ISA case. * Get flags first, in case there is no devclass. * Reset flags after each probe in case the next driver has no hints so it doesn't inherit the old ones. * Set them again before the winning probe. Tested ok both with and without ACPI for ISA device flags. Reviewed by: imp MFC after: 1 day	2004-10-14 17:14:56 +00:00
John-Mark Gurney	583ef6b6d2	/me gets the wrong patch out of the pr :( /me had the write patch w/o comments on his test system. Pointed out by: kuriyama and ache Pointy hat to: jmg	2004-10-14 03:26:50 +00:00
Stephan Uphoff	7c71b6453a	Fix maybe_preempt_in_ksegrp for !SMP. Tested by: tegge Reviewed by: julian Approved by: sam (mentor) MFC after: 3 days	2004-10-13 22:07:04 +00:00
John-Mark Gurney	d46316e8f9	fix a bug where signal events didn't set the flags for attach/detach.. PR: 72234 MFC after: 2 days	2004-10-13 20:55:19 +00:00
Nate Lawson	6f857c4b9f	Set flags for devices before probing them. In the non-ISA case, flags set via hints were not getting passed to the child. PR: kern/72489 MFC after: 1 day	2004-10-13 07:10:41 +00:00
Poul-Henning Kamp	43c72732aa	Don't call driver close unless we have one.	2004-10-12 21:40:41 +00:00
Poul-Henning Kamp	13e7430fde	Make !SMP kernels compile, and as far as I can tell, work again.	2004-10-12 20:57:37 +00:00
John Baldwin	ebcfea8764	Whitespace fix.	2004-10-12 19:36:00 +00:00
John Baldwin	2ff0e645d1	Refine the turnstile and sleep queue interfaces just a bit: - Add a new _lock() call to each API that locks the associated chain lock for a lock_object pointer or wait channel. The _lookup() functions now require that the chain lock be locked via _lock() when they are called. - Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup the associated queue structure internally via _lookup() rather than accepting a pointer from the caller. For turnstiles, this means that the actual lookup of the turnstile in the hash table is only done when the thread actually blocks rather than being done on each loop iteration in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup() is no longer used outside of the sleep queue code except to implement an assertion in cv_destroy(). - Change sleepq_broadcast() and sleepq_signal() to require that the chain lock is already required. For condition variables, this lets the cv_broadcast() and cv_signal() functions lock the sleep queue chain lock while testing the waiters count. This means that the waiters count internal to condition variables is no longer protected by the interlock mutex and cv_broadcast() and cv_signal() now no longer require that the interlock be held when they are called. This lets consumers of condition variables drop the lock before waking other threads which can result in fewer context switches. MFC after: 1 month	2004-10-12 18:36:20 +00:00
John Baldwin	c7836018ea	Add a WITNESS_WARN() to uiomove() to whine if locks are held when this function is called. MFC after: 1 month	2004-10-12 18:27:14 +00:00
Stephan Uphoff	c6a08cf2d7	Directly modifying the priority of a thread that may be on the runqueue can break the sorting order of the ksegp run queue. Tested by: pho Reviewed by: jhb, julian Approved by: sam (mentor) MFC: ASAP	2004-10-12 16:31:23 +00:00
Stephan Uphoff	84f9d4b137	Prevent preemption in slot_fill. Implement preemption between threads in the same ksegp in out of slot situations to prevent priority inversion. Tested by: pho Reviewed by: jhb, julian Approved by: sam (mentor) MFC: ASAP	2004-10-12 16:30:20 +00:00
Stephan Uphoff	b9a80acadb	Force MUTEX_WAKE_ALL. A race condition in single thread wakeup may break priority inheritance. Tested by: pho Reviewed by: jhb,julian Approved by: sam (mentor) MFC: ASAP	2004-10-12 16:28:18 +00:00
Poul-Henning Kamp	a1bd71b260	Add missing zero flag arguments to calls to userland_sysctl()	2004-10-12 07:49:15 +00:00
Peter Wemm	a7bc3102c4	Put on my peril sensitive sunglasses and add a flags field to the internal sysctl routines and state. Add some code to use it for signalling the need to downconvert a data structure to 32 bits on a 64 bit OS when requested by a 32 bit app. I tried to do this in a generic abi wrapper that intercepted the sysctl oid's, or looked up the format string etc, but it was a real can of worms that turned into a fragile mess before I even got it partially working. With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have it not abort. Things like netstat, ps, etc have a long way to go. This also fixes a bug in the kern.ps_strings and kern.usrstack hacks. These do matter very much because they are used by libc_r and other things.	2004-10-11 22:04:16 +00:00
Gleb Smirnoff	366538f251	Rename _m_tag_free() to m_tag_free_default() and make it non-static. Approved by: sam	2004-10-11 18:40:19 +00:00
Robert Watson	cc34aa2094	Add entropy harvest mutex to hard-coded spin lock witness lock order, remove previous entropy harvesting mutex names as they are no longer present. Commit to this file was ommitted when randomdev_soft.c:1.5 was made. Feet shot: Robert Huff <roberthuff at rcn dot com>	2004-10-11 08:26:18 +00:00
Robert Watson	35b260cd69	Rework sofree() logic to take into account a possible race with accept(). Sockets in the listen queues have reference counts of 0, so if the protocol decides to disconnect the pcb and try to free the socket, this triggered a race with accept() wherein accept() would bump the reference count before sofree() had removed the socket from the listen queues, resulting in a panic in sofree() when it discovered it was freeing a referenced socket. This might happen if a RST came in prior to accept() on a TCP connection. The fix is two-fold: to expand the coverage of the accept mutex earlier in sofree() to prevent accept() from grabbing the socket after the "is it really safe to free" tests, and to expand the logic of the "is it really safe to free" tests to check that the refcount is still 0 (i.e., we didn't race). RELENG_5 candidate. Much discussion with and work by: green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-11 08:11:26 +00:00
Gleb Smirnoff	8c4a75be4a	Revert last commit since it breaks API. Requested by: sam	2004-10-10 09:16:48 +00:00
Julian Elischer	042b7b1af0	Don't release the slot twice.. sched_rem() has already done it. Submitted by: stephan uphoff (ups at tree dot com) MFC after: 3 days	2004-10-10 05:19:22 +00:00
Julian Elischer	9b036bdf5a	Remove duplicate line.	2004-10-10 05:07:43 +00:00
Gleb Smirnoff	42c5607501	Remove inlined m_tag_free(). Rename _m_tag_free() to m_tag_free() and make it visible (same way as in OpenBSD). Describe usage in manpage. This change is useful for creating custom free methods, which call default free method at their end. While here, make malloc declaration for mbuf tags more informative. Approved by: julian (mentor), sam MFC after: 1 month	2004-10-09 13:25:19 +00:00
Brian Feldman	41f57cbc8d	Don't "implicitly order all sleep locks before spin locks" in witness when the spin lock in question isn't -- it's the critical_enter() that KDB set. No more panic in DDB for console -> syscons -> tty -> knote operations.	2004-10-09 08:16:37 +00:00
David Xu	84e0b075f6	Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen	2004-10-07 13:50:10 +00:00
David Xu	ebfcca3d61	Regen to unbreak world. Pointy hat to: mtm	2004-10-07 01:09:46 +00:00
David Schultz	cda5aba4b9	Back out rev 1.240; it is unnecessary. In particular, p1 == curthread, so _PHOLD(p1) will not have to block to swap in p1. Noticed by: jhb	2004-10-06 23:53:49 +00:00
Mike Makonnen	401901ac43	Close a race between a thread exiting and the freeing of it's stack. After some discussion the best option seems to be to signal the thread's death from within the kernel. This requires that thr_exit() take an argument. Discussed with: davidxu, deischen, marcel MFC after: 3 days	2004-10-06 14:23:00 +00:00
David Xu	195f5806e4	Close a race between thr_create and sysctl -w, the thr_scope_sys could be changed when thr_create is running, and we tested it for several times.	2004-10-06 02:29:19 +00:00
Greg Lehey	57259f2864	vtryrecycle: Don't rely on type VBAD alone to mean that we don't need to clean the vnode. If v_data is set, we still need to clean it. This code change should catch all incidents of the previous commit (INVARIANTS only).	2004-10-06 02:09:59 +00:00
Greg Lehey	f2154b33d2	getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning. Discussion: this panic (or waning) only occurs when the kernel is compiled with INVARIANTS. Otherwise the problem (which means that the vp->v_data field isn't NULL, and represents a coding error and possibly a memory leak) is silently ignored by setting it to NULL later on. Panicking here isn't very helpful: by this time, we can only find the symptoms. The panic occurs long after the reason for "not cleaning" has been forgotten; in the case in point, it was the result of severe file system corruption which left the v_type field set to VBAD. That issue will be addressed by a separate commit.	2004-10-06 02:06:11 +00:00
David Xu	e0cfeb44a8	Restore some code removed in revision 1.193 and 1.194, julian said he'd like to keep these code.	2004-10-06 00:49:41 +00:00
David Xu	906ac69d08	In original kern_execve() code, at the start of the function, it forces all other threads to suicide, problem is execve() could be failed, and a failed execve() would change threaded process to unthreaded, this side effect is unexpected. The new code introduces a new single threading mode SINGLE_BOUNDARY, in the mode, all threads should suspend themself at user boundary except the singler. we can not use SINGLE_NO_EXIT because we want to start from a clean state if execve() is successful, suspending other threads at unknown point and later resuming them from there and forcing them to exit at user boundary may cause the process to start from a dirty state. If execve() is successful, current thread upgrades to SINGLE_EXIT mode and forces other threads to suicide at user boundary, otherwise, other threads will be resumed and their interrupted syscall will be restarted. Reviewed by: julian	2004-10-06 00:40:41 +00:00
Julian Elischer	f8135176c9	Fix whitespace botch that only showed up in the commit message diff :-/ MFC after: 4 days	2004-10-05 22:14:02 +00:00
Julian Elischer	fcb7c67b7b	Slight cleanup in the single threading code. MFC after: 4 days	2004-10-05 22:05:25 +00:00
Julian Elischer	c20c691bed	When preempting a thread, put it back on the HEAD of its run queue. (Only really implemented in 4bsd) MFC after: 4 days	2004-10-05 22:03:10 +00:00
Julian Elischer	c5c3fb335f	Oops. left out part of the diff. MFC after: 4 days	2004-10-05 21:26:27 +00:00
Julian Elischer	d39063f20d	Use some macros to trach available scheduler slots to allow easier debugging. MFC after: 4 days	2004-10-05 21:10:44 +00:00
Julian Elischer	6f23adbc11	light rearrangement of some code to get some locking more correct MFC after: 4 days	2004-10-05 20:48:16 +00:00
Julian Elischer	e5bedcef92	Break out to a separate function, the code to revert a multithreaded process back to officially being a non-threaded program. MFC after: 4 days	2004-10-05 20:39:26 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
John Baldwin	b85975277e	Add a critical section in turnstile_unpend() from before dropping the turnstile chain lock until after making all the awakened threads runnable. First, this fixes a priority inversion race. Second, this attempts to finish waking up all of the threads waiting on a turnstile before doing a preemption. Reviewed by: Stephan Uphoff (who found the priority inversion race)	2004-10-05 18:00:30 +00:00
Pawel Jakub Dawidek	8d02a378aa	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.	2004-10-05 11:26:43 +00:00
David Xu	b3a4fb14b3	Use scheduler api to adjust thread priority.	2004-10-05 09:10:30 +00:00
Warner Losh	14889b4229	Add taskqueue_drain. This waits for the specified task to finish, if running, or returns. The calling program is responsible for making sure that nothing new is enqueued. # man page coming soon.	2004-10-05 04:16:01 +00:00
Poul-Henning Kamp	37abb77f25	Change the perfectly precise message printf("No buffers busy after final sync"); to printf("All buffers synced."); in order to not leave the users wondering if there should be.	2004-10-04 13:13:23 +00:00
Julian Elischer	c233d032d2	Another case where we need to guard against a partially constructed process. Submitted by: Stephan Uphoff ( ups at tree.com ) MFC after: 3 days	2004-10-04 06:45:48 +00:00
Julian Elischer	a9b5dc7d6d	Always strt out with an initilalised ksegrp structure. MFC after: 3 days	2004-10-03 20:06:11 +00:00
David Xu	482d099c50	Don't bother to turn off other P_STOPPED bits for SIGKILL, doing so would cause kernel to produce an unkillable process in some cases, especially, P_STOPPED_SINGLE has a singling thread, turning off the bit would mess the state.	2004-10-03 13:23:49 +00:00
Alan Cox	86dac448f2	Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile().	2004-10-02 05:37:47 +00:00
Alfred Perlstein	50434413b0	Clear a process's procfs trace points upon delivery of SIGKILL. MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")	2004-10-01 14:15:20 +00:00
Poul-Henning Kamp	ba2851254f	Fix a LOR relating to freeing cdevs.	2004-10-01 06:33:39 +00:00
Alfred Perlstein	576c004fb9	cover soreadable and sowriteable with the corresponding socketbuffer locks.	2004-10-01 05:54:06 +00:00
David Schultz	299bc7367d	Avoid calling _PHOLD(p1) with p2's lock held, since _PHOLD() may block to swap in p1. Instead, call _PHOLD earlier, at a point where the only lock held happens to be p1's.	2004-10-01 05:01:29 +00:00
John Baldwin	4a2aa5d054	Fix a typo to fix the !DIAGNOSTIC build. Submitted by: many	2004-09-30 18:13:18 +00:00
Poul-Henning Kamp	0cd3cb9a15	Assign a global unit number for the tty slave devices (init/lock) using the new subr_unit.c code. For now assert Giant in ttycreate() and ttyfree(). It is not obvious that it will ever pay off to lock these with anything else.	2004-09-30 10:38:48 +00:00
Poul-Henning Kamp	f6bde1fd05	Add a new API for allocating unit number (-like) resources. Allocation is always lowest free unit number. A mixed range/bitmap strategy for maximum memory efficiency. In the typical case where no unit numbers are freed total memory usage is 56 bytes on i386. malloc is called M_WAITOK but no locking is provided (yet). A bit of experience will be necessary to determine the best strategy. Hopefully a "caller provides locking" strategy can be maintained, but that may require use of M_NOWAIT allocation and failure handling. A userland test driver is included.	2004-09-30 07:04:03 +00:00
Brian Feldman	1abf2c3678	Account for alias devices when tearing them down in destroy_dev() so we don't panic on a NULL cdev->si_devsw.	2004-09-29 16:38:38 +00:00
Dag-Erling Smørgrav	479439b4fe	Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables. MFC after: 3 days	2004-09-29 14:21:40 +00:00
Poul-Henning Kamp	cf287576e5	Add functions to create and free the "tty-ness" of a serial port in a generic way. This code will allow a similar amount of code to be removed from most if not all serial port drivers. Add generic cdevsw for tty devices. Add generic slave cdevsw for init/lock devices. Add ttypurge function which wakes up all know generic sleep points in the tty code, and calls into the hw-driver if it provides a method. Add ttycreate function which creates tty device and optionally cua device. In both cases .init/.lock devices are created as well. Change ttygone() slightly to also call the hw driver provided purge routine. Add ttyfree() which will purge and destroy the cdevs. Add ttyconsole mode for setting console friendly termios on a port.	2004-09-28 19:33:49 +00:00
John-Mark Gurney	7b12509082	improve the mbuf m_print function.. Only pull length from pkthdr if there is one, detect mbuf loops and stop, add an extra arg so you can only print the first x bytes of the data per mbuf (print all if arg is -1), print flags using %b (bitmask)... No code in the tree appears to use m_print, and it's just a maner of adding -1 as an additional arg to m_print to restore original behavior.. MFC after: 4 days	2004-09-28 18:40:18 +00:00
Poul-Henning Kamp	961da2716b	Give cluster_write() an explicit vnode argument. In the future a struct buf will not automatically point out a vnode for us.	2004-09-27 19:14:10 +00:00
Poul-Henning Kamp	a5993c332a	Used cached cdevsw pointer.	2004-09-27 06:34:30 +00:00
Poul-Henning Kamp	743cd76a73	Add cdevsw->d_purge() support. This device method shall wake up any threads sleeping in the device driver and make the depart the drivers code for good.	2004-09-27 06:18:25 +00:00
Marcel Moolenaar	e6aa723212	Fix a bug introduced in the previous commit: kdb_cpu_trap() gets to the trapframe via kdb_frame, but kdb_frame was not initialized until after the call to kdb_cpu_trap(). Ergo: kdb_cpu_trap() was moved too far up. Pointy hat: marcel	2004-09-26 06:48:59 +00:00
Julian Elischer	2179a22cc7	Use the universal 'threaded process' flag rather than the specific tests for different threading systems. MFC after: 1 week	2004-09-25 00:53:46 +00:00
John Baldwin	a9a64385e7	Some more whitespace, style, and comment fixes. Submitted by: bde (mostly)	2004-09-24 20:27:04 +00:00
Pawel Jakub Dawidek	d0257d9c10	Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits a bit better to our current naming scheme. Discussed with: ru	2004-09-24 09:19:03 +00:00
Poul-Henning Kamp	b2deb1d205	Remove the cdevsw() function which is now unused.	2004-09-24 08:30:57 +00:00
Poul-Henning Kamp	6f077de596	Hold threadcount while throbbing cdevsw in our underlying driver. This is a bit heavyhanded, and will be simplified once the tty code learns to properly deal with disappearing hw and drivers.	2004-09-24 08:26:03 +00:00
Poul-Henning Kamp	8f7bea8b99	Hold threadcount reference when we call into the underlying console driver.	2004-09-24 07:16:56 +00:00
Poul-Henning Kamp	fe0b82752b	Eliminate devsw() call, we are not dereferencing the pointer.	2004-09-24 07:11:02 +00:00
Poul-Henning Kamp	8ff33adb8c	Hold threadref while we throb cdevsw in devtoname()	2004-09-24 06:29:23 +00:00
Poul-Henning Kamp	38f878d739	Use vn_isdisk() to check if vnode is a disk. (repeat, CVS core dumped on me)	2004-09-24 06:23:31 +00:00
Poul-Henning Kamp	233b81be1c	use vn_isdisk() to see if vnode is a disk.	2004-09-24 06:21:43 +00:00
Poul-Henning Kamp	6e8d420249	Hold dev_lock and check for NULL devsw pointer when we service FIODTYPE ioctl.	2004-09-24 06:16:48 +00:00
Poul-Henning Kamp	70526ca6a5	Hold dev_lock and check for NULL devsw pointer when we determine if a vnode is a disk.	2004-09-24 06:16:08 +00:00
Poul-Henning Kamp	6230ce6aa9	use dev_re[fl]thread() rather than home rolled versions.	2004-09-24 05:55:03 +00:00
Poul-Henning Kamp	2c15afd888	Introduce dev_re[lf]thread() functions. dev_refthread() will return the cdevsw pointer or NULL. If the return value is non-NULL a threadcount is held which much be released with dev_relthread(). If the returned cdevsw is NULL no threadcount is held on the device.	2004-09-24 05:54:32 +00:00
John Baldwin	6111dcd2ef	A modest collection of various and sundry style, spelling, and whitespace fixes. Submitted by: bde (mostly)	2004-09-24 00:38:15 +00:00
Olivier Houchard	e0370a187c	On arm, set the default elf brand to FreeBSD, until the binutils do it for us.	2004-09-23 23:29:24 +00:00
John Baldwin	ea73c1ea21	Don't try to protect td_sticks with sched_lock. It doesn't need it as it is only accessed by curthread.	2004-09-23 21:03:58 +00:00
John Baldwin	654e92bf10	- Assert sched_lock in upcall_remove() since it is needed there and all callers already lock it there. - Lock sched_lock slightly earlier in kse_create() so that it covers kg_numupcalls.	2004-09-23 21:03:16 +00:00
John Baldwin	63993cf011	- Don't try to unlock Giant if single threading fails since we don't have it locked. - Unlock Giant before calling exit1() since exit1() does not require Giant.	2004-09-23 21:01:50 +00:00
Poul-Henning Kamp	fd92686dd5	Split the ioctl function in control and slave side, this eliminated a troublesome devsw() call.	2004-09-23 16:13:46 +00:00
Poul-Henning Kamp	1a52a73d68	Eliminate DEV_STRATEGY() macro: call dev_strategy() directly. Make dev_strategy() handle errors and departing devices properly.	2004-09-23 14:45:04 +00:00
Pawel Jakub Dawidek	5a19f8b0c4	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting root partition to give a chance to GEOM providers to show up. Now, when there is no needed provider, vfs_rootmount() function will look for it every second and if it can't be find in defined time, it'll ask for root device name (before this change it was done immediately). This will allow to boot from gmirror device in degraded mode.	2004-09-23 10:13:18 +00:00
Poul-Henning Kamp	a0e78d2eb0	Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount of the number of threads which are inside whatever is behind the cdevsw for this particular cdev. Make the device mutex visible through dev_lock() and dev_unlock(). We may want finer granularity later. Replace spechash_mtx use with dev_lock()/dev_unlock().	2004-09-23 07:17:41 +00:00
John Baldwin	7eaec467d8	Various small style fixes.	2004-09-22 15:24:33 +00:00
Julian Elischer	2e2e32b201	Revert the last change.. Better to kill all other threads than to panic the system if 2 threads call execve() at the same time. A better fix will be committed later. Note that this only affects the case where the execve fails.	2004-09-22 01:30:23 +00:00
Julian Elischer	297800599a	In a threaded process, don't kill off all the other threads until we have a reasonable chance that the eceve() is going to succeeed. I.e. wait until we've done the permission checks etc. MFC after: 1 week	2004-09-21 21:05:13 +00:00
Poul-Henning Kamp	90a660e199	If a vnode has no v_rdev we cannot hope to answer FIODTYPE ioctl.	2004-09-21 08:33:05 +00:00
John Baldwin	b89daf89a8	Remove unused macro.	2004-09-20 19:01:44 +00:00
Brian Somers	a04946cf6e	CTASSERT that MSZIE is a power of 2 (otherwise dtom() breaks) Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks) As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE anyway, but that was an implementation side-effect.... KASSERT -> CTASSERT suggested by: dd@ Approved by: silence on -net	2004-09-20 08:52:04 +00:00
David Schultz	8daa8c602a	The zone from which proc structures are allocated is marked UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should never be called. Move an assertion from proc_fini() to proc_dtor() and garbage-collect the rest of the unreachable code. I have retained vm_proc_dispose(), since I consider its disuse a bug.	2004-09-19 18:34:17 +00:00
Poul-Henning Kamp	9e16d66689	Initialize new ttys a bit more. Check TS_GONE flag for gone-ness.	2004-09-18 17:02:18 +00:00
Marcel Moolenaar	ddf4122592	Move makectx() after kdb_cpu_trap(), so the PCB will have possible MD corrections made to the trapframe. This is more logical.	2004-09-17 22:27:23 +00:00
Poul-Henning Kamp	e8d3e08098	Add ttyopen and ttyclose functions which will do the right stuff for most if not all of our tty drivers in the future. Centralizing this stuff enables us to remove about 100 lines of almost but not quite perfectly copy&paste code from each tty driver.	2004-09-17 11:43:35 +00:00
Poul-Henning Kamp	3e6bf9fb86	Add ttyalloc() which in due time will be the successor to ttymalloc(), but without the "struct tty *" argument.	2004-09-17 06:13:47 +00:00
Poul-Henning Kamp	f33ed262da	Use the tty->t_sc field to find our softc.	2004-09-16 12:07:25 +00:00
Julian Elischer	14f0e2e9bf	clean up thread runq accounting a bit. MFC after: 3 days	2004-09-16 07:12:59 +00:00
Julian Elischer	9da3e923f4	e specific code to revert a partial add ot teh run queue, not remrunqueue() which can't handle a partially added thread. MFC after: 1 week	2004-09-16 05:37:40 +00:00
Poul-Henning Kamp	08dbd671ff	Remove unused B_WRITEINPROG flag	2004-09-15 21:49:22 +00:00
Poul-Henning Kamp	273350ad0f	Simplify initialization of va_null a little bit.	2004-09-15 21:42:03 +00:00
Poul-Henning Kamp	4095f485c8	undent some functions a bit.	2004-09-15 21:08:58 +00:00
Poul-Henning Kamp	ab19cad78e	stylistic polishing.	2004-09-15 20:54:23 +00:00
Julian Elischer	915996978d	Try harder to get back to being a non threaded process. Submitted by: DavidXu MFC after: 3 days	2004-09-15 18:39:09 +00:00
Julian Elischer	e8807f22f9	Oops accidentally removed #ifdef SCHED_4BSD as part of another commit This function is not yet used in ULE	2004-09-15 03:51:51 +00:00
John-Mark Gurney	31580e6817	unlock global lock in kqueue_scan before msleep'ing to prevent dead lock.. we didn't unlock global lock earlier to prevent just having to reaquire it again.. Found by: peter Reviewed by: ps MFC after: 3 days	2004-09-14 18:38:16 +00:00
Julian Elischer	1f9f5df61d	Commit a fix for some panics we've been seeing with preemption. MFC after: 2 days	2004-09-13 23:06:39 +00:00
Julian Elischer	b2578c6c06	Add some kasserts	2004-09-13 23:02:52 +00:00
Julian Elischer	a3aa559270	make some of these conditions apply equally to both threading systems.	2004-09-13 22:10:04 +00:00
Poul-Henning Kamp	67673e6677	Create struct snapdata which contains the snapshot fields from cdev and the previously malloc'ed snapshot lock. Malloc struct snapdata instead of just the lock. Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes). While here, give the private readblock() function a vnode argument in preparation for moving UFS to access GEOM directly.	2004-09-13 07:29:45 +00:00
Poul-Henning Kamp	883d3c0c07	Remove the buffercache/vnode side of BIO_DELETE processing in preparation for integration of p4::phk_bufwork. In the future, local filesystems will talk to GEOM directly and they will consequently be able to issue BIO_DELETE directly. Since the removal of the fla driver, BIO_DELETE has effectively been a no-op anyway.	2004-09-13 06:50:42 +00:00
Scott Long	1e7fad6b6a	Revert the previous round of changes to td_pinned. The scheduler isn't fully initialed when the pmap layer tries to call sched_pini() early in the boot and results in an quick panic. Use ke_pinned instead as was originally done with Tor's patch. Approved by: julian	2004-09-11 10:07:22 +00:00
Julian Elischer	513efa5b39	Try committing from the right tree this time MFC after: 2 days	2004-09-11 00:11:09 +00:00
Julian Elischer	5c854accc1	Make up my mind if cpu pinning is stored in the thread structure or the scheduler specific extension to it. Put it in the extension as the implimentation details of how the pinning is done needn't be visible outside the scheduler. Submitted by: tegge (of course!) (with changes) MFC after: 3 days	2004-09-10 22:28:33 +00:00
Julian Elischer	3389af30e8	Add some code to allow threads to nominat a sibling to run if theyu are going to sleep. MFC after: 1 week	2004-09-10 21:04:38 +00:00
John-Mark Gurney	ca95b2de43	remove giant required from kqueue_close.. Reported by: kuriyama MFC after: 3 days	2004-09-10 03:14:32 +00:00
Robert Watson	030c6fb156	Hard code witness lock order for BPF locks.	2004-09-09 05:01:37 +00:00
Poul-Henning Kamp	1affa3adc8	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.	2004-09-07 09:17:05 +00:00
Julian Elischer	246409821c	fix typo MFC after: 2 days	2004-09-07 07:04:47 +00:00
Julian Elischer	5498350529	Make debug printf less threatenning and make it only print out once. MFC after: 2 days	2004-09-07 06:38:22 +00:00
Julian Elischer	a8b491c121	Give libthr a choice (per system) of scope_system or scope_thread scheduling. MFC after: 4 days	2004-09-07 06:33:39 +00:00
John-Mark Gurney	80e6bbe95b	make witness it's own sysctl branch instead of using _ to do this. I have left the old tunables in to give people a few days to transition their loader.conf and sysctl.conf's over to the new names.. MFC after: 5 days	2004-09-06 23:27:28 +00:00
John-Mark Gurney	9b90387dcf	don't call f_detach if the filter has alread removed the knote.. This happens when a proc exits, but needs to inform the user that this has happened.. This also means we can remove the check for detached from proc and sig f_detach functions as this is doing in kqueue now... MFC after: 5 days	2004-09-06 19:02:42 +00:00
Julian Elischer	6a574b2afc	Don't do IPIs on behalf of interrupt threads. just punt straight on through to teh preemption code. Make a KASSSERT out of a condition that can no longer occur. MFC after: 1 week	2004-09-06 07:23:14 +00:00
Julian Elischer	0fe38d47b7	slight code cleanup MFC after: 1 week	2004-09-05 23:23:58 +00:00
Alfred Perlstein	4c0bef6230	It's too easy to panic the machine when INVARIANTS are turned on and you botch a call to nmount(2). This is because there is an INVARIANTS check that asserts that opt->len must be zero if opt->val is not NULL. The problem is that the code does not actually follow this invariant if there is an error while processing mount options. Fix the code to honor the INVARIANT. Silence on: fs@	2004-09-05 22:24:28 +00:00
Robert Watson	76f6939888	Expand the scope of the socket buffer locks in sopoll() to include the state test as well as set, or we risk a race between a socket wakeup and registering for select() or poll() on the socket. This does increase the cost of the poll operation, but can probably be optimized some in the future. This appears to correct poll() "wedges" experienced with X11 on SMP systems with highly interactive applications, and might affect a plethora of other select() driven applications. RELENG_5 candidate. Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru> Debugged with help of: dwhite	2004-09-05 14:33:21 +00:00
Julian Elischer	bce73aeddb	turn on IPIs for 4bsd scheduler by default. MFC after: 1 week	2004-09-05 02:19:53 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
Julian Elischer	00b0483d5c	Don't declare a function we are not defining.	2004-09-03 09:19:49 +00:00
Julian Elischer	37c28a022b	fix compile for UP	2004-09-03 09:15:10 +00:00
Julian Elischer	293968d8d3	ooops finish last commit. moved the variables but not the declarations.	2004-09-03 08:19:31 +00:00
Julian Elischer	82a1dfc16d	Move 4bsd specific experimental IP code into the 4bsd file. Move the sysctls into kern.sched	2004-09-03 07:42:31 +00:00
Alan Cox	94ddc7076d	Push Giant deep into vm_forkproc(), acquiring it only if the process has mapped System V shared memory segments (see shmfork_myhook()) or requires the allocation of an ldt (see vm_fault_wire()).	2004-09-03 05:11:32 +00:00
Robert Watson	b6ac582880	Tag AIO as requiring Giant over the network stack using NET_NEEDS_GIANT(). RELENG_5 candidate.	2004-09-03 03:19:14 +00:00
Julian Elischer	44692526be	remove unused code MFC after: 2 days	2004-09-02 23:37:41 +00:00
Scott Long	9923b511ed	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.	2004-09-02 18:59:15 +00:00
Julian Elischer	7e37fb1729	Blush forgot to test non SMP builds.. oddly enough some UP code (particularly in the acpi code) seems to want this in a UP build. (I guess so you can have a sigle kernel module that works for both)	2004-09-01 18:05:43 +00:00
Julian Elischer	6804a3ab6d	Give the 4bsd scheduler the ability to wake up idle processors when there is new work to be done. MFC after: 5 days	2004-09-01 06:42:02 +00:00
Julian Elischer	2630e4c90c	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days	2004-09-01 02:11:28 +00:00
David Xu	cf1867f932	Remove TDP_USTATCLOCK, we no longer need it because we now always update tick count for userland in thread_userret. This change also removes a "no upcall owned" panic because fuword() schedules an upcall under heavily loaded, and code assumes there is no upcall can occur. Reported and Tested by: Peter Holm <peter@holm.cc>	2004-08-31 11:52:05 +00:00
Julian Elischer	5995adc206	Remove an unneeded argument.. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument. Having both proc and thread as an argumen tjust gives an opportunity for them to get out sync. MFC after: 3 days	2004-08-31 07:34:54 +00:00
Julian Elischer	99e9dcb817	Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days	2004-08-31 06:12:13 +00:00
Warner Losh	2063e05140	Fix BUS_DEBUG case	2004-08-30 05:48:49 +00:00
Pawel Jakub Dawidek	2e4db7cfd7	Add a missing '\n'.	2004-08-30 01:10:20 +00:00
David Xu	45a4bfa17d	Only test return_instead if P_SINGLE_EXIT is set, otherwise a fork() syscall can interrupt other thread's syscall in sleepq_catch_signals(). Current, all callers know thread_suspend_check may suspend thread itself, so we need't to check return_instead for normal suspension flags (no P_SINGLE_EXIT set). Tested by: deischen Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>	2004-08-29 23:10:02 +00:00
Warner Losh	f52c5866ea	Initial support (disabled) for rebidding devices. I've been running this in my tree for a while and in its disabled state there are no issues. It isn't enabled yet because some drivers (in acpi) have side effects in their probe routines that need to be resolved in some manner before this can be turned on. The consensus at the last developer's summit was to provide a static method for each driver class that will return characteristics of the driver, one of which is if can be reprobed idempotently.	2004-08-29 18:25:21 +00:00
Warner Losh	3cdf2a3f20	MFp4: Merge in the patches, submitted long ago by someone whose email address I've lost, that move the location information to the atttach routine as well. While one could use devinfo to get this data, that is difficult and error prone and subject to races for short lived devices. Would make a good MT5 candidate.	2004-08-29 18:11:10 +00:00
Dag-Erling Smørgrav	0eac4495db	Remove the HW_WDOG option; it serves no purpose. MFC after: 3 days	2004-08-29 11:10:09 +00:00
Ian Dowse	70b7ffee1b	Add support for completing the installation of ELF relocatable object format modules that were read in by the loader. Loading modules via the loader should now work on the amd64 platform.	2004-08-29 01:21:51 +00:00
David Xu	5897f840f0	1. try to use existing mailbox address in thread_update_usr_ticks. 2. remove '\n' in KASSERT.	2004-08-28 04:16:32 +00:00
David Xu	ad1280b593	Move TDF_CAN_UNBIND to thread private flags td_pflags, this eliminates need of sched_lock in some places. Also in thread_userret, remove spare thread allocation code, it is already done in thread_user_enter. Reviewed by: julian	2004-08-28 04:08:05 +00:00
Peter Wemm	6f96710c60	Backout the previous backout (with scott's ok). sched_ule.c:1.122 is believed to fix the problem with ULE that this change triggered.	2004-08-28 01:04:44 +00:00
David E. O'Brien	dd68efd05b	s/smp_rv_mtx/smp_ipi_mtx/g Requested by: jhb	2004-08-28 00:49:55 +00:00
Peter Wemm	91c1172a5a	Commit Jeff's suggested changes for avoiding a bug that is exposed by preemption and/or the rev 1.79 kern_switch.c change that was backed out. The thread was being assigned to a runq without adding in the load, which would cause the counter to hit -1.	2004-08-28 00:49:22 +00:00
Andre Oppermann	2580f4e584	Poll() uses the array smallbits that is big enough to hold 32 struct pollfd's to avoid calling malloc() on small numbers of fd's. Because smalltype's members have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd. PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days	2004-08-27 21:23:50 +00:00
Alexander Kabaev	4cef6d5a53	Reintroduce slightly modified patch from kern/69964. Check for LK_HAVE_EXL in both acquire invocations. MFC after: 5 days	2004-08-27 01:41:28 +00:00
Ian Dowse	0ca311f6a1	When trying each linker class in turn with a preloaded module, exit the loop if the preload was successful. Previously a successful preload was ignored if the linker class was not the last in the list.	2004-08-27 01:20:26 +00:00
Robert Watson	161a0c7cff	Don't hold the UNIX domain socket subsystem lock over the body of the UNIX domain socket garbage collection implementation, as that risks holding the mutex over potentially sleeping operations (as well as introducing some nasty lock order issues, etc). unp_gc() will hold the lock long enough to do necessary deferal checks and set that it's running, but then release it until it needs to reset the gc state. RELENG_5 candidate. Discussed with: alfred	2004-08-25 21:24:36 +00:00
Robert Watson	fe0f2d4e11	Conditional acquisition of socket buffer mutexes when testing socket buffers with kqueue filters is no longer required: the kqueue framework will guarantee that the mutex is held on entering the filter, either due to a call from the socket code already holding the mutex, or by explicitly acquiring it. This removes the last of the conditional socket locking.	2004-08-24 05:28:18 +00:00
Warner Losh	0160658e84	Set the description to NULL in the right detach routine. This should keep dangling pointers to strings in loaded modules from hanging around after the drivers are unloaded.	2004-08-24 05:19:15 +00:00
David Xu	d30412a8db	Remove checking of single exit flag in thread_user_enter(), this is generic code for threaded process, should not be here.	2004-08-23 22:54:37 +00:00
Peter Wemm	f1009e1e1f	Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock. We were obtaining different spin mutexes (which disable interrupts after aquisition) and spin waiting for delivery. For example, KSE processes do LDT operations which use smp_rendezvous, while other parts of the system are doing things like tlb shootdowns with a different mutex. This patch uses the common smp_rendezvous mutex for all MD home-grown IPIs that spinwait for delivery. Having the single mutex means that the spinloop to aquire it will enable interrupts periodically, thus avoiding the cross-ipi deadlock. Obtained from: dwhite, alc Reviewed by: jhb	2004-08-23 21:39:29 +00:00
Alexander Kabaev	cffdaf2dce	Temporarily back out r1.74 as it seems to cause a number of regressions accordimg to numerous reports. It might get reintroduced some time later when an exact failure mode is understood better.	2004-08-23 02:39:45 +00:00
Robert Watson	d963815baf	Make debug.kdb.stop_cpus also a TUNABLE() so it can be set prior to boot to help debug early nasty hangs.	2004-08-22 15:10:52 +00:00
Julian Elischer	ad59c36ba1	diff reduction for upcoming patch. Use a macro that masks some of the odd goings on with sub-structures, because they will go away anyhow.	2004-08-22 05:21:41 +00:00
Don Lewis	1a1c04b6b3	Don't bother calling the module event handlers from module_shutdown() in the shutdown_final state if the RB_NOSYNC flag is set. The specific motivation in this case is that a system panic in an interrupt context results in a call to module_shutdown(), which calls g_modevent(), which calls g_malloc(..., M_WAITOK), which results in a second panic. While g_modevent() could be fixed to not call malloc() for MOD_SHUTDOWN events (which it doesn't handle in any case), it is probably also a good idea to entirely skip the execution of the module shutdown handlers after a panic. This may be a MFC candidate for RELENG_5.	2004-08-20 21:47:48 +00:00
Don Lewis	8ded654028	Don't attempt to trigger the syncer thread final sync code in the shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the likely cause of hangs after a system panic that are keeping crash dumps from being done. This is a MFC candidate for RELENG_5. MFC after: 3 days	2004-08-20 19:21:47 +00:00
John Baldwin	55c45354ff	Remove some dead code under a straggling APIC_IO #ifdef that I missed back before 5.2.	2004-08-20 17:24:52 +00:00
Robert Watson	7b38f0d3c3	Back out uipc_socket.c:1.208, as it incorrectly assumes that all sockets are connection-oriented for the purposes of kqueue registration. Since UDP sockets aren't connection-oriented, this appeared to break a great many things, such as RPC-based applications and services (i.e., NFS). Since jmg isn't around I'm backing this out before too many more feet are shot, but intend to investigate the right solution with him once he's available. Apologies to: jmg Discussed with: imp, scottl	2004-08-20 16:24:23 +00:00
Scott Long	2384290ced	Revert the previous change. It works great for 4BSD but causes major problems for ULE. The reason is quite unknown and worrisome.	2004-08-20 05:58:38 +00:00
Scott Long	2c86298c6c	In maybe_preempt(), ignore threads that are in an inconsistent state. This is an effective band-aid for at least some of the scheduler corruption seen recently. The real fix will involve protecting threads while they are inconsistent, and will come later. Submitted by: julian	2004-08-20 05:18:50 +00:00
John-Mark Gurney	5d6dd4685a	make sure that the socket is either accepting connections or is connected when attaching a knote to it... otherwise return EINVAL... Pointed out by: benno	2004-08-20 04:15:30 +00:00
Nate Lawson	0b54748fec	Add a newline.	2004-08-19 20:16:09 +00:00
Poul-Henning Kamp	d298f91974	Add bioq_takefirst(). If the bioq is empty, NULL is returned. Otherwise the front element is removed and returned. This can simplify locking in many drivers from: lock() bp = bioq_first(bq); if (bp == NULL) { unlock() return } bioq_remove(bp, bq) unlock to: lock() bp = bioq_takefirst(bq); unlock() if (bp == NULL) return;	2004-08-19 19:51:51 +00:00
Nate Lawson	c003dab8ff	Add debugging to rman_manage_region() as well. This is useful since we manage subregions in ACPI. MFC after: 3 days	2004-08-19 16:41:12 +00:00
Robert Watson	16239786ca	Remove GIANT_REQUIRED from setugidsafety() as knote_fdclose() no longer requires Giant.	2004-08-19 14:59:51 +00:00
John Baldwin	007ddf7e7a	Now that the return value semantics of cv's for multithreaded processes have been unified with that of msleep(9), further refine the sleepq interface and consolidate some duplicated code: - Move the pre-sleep checks for theaded processes into a thread_sleep_check() function in kern_thread.c. - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c. Specifically, if a thread is awakened by something other than a signal while checking for signals before going to sleep, clear TDF_SINTR in sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in that edge case during an interruptible sleep. Also, fix sleepq_check_signals() to properly handle the condition if TDF_SINTR is clear rather than requiring the callers of the sleepq API to notice this edge case and call a non-_sig variant of sleepq_wait(). - Clarify the flags arguments to sleepq_add(), sleepq_signal() and sleepq_broadcast() by creating an explicit submask for sleepq types. Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the setting of TDF_SINTR to sleepq_add() if this flag is set rather than sleepq_catch_signals(). Note that it is the caller's responsibility to ensure that sleepq_catch_signals() is called if and only if this flag is passed to the preceeding sleepq_add(). Note that this also removes a sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures that for an interruptible sleep, TDF_SINTR is always set when TD_ON_SLEEPQ() is true.	2004-08-19 11:31:42 +00:00
John-Mark Gurney	000968010a	add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of the mutex profiling buffers. Document them in the man page and in NOTES. Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.	2004-08-19 06:38:26 +00:00
Robert Watson	4c5bc1ca39	Add UNP_UNLOCK_ASSERT() to asser that the UNIX domain socket subsystem lock is not held. Rather than annotating that the lock is released after calls to unp_detach() with a comment, annotate with an assertion. Assert that the UNIX domain socket subsystem lock is not held when unp_externalize() and unp_internalize() are called.	2004-08-19 01:45:16 +00:00
Robert Watson	2cfe973b62	Annotate call to DELAY() in interrupt storm mitigation as being something to revisit. Approved by: re (scottl)	2004-08-17 04:09:09 +00:00
Alexander Kabaev	c8b876219f	Upgrading a lock does not play well together with acquiring an exclusive lock and can lead to two threads being granted exclusive access. Check that no one has the same lock in exclusive mode before proceeding to acquire it. The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block other threads. Normally this is not a problem since the mini locks are upgraded to full locks and the release of the locks will unblock the other threads. However if a thread reset the bits without obtaining a full lock other threads are not awoken. Add missing wakeups for these cases. PR: kern/69964 Submitted by: Stephan Uphoff <ups at tree dot com> Very good catch by: Stephan Uphoff <ups at tree dot com>	2004-08-16 15:01:22 +00:00
David E. O'Brien	78c37b0de8	s/MAX_SAFE_MAXVNODES/MAXVNODES_MAX/g	2004-08-16 08:33:37 +00:00
Robert Watson	40f2ac28a0	Always acquire the UNIX domain socket subsystem lock (UNP lock) before dereferencing sotounpcb() and checking its value, as so_pcb is protected by protocol locking, not subsystem locking. This prevents races during close() by one thread and use of ths socket in another. unp_bind() now assert the UNP lock, and uipc_bind() now acquires the lock around calls to unp_bind().	2004-08-16 04:41:03 +00:00
Brian Feldman	8912c44d9f	Add the missing knote_fdclose().	2004-08-16 03:09:01 +00:00

... 3 4 5 6 7 ...

8014 Commits