freebsd-skq

Author	SHA1	Message	Date
alfred	4ba4eee48e	constify the second args to timevaladd() and timevalsub().	2003-10-26 02:19:00 +00:00
rwatson	b43632b153	Check (locked) before performing an advisory unlock following a failure of vn_start_write(). Otherwise, we may inconsistently attempt to release the advisory lock. Pointed out by: teggej	2003-10-25 16:43:50 +00:00
rwatson	e4935eb9ae	When generating a core dump, use advisory locking in an advisory way: if we do acquire an advisory lock, great! We'll release it later. However, if we fail to acquire a lock, we perform the coredump anyway. This problem became particularly visible with NFS after the introduction of rpc.lockd: if the lock manager isn't running, then locking calls will fail, aborting the core dump (resulting in a zero-byte dump file). Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>	2003-10-25 16:14:09 +00:00
rwatson	723804b261	Allow MAC policies to block/revoke kern_alq write access to a file. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: jeff	2003-10-25 16:10:41 +00:00
imp	a73d1ed688	Convenience functions to generate notifications from the kernel. The ACPI code will start using these shortly. Reviewed by: njl	2003-10-24 22:41:54 +00:00
jmg	abe50d83e6	don't allow reading from files that haven't been open'd for reading.	2003-10-24 21:07:53 +00:00
jhb	7d3a70bca2	- Add a DDB command 'show intrcnt' to show the non-zero interrupt counts. - Add a DDB function to dump the contents of an ithread and optionally details about each handler in that ithread. This function can be used by MD code to implement DDB commands that display information about interrupt sources and their registered handlers.	2003-10-24 21:05:30 +00:00
jhb	aee5e4f914	Writes to p_flag in __setugid() no longer need Giant.	2003-10-23 21:20:34 +00:00
jhb	84fcdc7f5a	Move the P_COWINPROGRESS flag from being a per-process p_flag to being a per-thread td_pflag which doesn't require any locks to read or write as it is only read or written by curthread on itself. Glanced at by: mckusick	2003-10-23 21:14:08 +00:00
wollman	dabc1a332d	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code	2003-10-23 18:17:36 +00:00
rwatson	966f1ed0b5	mac_Finish break-out of kern_mac.c into parts: Include src/sys/security/mac/mac_internal.h in kern_mac.c. Remove redundant defines from the include: SYSCTL_DECL(), debug macros, composition macros. Unstaticize various bits now exposed to the remainder of the kernel: mac_init_label(), mac_destroy_label(). Remove all the functions now implemented in mac_process/mac_vfs/mac_net/ mac_pipe. Also remove debug counters, sysctls exporting debug counters, enforcement flags, sysctls exporting enforcement flags. Leave module declaration, sysctl nodes, mactemp malloc type, system calls. This should conclude MAC/LINT/NOTES breakage from the break-out process, but I'm running builds now to make sure I caught everything. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:59:31 +00:00
rwatson	bae92ad3e6	Variable cleanup following break-out of kern_mac.c into sys/security/mac: Unstaticize mac_late. Remove ea_warn_once, now in mac_vfs.c. Unstaticize mac_policy_list, mac_static_policy_list, use struct mac_policy_list_head instead of LIST_HEAD() directly. Unstaticize and un-inline MAC policy locking functions so they can be referenced from mac_*.c. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:47:41 +00:00
rwatson	e36fe77ad7	Rename error_select() to mac_error_select(), and unstaticize so it can be used from src/sys/security/mac/mac_*.c. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-10-22 20:42:22 +00:00
silby	f0e686a675	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.	2003-10-21 18:28:36 +00:00
simokawa	aeab60718e	We need to initialize bp->b_offset and bp->b_iooffset because bp->b_blkno is ignored now.	2003-10-21 13:18:19 +00:00
scottl	0564a95894	Don peril-sensitive sunglasses and mark pipe(2) as MPSAFE. I've beaten up on it for the last 15 hours with no signs of problems. It gives a small (1%) gain on buildworld since pipe_read/pipe_write are already free of Giant.	2003-10-21 07:03:27 +00:00
phk	01dad440c7	Remove KASSERTS on B_PHYS for vmapbuf() and vunmapbuf(), B_PHYS is going away.	2003-10-21 06:53:10 +00:00
marcel	7df6e35964	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.	2003-10-21 01:13:49 +00:00
sam	b058e8665f	revert default for idle polling to zero until we can resolve the livelock problem	2003-10-20 21:14:24 +00:00
jeff	d477fdf956	- If a thread is not bound to a kse return 0 from sched_pctcpu(). Reported by: pawel.worach@nordea.com	2003-10-20 19:55:21 +00:00
alc	512489f301	Initialize the buf's b_object in pbgetvp(). Clear it in pbrelvp(). (This facilitates synchronization of the vm page's valid field using the vm object's lock.) Suggested by: tegge	2003-10-20 18:24:38 +00:00
dwmalone	72e9866f3d	Mark dup as MPSAFE. Giant was pushed into dup ages ago, but it looks like it was missed in syscalls.master. Spotted by: alc	2003-10-20 16:16:03 +00:00
alc	ba669f6199	- Synchronize access to a vm page's valid field using the containing vm object's lock.	2003-10-20 05:57:55 +00:00
marcel	87282b0dcb	Put the RSE backing store at a fixed address. This change is triggered by libguile that needs to know the base of the RSE backing store. We currently do not export the fixed address to userland by means of a sysctl so user code needs to hardcode it for now. This will be revisited later. The RSE backing store is now at the bottom of region 4. The memory stack is at the top of region 4. This means that the whole region is usable for the stacks, giving a 61-bit stack space. Port: lang/guile (dependency of x11/gnome2)	2003-10-20 05:34:10 +00:00
dwmalone	be405a4cbd	falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument. As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes. To stop the file being completely closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file. Reviewed by: iedowse Idea run past: jhb	2003-10-19 20:41:07 +00:00
alc	676085d671	- Add vm object locking to vfs_clean_pages() and vfs_bio_set_validclean(). This is to synchronize access to the vm page's valid field by vm_page_set_validclean().	2003-10-19 20:39:06 +00:00
peter	7b3a8f1308	Tidy up loose ends in the idle process. Call the MI cpu_idle() function for all platforms now. XXX alpha/sparc64/powerpc should fill in the function. Submitted by: bde	2003-10-19 02:43:57 +00:00
phk	50ecf36141	Initialize b_iooffset before calling VOP_[SPEC]STRATEGY	2003-10-18 19:49:46 +00:00
phk	4b7ade98cd	Initialize b_iooffset before calling strategy	2003-10-18 19:48:21 +00:00
phk	80b79c02da	Don't report b_pblkno, it is going away.	2003-10-18 17:59:02 +00:00
phk	adc5cdd07a	Report bio_pblkno instead of bio_blkno.	2003-10-18 17:27:10 +00:00
phk	b60969a35e	Make bioq_disksort() sort on the bio_offset field instead of bio_pblkno.	2003-10-18 15:50:56 +00:00
phk	4c2cb3f397	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)	2003-10-18 14:10:28 +00:00
phk	d004fc1e31	I think rwatson got the sign wrong here...	2003-10-18 12:16:17 +00:00
phk	bd00da531d	Initialize bp->b_offset before calling VOP_STRATEGY()	2003-10-18 11:13:31 +00:00
phk	6b528c7911	Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability.	2003-10-18 09:32:39 +00:00
phk	3dfc831549	The size and contents of the DEV_STRATEGY() macro have progressed to the point where it being a macro is no longer sensible, and it will only be more so in days to come. BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not be used directly anymore. Put the contents of both in the new function dev_strategy() and make DEV_STRATEGY() call that function. In addition, this allows us to make the rather magic bufdonebio() helper function static. This also saves a hundred and some bytes of code in a typical kernel.	2003-10-18 09:03:15 +00:00
rwatson	9be7eda223	Wrap db_active check in #ifdef DDB, as db_active is not defined ifndef DDB.	2003-10-18 02:23:57 +00:00
rwatson	bf32888a2a	Add a new cn_flags fields to struct consdev, the low-level console definition structure. Define one flag, CN_FLAG_NODEBUG, which indicates the console driver cannot be used in the context of the debugger. This may be used, for example, if the console device interacts with kernel services that cannot be used from the debugger context, such as the network stack. These drivers are skipped over for calls to cn_checkc() and cn_putc(), and the calling function simply moves on to the next available console.	2003-10-18 02:13:39 +00:00
jeff	1869c3be82	- Remove the correct thread from the run queue in setrunqueue(). This fixes ULE + KSE.	2003-10-17 20:53:04 +00:00
phk	888092f317	Simplify count_dev()	2003-10-17 11:56:48 +00:00
peter	c7a92905e0	Halt the cpu on amd64 as well. For some strange reason, this makes a fair bit of difference to the power consumption and lets my cpu cool down enough for the temperature sensitive fan controller to completely stop the cpu fan at times.	2003-10-17 03:49:03 +00:00
marcel	1d3178bbc0	Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.	2003-10-17 02:24:59 +00:00
jeff	7a27809dea	- The kse may be null in sched_pctcpu(). Reported by: kris	2003-10-16 21:13:14 +00:00
jeff	fcda5c4f74	- Only kse_reassign() in the !running case. Reported by: kris	2003-10-16 20:32:57 +00:00
jeff	3b6db25186	- Call sched_add() with the correct argument on SMP. Reported by: Valentin Chopov <valentin@valcho.net>	2003-10-16 20:06:19 +00:00
jeff	6e062906a7	- Fix a minor problem with my last commit, we don't want to return from sched_switch if the thread is running, we want to fall through and pick a new thread because we have been preempted.	2003-10-16 10:04:54 +00:00
dfr	3dac505582	* Add multiple inheritance to kobj. Each class can have zero or more base classes and if a method is not found in a given class, its base classes are searched (in the order they were declared). This search is recursive, i.e. a method may be defined in a base class of a base class. * Change the kobj method lookup algorithm to one which is SMP-safe. This relies only on the constraint that an observer of a sequence of writes of pointer-sized values will see exactly one of those values, not a mixture of two or more values. This assumption holds for all processors which FreeBSD supports. * Add locking to kobj class initialisation. * Add a simpler form of 'inheritance' for devclasses. Each devclass can have a parent devclass. Searches for drivers continue up the chain of devclasses until either a matching driver is found or a devclass is reached which has no parent. This can allow, for instance, pci drivers to match cardbus devices (assuming that cardbus declares pci as its parent devclass). * Increment __FreeBSD_version. This preserves the driver API entirely except for one minor feature used by the ISA compatibility shims. A workaround for ISA compatibility will be committed separately. The kobj and newbus ABI has changed - all modules must be recompiled.	2003-10-16 09:16:28 +00:00
jeff	4aea3a9433	- Collapse sched_switchin() and sched_switchout() into sched_switch(). Now mi_switch() calls sched_switch() which calls cpu_switch(). This is actually one less function call than it had been.	2003-10-16 08:53:46 +00:00
jeff	991febf6dd	- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td argument rather than a kse.	2003-10-16 08:39:15 +00:00
jeff	bf29a9dd12	- The non iterative algorithm for interact_update was broken due to rounding errors. This was the source of the majority of the interactivity problems. Reintroduce the old algorithm and its XXX. - Up the interactivity threshold to 30. It really could stand to be even a tiny bit higher. - Let the sleep and run time accumulate up to 5 seconds of history rather than two. This helps stop XFree86 from becoming non-interactive during bursts of activity.	2003-10-16 08:17:43 +00:00
jeff	72dab909cc	- If our user_pri doesn't match our actual priority, our priority has been elevated either due to priority propagation or because we're in the kernel. In either case, put us on the current queue so that we don't stop others from using important resources. At some point the priority elevations from sleeping in the kernel should go away. - Remove an optimization in sched_userret(). Before we would only set NEEDRESCHED if there was something of a higher priority available. This is a trivial optimization and it breaks priority propagation because it doesn't take threads which we may be blocking into account. Notice that the thread which is blocking others gets up to one tick of cpu time before we honor this NEEDRESCHED in sched_clock().	2003-10-15 07:47:06 +00:00
peter	8cbefc5894	The KERN_PROC_PROC sysctl took 4 args in 5.0-REL and 5.1-REL. We need to accept this for a bit longer. Requiring the new order of 3 args only was not very helpful.	2003-10-15 03:11:46 +00:00
sam	4e958f714d	Change default for kern.polling.idle_poll back to 1. This was set to 0 because Luigi observed livelock but in recent testing it did not occur so I'm re-enabling it by default. Reviewed by: luigi	2003-10-14 18:39:36 +00:00
phk	637477251d	Made use of 'error' argument, which was unused (by mistake) before. Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>	2003-10-14 08:09:43 +00:00
imp	e34b12b9cd	With DIAGNOSTICS, sometimes we get weird crashes when some driver accesses softc after it is freed. Use a different malloc type for softc than the rest of the bus code to make it more clear when these things happen that it is the driver that's at fault, not the bus code. Suggested by: sam and/or phk (I think)	2003-10-14 06:22:07 +00:00
jeff	ce562d35ad	- Add a missing vn_finished_write() Pointy hat: jeff Found by: robert Obtained from: kirk	2003-10-14 00:38:34 +00:00
davidxu	b24bb74b9e	Don't clear signal mask in execsig(). RELENG_4 does not clear it and POSIX asks to inherit signal mask for execv.	2003-10-13 14:03:08 +00:00
jeff	bd83534d11	- In SCHED_CURR() add holding Giant to the list of criteria that will keep you on the current queue. In the future, it would be nice if priority propagation could deterministically pluck a thread off of the next queue and put it on the current queue. Until then this hack stops us from holding up our entire current queue, including interrupt handlers, while a thread on the next queue is blocked while holding Giant. - Inherit our pctcpu information from our parent.	2003-10-12 21:07:31 +00:00
alc	bfc309eda8	In vfs_bio_clrbuf(), ignore the state of the object lock if the page is the "bogus" page. Found by: tegge	2003-10-12 18:26:48 +00:00
phk	edaf9214c4	Simplify vn_isdisk() a bit.	2003-10-12 14:04:39 +00:00
jmg	f1d456150e	fix a problem referencing free'd memory. This is only a problem for kqueue write events on a socket and you regularly create tons of pipes which overwrites the structure causing a panic when removing the knote from the list. If the peer has gone away (and it's a write knote), then don't bother trying to remove the knote from the list. Submitted by: Brian Buchanan and myself Obtained from: nCircle	2003-10-12 07:06:02 +00:00
jeff	5b9cc4b22e	- Fix a typo, I meant & and not \|. This was causing lockups from the syncer looping forever due to list corruption. Solved by: tegge	2003-10-11 21:50:45 +00:00
alc	7f1718017c	- Synchronize access to a page's valid field in vfs_bio_clrbuf() by using the lock from its containing object. - Remove GIANT_REQUIRED from vm_hold_load_pages().	2003-10-10 07:26:21 +00:00
robert	8519aa2ff0	Implement preliminary support for the PT_SYSCALL command to ptrace(2).	2003-10-09 10:17:16 +00:00
tjr	1ac708bd28	Remove support for the unused 4th component of the KERN_PROC_PROC sysctl.	2003-10-06 01:26:11 +00:00
jeff	6be3f23255	- Add a missing vn_start_write() to flushbufqueues(). This could have caused snapshot related problems. - The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop confusing the logic by checking for it in several places. Submitted by: kirk and then rototilled by me to remove vp == NULL checks.	2003-10-05 22:16:08 +00:00
bms	3960d861d4	Bring back sysctl_wire_old_buffer(). Fix a bug in sysctl_handle_opaque() whereby the pointers would not get reset on a retried SYSCTL_OUT() call. Noticed by: bde	2003-10-05 13:31:33 +00:00
bms	44fa0fe4a2	Fix a security problem in sysctl() the long way round. Use pre-emption detection to avoid the need for wiring a userland buffer when copying opaque data structures. sysctl_wire_old_buffer() is now a no-op. Other consumers of this API should use pre-emption detection to notice update collisions. vslock() and vsunlock() should no longer be called by any code and should be retired in subsequent commits. Discussed with: pete, phk MFC after: 1 week	2003-10-05 09:37:47 +00:00
bms	0f917818e6	Add a pre-emption counter, td_generation, so that threads can notice when they have been pre-empted by other threads. This is bumped from within mi_switch() every time a context switch takes place. Discussed with: pete	2003-10-05 09:35:08 +00:00
bms	b463ed3eb2	Fold the vslock() and vsunlock() calls in this file with #if 0's; they will go away in due course. Involuntary pre-emption means that we can't count on wiring of pages alone for consistency when performing a SYSCTL_OUT() bigger than PAGE_SIZE. Discussed with: pete, phk	2003-10-05 08:38:22 +00:00
jeff	d22d3e1bfb	- Apply a big giant lock around the namecache. This has been sitting in my tree since BSDcon.	2003-10-05 07:13:50 +00:00
jeff	2c3fea92c8	- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify LK_RETRY either, we don't want this vnode if it turns into another. - Remove the code that checks the mount point after acquiring the lock we are guaranteed to either fail or get the vnode that we wanted.	2003-10-05 07:12:38 +00:00
bms	c600018661	Remove magic numbers surrounding locking state in the sysctl module, and replace them with more meaningful defines.	2003-10-05 05:38:30 +00:00
jeff	6b8324e841	- Rename vcanrecycle() to vtryrecycle() to reflect its new role. - In vtryrecycle() try to vgonel the vnode if all of the previous checks passed. We won't vgonel if someone has either acquired a hold or usecount or started the vgone process elsewhere. This is because we may have been removed from the free list while we were inspecting the vnode for recycling. - The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling the same vnode. To further reduce the likelihood of this event, requeue the vnode on the tail of the list prior to calling vtryrecycle(). We can not actually remove the vnode from the list until we know that it's going to be recycled because other interlock holders may see the VI_FREE flag and try to remove it from the free list. - Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it regardless of MNT_WAIT because the vnode does not actually belong to this filesystem.	2003-10-05 05:35:41 +00:00
jeff	e15704d590	- Don't cache_purge() in getnewvnode. It's done in vclean(). With this purge, the purge in vclean, and the filesystems purge, we had 3 purges per vnode. - Move the insmntque(vp, 0) to vclean() so that we may remove it from the two vgone() functions and reduce the number of lock operations required.	2003-10-05 02:48:04 +00:00
jeff	8d0f78003f	- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine whether or not the sync failed. This could potentially get set between the time that we VOP_UNLOCK and VI_LOCK() but the race would harmlessly lead to the sync being delayed by an extra 30 seconds. If we do not move the vnode it could cause an endless loop if it continues to fail to sync. - Use vhold and vdrop to stop the vnode from changing identities while we have it unlocked. Other internal vfs lists are likely to follow this scheme.	2003-10-05 00:35:41 +00:00
jeff	8d72c43763	- Move the xlock 'locking' code into vx_lock() and vx_unlock(). - Create a new function, vgonechrl(), which performs vgone for an in-use character device. Move the code from vflush() that did this into vgonechrl(). - Hold the xlock across the entirety of vgonel() and vgonechrl() so that at no point will an invalid vnode exist on any list without XLOCK set. - Move the xlock code out of vclean() now that it is in the vgone*() functions.	2003-10-05 00:02:41 +00:00
alc	74f19ddd0e	Eliminate some unnecessary uses of the vm page queues lock around the vm page's valid field. This field is being synchronized using the containing vm object's lock.	2003-10-04 22:47:20 +00:00
alc	7c11eaebac	- Extend the scope the vm object lock to cover calls to vm_page_is_valid(). - Assert that the lock on the containing vm object is held in vm_page_is_valid().	2003-10-04 19:23:29 +00:00
jeff	55547647ec	- In sched_sync() test our preconditions prior to dropping the sync_mtx. This is so that we may grab the interlock while still holding the sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock order runs the other way. - If we don't meet any of the preconditions, reinsert the vp into the list for the next second. - We don't need to panic if we fail to sync here because each FSYNC function handles this case. Removing this redundant code also simplifies locking.	2003-10-04 18:03:53 +00:00
jeff	1fc4867692	- Change a lame iterative algorithm to a constant time algorithm. Remove the XXX that complains about it as well. Submitted by: ThomasWuerfl@gmx.de	2003-10-04 17:41:13 +00:00
jeff	bb40013912	- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately fail. Remove the panic from that case and document why it might fail. - Document the reason for calling cache_purge() on a newly created vnode. - In insmntque() order the operations so that we can call mtx_unlock() one fewer times. This makes the code somewhat clearer as well. - Add XXX comments in sched_sync() and vflush(). - In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is set. - In vclean() we don't need to acquire a lock around a single TAILQ_FIRST call. It's ok if we race here, the vinvalbuf will just do nothing. - Increase the scope of the lock in vgonel() to reduce the number of lock operations that are performed.	2003-10-04 15:10:40 +00:00
jeff	c1590e8666	- If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex and should not sleep while waiting for XLOCK to clear. Care needs to be taken in functions that use this capability to avoid spinning.	2003-10-04 14:35:22 +00:00
nectar	1857c0891b	Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9). Use uiomove_frombuf in procfs to correct several issues with integer arithmetic that could result in underflows/overflows. As a side-effect, the code is significantly simplified. Add additional sanity checks when computing a memory allocation size in pfs_read. Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-) Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)	2003-10-02 15:00:55 +00:00
rwatson	1c522512fd	Remove the global variable 'cmask', which was used to initialize the fd_cmask field in the file descriptor structure for the first process indirectly from CMASK, and when an fd structure is initialized before being filled in, and instead just use CMASK. This appears to be an artifact left over from the initial integration of quotas into BSD. Suggested by: peter	2003-10-02 03:57:59 +00:00
jeff	db2419a0a1	- On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in the TLB and ~1600 if it is not. Therefore, it is more efficient to invalidate the TLB after operations that use CMAP rather than before. - So that the tlb is invalidated prior to switching off of a processor, we must change the switchin functions to switchout functions. - Remove td_switchout from the thread and move it to the x86 pcb. - Move the code that calls switchout into swtch.s. These changes make this optimization truly x86 specific.	2003-09-30 08:11:36 +00:00
rwatson	0e5948bb6b	If the struct mac copied into the kernel has a negative length, return EINVAL rather than failing the following malloc due to the value being too large.	2003-09-29 18:35:17 +00:00
phk	7e52b3ce80	Retire revoke_and_destroy_dev() with extreme prejudice.	2003-09-28 20:50:36 +00:00
marcel	186d3de8bf	Remove the regstkpages sysctl variable. We have a growable register stack now.	2003-09-27 23:07:47 +00:00
marcel	d75cf98307	Part 2 of implementing rstacks: add the ability to create rstacks and use the ability on ia64 to map the register stack. The orientation of the stack (i.e. its grow direction) is passed to vm_map_stack() in the overloaded cow argument. Since the grow direction is represented by bits, it is possible and allowed to create bi-directional stacks. This is not an advertised feature, more of a side-effect. Fix a bug in vm_map_growstack() that's specific to rstacks and which we could only find by having the ability to create rstacks: when the mapped stack ends at the faulting address, we have not actually mapped the faulting address. we need to include or cover the faulting address. Note that at this time mmap(2) has not been extended to allow the creation of rstacks by processes. If such a need arises, this can be done. Tested on: alpha, i386, ia64, sparc64	2003-09-27 22:28:14 +00:00
phk	d0dceae232	Make life a little bit easier for cloning device drivers.	2003-09-27 21:50:00 +00:00
phk	74c6dfd454	Introduce no_poll() default method for device drivers. Have it do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other. Retire seltrue() in favour of no_poll(). Create private default functions in kern_conf.c instead of public ones. Change default strategy to return the bio with ENODEV instead of doing nothing which would leave the bio stranded. Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} functions, they are the default actions now. Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c	2003-09-27 12:53:33 +00:00
phk	d92d95ded7	Don't use seltrue when that is not really what we mean.	2003-09-27 12:44:06 +00:00
phk	7099deadda	The present defaults for the open and close for device drivers which provide no methods do not make any sense, and are not used by any driver. It is pretty hard to come up with even a theoretical concept of a device driver which would always fail open and close with ENODEV. Change the defaults to be nullopen() and nullclose() which simply do nothing. Remove explicit initializations to these from the drivers which already used them.	2003-09-27 12:01:01 +00:00
phk	0c8bfb6d00	OK, I messed up /dev/console with what I had hoped would be compat code. Convert remaining console drivers and hope for the best.	2003-09-26 19:35:50 +00:00
robert	58f93096d9	Move some tracing related code into its own function as it will be needed for system call related ptrace functionality I plan to commit soon.	2003-09-26 15:09:46 +00:00
phk	45c1eba059	Update the list of CDROM device names to try for booting with RB_CDROM flag set.	2003-09-26 09:07:27 +00:00
phk	af0dfcd07f	Remove wrongly sized cnd_name field, we now store the name in the consdev structure. If the consdev name is not set and we have a cn_dev, set the name from there. Try to issue a printf about this, even though it may not have a place to go. Modify the sysctl related code to pick up the name from the consdev instead.	2003-09-26 07:26:54 +00:00
peter	8ecb3577d8	Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 hierarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.	2003-09-25 01:10:26 +00:00
fjoe	9957f857c4	Avoid NULL pointer dereferencing in modlist_lookup2(). PR: 56570 Submitted by: Thomas Wintergerst <Thomas.Wintergerst@nord-com.net>	2003-09-23 14:42:38 +00:00
alc	9c61d65266	- vm_hold_free_pages() should lock the kernel object. (The pages being freed belong to the kernel object.) - Increase the granularity of the vm object locking in vm_hold_load_pages() in order to reduce the number of times that we acquire and release the same lock.	2003-09-22 04:58:09 +00:00
dfr	8cfa3234b8	The method link_preload_finish is not static.	2003-09-20 17:39:32 +00:00
jeff	83b269493e	- Somewhere along the line I stupidly removed critical logic from sched_pctcpu_update(). This caused erroneous cpu times in TOP for processes that were asleep. Replace the code that was removed.	2003-09-20 02:05:58 +00:00
jeff	517dcea6c8	- In reassignbuf() don't unlock vp and lock newvp if they are the same. Doing so creates a race where the buf is on neither list. - Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we should. - Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this really should not happen and is fast to check.	2003-09-20 00:21:48 +00:00
jeff	45f3b1b270	- Remove spls(). The locking that has replaced them is in place and they no longer serve as guidelines for future work.	2003-09-19 23:52:06 +00:00
kan	cf77f9f005	Eliminate one case of VI_UNLOCK followed by an immediate VI_LOCK.	2003-09-19 19:13:54 +00:00
tjr	3bce48c27b	Allow the KERN_PROC_PROC sysctl to be used without the useless 4th name component, for consistency with KERN_PROC_ALL. Support for the 4-argument form will be removed some time before 5.2-R.	2003-09-19 14:16:50 +00:00
jeff	52b3368d79	- Only use UMA to cache malloc requests up to PAGE_SIZE. Values larger than this are requested very infrequently and waste memory when we cache spares.	2003-09-19 04:39:08 +00:00
alc	beb3ca1e4c	Correct a typo in the previous revision.	2003-09-15 02:56:48 +00:00
rwatson	50888524ca	Add a new sysctl, security.bsd.conservative_signals, to disable special signal-delivery protections for setugid processes. In the event that a system is relying on "unusual" signal delivery to processes that change their credentials, this can be used to work around application problems. Also, add SIGALRM to the set of signals permitted to be delivered to setugid processes by unprivileged subjects. Reported by: Joe Greco <jgreco@ns.sol.net>	2003-09-14 07:22:38 +00:00
nectar	f158e368c2	sched_setscheduler: Return EINVAL when an invalid policy is specified, thus complying with POLA and the man page. (Previously, no error was returned for this case.)	2003-09-13 18:46:24 +00:00
nectar	54f60400ec	Correct mostly harmless off-by-one error in getdomainname(). Reviewed by: imp	2003-09-13 17:12:22 +00:00
alc	ee4ef644cf	Convert vmapbuf() from using pmap_extract() to using pmap_extract_and_hold(). Note, however, that GIANT_REQUIRED should not be removed until all platforms fully implement the "prot" parameter to pmap_extract_and_hold(). Reviewed by: tegge	2003-09-13 04:29:55 +00:00
alc	6808836f35	pipe_build_write_buffer() only requires read access of the page that it obtains from pmap_extract_and_hold().	2003-09-12 07:13:15 +00:00
marcel	5b71626790	Introduce BUS_CONFIG_INTR(). The method allows devices to tell parents about interrupt trigger mode and interrupt polarity. This allows ACPI for example to pass interrupt resource information up the hierarchy. The default implementation of the method therefore is to pass the request to the parent. Reviewed by: jhb, njl	2003-09-10 21:37:10 +00:00
simokawa	c286d7e22f	Fix asynchronous physio breakage introduced in rev 1.163. We cannot use bp->b_caller2 because DEV_STRATEGY will overwrite it.	2003-09-10 15:48:51 +00:00
jhb	68ae42e041	Update the license on this file to be a bit more sane.	2003-09-10 01:09:32 +00:00
iedowse	2e1d99cc8a	In the !MNT_BYFSID case, return EINVAL from unmount(2) when the specified directory is not found in the mount list. Before the MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent path and EINVAL for a non-mountpoint, but we can no longer distinguish between these cases. Of the two error codes, EINVAL was more likely to occur in practice, and it was the only one of the two that was documented. Update the manual page to match the current behaviour. Suggested by: tjr Reviewed by: tjr	2003-09-08 16:23:21 +00:00
alc	81a5dc108d	Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently, pipe_build_write_buffer() no longer requires Giant on entry. Reviewed by: tegge	2003-09-08 04:58:32 +00:00
tjr	29332e48cc	Return EINVAL if the contested bit is not set on the umtx passed to _umtx_unlock() instead of firing a KASSERT.	2003-09-07 11:14:52 +00:00
alc	390b07844e	msync(2) should be declared MP-safe.	2003-09-07 05:42:07 +00:00
sam	23e7708b76	add fast swi taskqueue spinlock to the order_list so witness doesn't complain Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>	2003-09-06 21:06:08 +00:00
sam	403b1d4a6e	correct fast swi taskqueue spinlock name to be different from the sleep lock Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>	2003-09-06 21:05:18 +00:00
alc	0c9dc7eaa4	Giant is no longer required by pipe_destroy_write_buffer(). Reduce unnecessary white space from pipe_destroy_write_buffer().	2003-09-06 21:02:10 +00:00
sam	546fd338df	"fast swi" taskqueue support. This is a taskqueue that uses spinlocks making it useful for dispatching swi tasks from fast interrupt handlers. Sponsored by: FreeBSD Foundation	2003-09-05 23:09:22 +00:00
sam	27b68e0947	Print a message at boot for interrupt handlers created with INTR_MPSAFE and/or INTR_FAST. This belongs elsewhere and perhaps under bootverbose; I'm committing it for now as it's useful to know which drivers have been converted and which have not.	2003-09-05 22:51:18 +00:00
peter	f79f1784c9	Log involuntary context switches correctly.	2003-09-05 22:15:26 +00:00
phk	d999957bf0	Put the message about msgbuf cksum mismatch under bootverbose and tell people what the consequence is.	2003-09-05 11:12:00 +00:00
phk	158d08d6fb	Use the quality to disable timecounters for which we deem Hz too low.	2003-09-03 08:14:16 +00:00
ken	03d0445c16	Move dynamic sysctl(8) variable creation for the cd(4) and da(4) drivers out of cdregister() and daregister(), which are run from interrupt context. The sysctl code does blocking mallocs (M_WAITOK), which causes problems if malloc(9) actually needs to sleep. The eventual fix for this issue will involve moving the CAM probe process inside a kernel thread. For now, though, I have fixed the issue by moving dynamic sysctl variable creation for these two drivers to a task queue running in a kernel thread. The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in software interrupt handlers, which wouldn't fix the problem at hand. So I have created a new task queue, taskqueue_thread, that runs inside a kernel thread. (It also runs outside of Giant -- clients must explicitly acquire and release Giant in their taskqueue functions.) scsi_cd.c: Remove sysctl variable creation code from cdregister(), and move it to a new function, cdsysctlinit(). Queue cdsysctlinit() to the taskqueue_thread taskqueue once we have fully registered the cd(4) driver instance. scsi_da.c: Remove sysctl variable creation code from daregister(), and move it to a new function, dasysctlinit(). Queue dasysctlinit() to the taskqueue_thread taskqueue once we have fully registered the da(4) instance. taskqueue.h: Declare the new taskqueue_thread taskqueue, update some comments. subr_taskqueue.c: Create the new kernel thread taskqueue. This taskqueue runs outside of Giant, so any functions queued to it would need to explicitly acquire/release Giant if they need it. cd.4: Update the cd(4) man page to talk about the minimum command size sysctl/loader tunable. Also note that the changer variables are available as loader tunables as well. da.4: Update the da(4) man page to cover the retry_count, default_timeout and minimum_cmd_size sysctl variables/loader tunables. Remove references to /dev/r???, they aren't used any longer. cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY quirk. taskqueue.9: Update the taskqueue(9) man page to describe the new thread task queue, and the taskqueue_swi_giant queue. MFC after: 3 days	2003-09-03 04:46:28 +00:00
sam	8c368dfa99	move domain list mutex initialization to earlier in the boot sequence so statically configured modules like netgraph can call net_init_domain Noticed by: D.Rock@t-online.de (D. Rock)	2003-09-02 20:59:23 +00:00
silby	75c663cdc7	Implement MBUF_STRESS_TEST mark II. Changes from the original implementation: - Fragmentation is handled by the function m_fragment, which can be called from wherever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use. - m_fragment works slightly differently from the old fragmentation code in that it allocates a separate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test. - Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.	2003-09-01 05:55:37 +00:00
sam	f13a722652	o interlock domain list when adding domains o remove irrelevant spl Notes: 1. We don't lock domain list traversals as this is safe until we start removing domains. 2. The calculation of max_datalen in net_init_domain appears safe as no one depends on max_hdr and max_datalen having consistent values. 3. Giant is still held for fast and slow timeouts; this must stay until each timeout routine is properly locked (coming soon). Sponsored by: FreeBSD Foundation	2003-09-01 05:01:55 +00:00
jeff	86f70ead21	- Define a new flag for getblk(): GB_NOCREAT. This flag causes getblk() to bail out if the buffer is not already present. - The buffer returned by incore() is not locked and should not be sent to brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the desired semantics.	2003-08-31 08:50:11 +00:00
jeff	5e7832253c	- If there is no vp assume that BKGRDINPROG is not set and set RELPBUF in brelse().	2003-08-31 01:07:45 +00:00
jeff	0008f2bb1d	- In some cases bp->b_vp can be NULL in brelse, don't try to lock the interlock in that case. Found by: alc	2003-08-31 00:06:07 +00:00
alc	8b0114def1	Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid ephemeral mappings in the kernel's address space. On an SMP, the ephemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck. Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.	2003-08-29 20:04:10 +00:00
marcel	b121bea1f9	In bufdone(), change the format specifier for m->valid and m->dirty to a long type and explicitly cast m->valid and m->dirty to unsigned long. When PAGE_SIZE is 32K, these fields are in fact unsigned long.	2003-08-28 19:58:11 +00:00
kan	96bae694ed	Do not return with vnode interlock held. Reviewed by: rwatson	2003-08-28 15:48:15 +00:00
jeff	fc1a2c4016	- Move BX_BKGRDWAIT and BX_BKGRDINPROG to BV_ and the b_vflags field. - Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode interlock. - Don't use the B_LOCKED flag and QUEUE_LOCKED for background write buffers. Check for the BKGRDINPROG flag before recycling or throwing away a buffer. We do this instead because it is not safe for us to move the original buffer to a new queue from the callback on the background write buffer. - Remove the B_LOCKED flag and the locked buffer queue. They are no longer used. - The vnode interlock is used around checks for BKGRDINPROG where it may not be strictly necessary. If we hold the buf lock, a background write will not be started without our knowledge; one may only be completed while we're not looking. Rather than remove the code, document two of the places where this extra locking is done. A pass should be done to verify and minimize the locking later.	2003-08-28 06:55:18 +00:00
rwatson	c020f70195	Fix a mac_policy_list reference to be a mac_static_policy_list reference: this fixes mac_syscall() for static policies when using optimized locking. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-26 17:29:02 +00:00
davidxu	05b6d7c95a	Let SA process work under ULE scheduler, originally it would panic kernel. Reviewed by: jeff	2003-08-26 11:33:15 +00:00
alc	5d6f66de90	Hold the page queues lock when performing vm_page_clear_dirty() and vm_page_set_invalid().	2003-08-23 18:11:53 +00:00
tjr	8958714bb8	Fix a logic error in osethostid() that was introduced in rev. 1.34: allow hostid to be set when suser() returns 0, not when it returns an error. This would have allowed non-root users to set the host ID.	2003-08-23 15:45:57 +00:00
marcel	0663329f6f	On ia64 time_t is 64 bit. Explicitly cast tv_sec to long and change the corresponding format specifier to %ld in a call to printf() in function softclock(). The printf() is conditional upon DIAGNOSTIC. Found by: LINT	2003-08-23 08:31:32 +00:00
rwatson	32ed1a62a8	Introduce two new MAC Framework and MAC policy entry points: mac_reflect_mbuf_icmp() mac_reflect_mbuf_tcp() These entry points permit MAC policies to do "update in place" changes to the labels on ICMP and TCP mbuf headers when an ICMP or TCP response is generated to a packet outside of the context of an existing socket. For example, in response to a ping or a RST packet to a SYN on a closed port. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-21 18:21:22 +00:00
eivind	ab2c97a462	Change description of kern.osreldate from "Operating system release date" to "Kernel release date" - userland version is in /usr/include/osreldate.h	2003-08-21 14:47:08 +00:00
rwatson	6f522a9e52	Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr(): explicit access control checks to delete and list extended attributes on a vnode, rather than implicitly combining with the setextattr and getextattr checks. This reflects EA API changes in the kernel made recently, including the move to explicit VOP's for both of these operations. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-21 13:53:01 +00:00
rwatson	85df7c20ad	Remove about 40 lines of #ifdef/#endif by using new macros MAC_DEBUG_COUNTER_INC() and MAC_DEBUG_COUNTER_DEC() to maintain debugging counter values rather than #ifdef'ing the atomic operations to MAC_DEBUG. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-20 19:16:49 +00:00
imp	ee0d294c7e	bde made a number of suggested improvements to the code. This commit represents the purely stylistic changes and should have no net impact on the rest of the code. bde's more substantive changes will follow in a separate commit once we've come to closure on them. Submitted by: bde	2003-08-20 19:12:46 +00:00
imp	ef7e40c451	Fix an extreme edge case in leap second handling. We need to call ntp_update_second twice when we have a large step in case that step goes across a scheduled leap second. The only way this could happen would be if we didn't call tc_windup over the end of day on the day of a leap second, which would only happen if timeouts were delayed for seconds. While it is an edge case, it is an important one to get right for my employer. Sponsored by: Timing Solutions Corporation	2003-08-20 05:34:27 +00:00
sam	59ff2ad5c7	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
phk	ba5950210c	It is not an error to have no devices in the kernel: Return the generation number and start it from one instead of zero.	2003-08-17 12:06:19 +00:00
bmilekic	e3861386da	Use constants less throughout the code and instead use the objsize variable. This makes changing the size of an mbuf or cluster for testing/debugging/whatever purposes easier. Submitted by: sam	2003-08-16 19:48:52 +00:00
marcel	c1d4b42a69	Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to cpu.h. This affects db_command.c and kern_shutdown.c. ia64: move all MD prototypes from cpu.h to md_var.h. This affects madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory(). It's not used (vm_machdep.c). alpha: the MD prototypes have been left in cpu.h with a comment that they should be there. Moving them is left for later. It was expected that the impact would be significant enough to be done in a separate commit. powerpc: MD prototypes left in cpu.h. Comment added. Suggested by: bde Tested with: make universe (pc98 incomplete)	2003-08-16 16:57:57 +00:00
phk	34014d5261	Give timecounters a numeric quality field. A timecounter will be selected when registered if its quality is not negative and no less than the current timecounters. Add a sysctl to report all available timecounters and their qualities. Give the dummy timecounter a solid negative quality of minus a million. Give the i8254 zero and the ACPI 1000. The TSC gets 800, unless APM or SMP forces it negative. Other timecounters default to zero quality and thereby retain current selection behaviour.	2003-08-16 08:23:53 +00:00
jhb	837193af8e	- Various style fixes in both code and comments. - Update some stale comments. - Sort a couple of includes. - Only set 'newcpu' in updatepri() if we use it. - No functional changes. Obtained from: bde (via an old diff I got a long time ago)	2003-08-15 21:29:06 +00:00
marcel	77c3cd3d30	Add or finish support for machine dependent ptrace requests. When we check for permissions, do it for all requests, not the known requests. Later when we actually service the request we deal with the invalid requests we previously caught earlier. This commit changes the behaviour of the ptrace(2) interface for boundary cases such as an unknown request without proper permissions. Previously we would return EINVAL. Now we return EBUSY or EPERM. Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD requests. This makes the prototype of cpu_ptrace() visible and introduces a call to this function for all requests greater or equal to PT_FIRSTMACH. Silence on: audit	2003-08-15 05:25:06 +00:00
jmg	64bcd88750	if we got this far, we definitely don't have an EBADF. Return a more sane result of EPIPE. Reported by: nCircle dev team MFC after: 3 days	2003-08-15 04:31:01 +00:00
cg	d647a00dc3	add a read-only sysctl to display the number of entries in the fixed size kobj global method table; also kassert that the table has not overflowed when defining a new method. there are indications that the table is being overflowed in certain situations as we gain more kobj consumers- this will allow us to check whether kobj is at fault. symptoms would be incorrect methods being called.	2003-08-14 21:16:46 +00:00
grehan	e6a3a6744e	Update powerpc to use the (old thread,new thread) calling convention for cpu_throw() and cpu_switch().	2003-08-14 03:56:24 +00:00
alc	7a81ace60d	- The vm_object pointer in pipe_buffer is unused. Remove it. - Check for successful initialization of pipe_zone in pipeinit() rather than every call to pipe(2).	2003-08-13 20:01:38 +00:00
imp	3bc162cfa3	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon	2003-08-12 23:24:05 +00:00
mux	43629d3ba9	Remove extra space.	2003-08-12 20:34:31 +00:00
jhb	1c016824f1	- Convert Alpha over to the new calling conventions for cpu_throw() and cpu_switch() where both the old and new threads are passed in as arguments. Only powerpc uses the old conventions now. - Update comments in the Alpha swtch.s to reflect KSE changes. Tested by: obrien, marcel	2003-08-12 19:33:36 +00:00
alc	23ea8b5c7a	Pipespace() no longer requires Giant.	2003-08-11 22:23:25 +00:00
kan	91297961f6	Drop Giant in recvit before returning an error to the caller to avoid leaking the Giant on the syscall exit.	2003-08-11 19:37:11 +00:00
bms	44aa51e3ae	Add the mlockall() and munlockall() system calls. - All those diffs to syscalls.master for each architecture are necessary. This needed clarification; the stub code generation for mlockall() was disabled, which would prevent applications from linking to this API (suggested by mux) - Giant has been quashed. It is no longer held by the code, as the required locking has been pushed down within vm_map.c. - Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES to express their intention explicitly. - Inspected at the vmstat, top and vm pager sysctl stats level. Paging-in activity is occurring correctly, using a test harness. - The RES size for a process may appear to be greater than its SIZE. This is believed to be due to mappings of the same shared library page being wired twice. Further exploration is needed. - Believed to back out of allocations and locks correctly (tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC). PR: kern/43426, standards/54223 Reviewed by: jake, alc Approved by: jake (mentor) MFC after: 2 weeks	2003-08-11 07:14:08 +00:00
silby	bd71f7b671	More pipe changes: From alc: Move pageable pipe memory to a separate kernel submap to avoid awkward vm map interlocking issues. (Bad explanation provided by me.) From me: Rework pipespace accounting code to handle this new layout, and adjust our default values to account for the fact that we now have a solid limit on allocations. Also, remove the "maxpipes" limit, as it no longer has a purpose. (The limit on kva usage solves the problem of having too many pipes.)	2003-08-11 05:51:51 +00:00
alc	1625d6386b	Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded application could cause a wired page to be freed. In general, vm_page_hold() should be preferred for ephemeral kernel mappings of pages borrowed from a user-level address space. (vm_page_wire() should really be reserved for indefinite duration pinning by the "owner" of the page.) Discussed with: silby Submitted by: tegge	2003-08-11 00:17:44 +00:00
nectar	78ff87db8b	panic() if we try to handle an out-of-range signal number in psignal()/tdsignal(). The test was historically in psignal(). It was changed into a KASSERT, and then later moved to tdsignal() when the latter was introduced. Reviewed by: iedowse, jhb	2003-08-10 23:05:37 +00:00
nectar	f5b9f87e77	Add or correct range checking of signal numbers in system calls and ioctls. In the particular case of ptrace(), this commit more-or-less reverts revision 1.53 of sys_process.c, which appears to have been erroneous. Reviewed by: iedowse, jhb	2003-08-10 23:04:55 +00:00
alc	c37c941215	Background: When proc_rwmem() wired and mapped a page, it also added a reference to the containing object. The purpose of the reference was to prevent the destruction of the object and an attempt to free the wired page. (Wired pages can't be freed.) Unfortunately, this approach does not work. Some operations, like fork(2) that call vm_object_split(), can move the wired page to a different object, thereby making the reference pointless and opening the possibility of the wired page being freed. A solution is to use vm_page_hold() in place of vm_page_wire(). Held pages can be freed. They are moved to a special hold queue until the hold is released. Submitted by: tegge	2003-08-09 18:01:19 +00:00
alc	f5d5533b42	- Remove GIANT_REQUIRED from pipespace(). - Remove a duplicate initialization from pipe_create().	2003-08-08 22:38:15 +00:00
deischen	547619d0d3	Copyin the thread mailbox flags from the correct location in the mailbox.	2003-08-08 20:23:10 +00:00
jhb	af302d132f	td_dupfd just needs to be less than 0, it does not have to hold the negative value of the index of the new file, so just use -1.	2003-08-07 17:08:26 +00:00
nectar	df9de6c5cd	Update some argument-documenting comments to match reality. Add an explicit range check to those same arguments to reduce risk of cardiac arrest in future code readers.	2003-08-07 16:42:27 +00:00
jhb	37641f86f1	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
jhb	12f44bde5d	The ktrace mutex does not need to be locked around the post of the ktrace semaphore and doing so can lead to a possible reversal. WITNESS would have caught this if semaphores were used more often in the kernel. Submitted by: Ted Unangst <tedu@stanford.edu>, Dawson Engler	2003-08-07 13:58:13 +00:00
alc	6178e0ad16	- Remove GIANT_REQUIRED from pipe_free_kmem(). - Remove the acquisition and release of Giant around pipe_kmem_free() and uma_zfree() in pipeclose().	2003-08-07 04:32:40 +00:00
yar	65e4901760	If connect(2) has been interrupted by a signal and therefore the connection is to be established asynchronously, behave as in the case of non-blocking mode: - keep the SS_ISCONNECTING bit set thus indicating that the connection establishment is in progress, which is the case (clearing the bit in this case was just a bug); - return EALREADY, instead of the confusing and unreasonable EADDRINUSE, upon further connect(2) attempts on this socket until the connection is established (this also brings our connect(2) into accord with IEEE Std 1003.1.)	2003-08-06 14:04:47 +00:00
davidxu	69df6d1c3b	kse.h is not needed for these files.	2003-08-05 12:08:49 +00:00
davidxu	93e075cf7a	Introduce a thread mailbox flag TMF_NOUPCALL. On some architectures other than i386 or AMD64, TP register points to thread mailbox, and they can not atomically clear km_curthread in kse mailbox, in this case, thread retrieves its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread mailbox to indicate a critical region.	2003-08-05 12:00:55 +00:00
hsu	fb82c18f66	Make the second argument to sooptcopyout() constant in order to simplify the upcoming PIM patches. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-05 00:27:54 +00:00
iedowse	7bf5fa9caf	In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls, use vrele() instead of vput() on the parent directory vnode returned by namei() in the case where it is equal to the target vnode. This handles namei()'s somewhat strange (but documented) behaviour of not locking either vnode when the two vnodes are equal and LOCKPARENT but not LOCKLEAF is specified. Note that since a vnode double-unlock is not currently fatal, these coding errors were effectively harmless. Spotted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de> Reviewed by: mckusick	2003-08-05 00:26:51 +00:00
dwmalone	cb188056e6	Do some minor Giant pushdown made possible by copyin, fget, fdrop, malloc and mbuf allocation all not requiring Giant. 1) ostat, fstat and nfstat don't need Giant until they call fo_stat. 2) accept can copyin the address length without grabbing Giant. 3) sendit doesn't need Giant, so don't bother grabbing it until kern_sendit. 4) move Giant grabbing from each individual recv* syscall to recvit.	2003-08-04 21:28:57 +00:00
jhb	e71dfc3b00	Adjust a comment to remove staleness and take slightly less implementation specific perspective.	2003-08-04 20:35:13 +00:00
jhb	52adb98aef	Set td_critnest to 1 when setting up a thread since it is a MI field with MI values. This ensures that td_critnest for a newly fork'd thread is always valid. Requested by: bde (a long time ago)	2003-08-04 20:28:20 +00:00
jhb	a69166c61f	Insert cosmetic spaces. Reported by: kris	2003-08-04 19:24:25 +00:00
rwatson	543a037619	Move more ACL logic from the UFS code (ufs_acl.c) to the central POSIX.1e support routines in kern_acl.c: - Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the mode bits that are (and aren't) stored in the ACL. - Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate a compatibility mode (only the bits supported by the POSIX.1e ACL). - acl_posix1e_newfilemode(): Given a requested creation mode and default ACL, calculate the mode for the new file system object (only the bits supported by the POSIX.1e ACL). PR: 50148 Reported by: Ritz, Bruno <bruno_ritz@gmx.ch> Obtained from: TrustedBSD Project	2003-08-04 02:13:05 +00:00
jhb	f0ef0df712	Both 'c' and 'lines' are unused, the bogus init of lines was accidentally left behind.	2003-08-02 17:35:00 +00:00
alc	15ec2b9212	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in proc_rwmem(). See revision 1.140 of kern/sys_pipe.c for a detailed rationale. Submitted by: tegge	2003-08-02 17:08:21 +00:00
phk	adb4818b64	Grab Giant in bufdonebio() since drivers may not hold it. This only protects the "struct buf" consumers (ie: DEV_STRATEGY()), but does not protect BIO_STRATEGY() users.	2003-08-02 09:45:10 +00:00
phk	e1e146913d	Grab Giant in physio() since non-giant drivers are starting to appear.	2003-08-02 09:40:53 +00:00
alc	507ad47156	Eliminate an abuse of kmem_alloc_pageable() in bufinit() by using VM_ALLOC_NOOBJ to allocate the bogus page. Reviewed by: tegge	2003-08-02 05:05:34 +00:00
alc	4d05c167d2	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in sf_buf_init(). (See revision 1.140 of kern/sys_pipe.c for a detailed rationale.) Submitted by: tegge	2003-08-02 04:18:56 +00:00
obrien	1c53f0726f	Fix kernel build -- 'c' was the unused var, not 'lines'.	2003-08-01 17:00:49 +00:00
rwatson	23fd91f044	Attempt to simplify #ifdef logic for MAC_ALWAYS_LABEL_MBUF. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-01 15:45:14 +00:00
alc	7199d3e24f	Remove Giant from writev(2). Eliminate trivial style differences between writev(2) and readv(2).	2003-08-01 02:21:54 +00:00

6875 Commits