freebsd-nq

Author	SHA1	Message	Date
Edward Tomasz Napierala	afcc55f318	All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".	2011-07-06 20:06:44 +00:00
Marius Strobl	2e569926f8	Call pmap_qremove() before freeing or unwiring the pages, otherwise there's a window during which a page can be re-used before its previous mapping is removed. Reviewed by: alc MFC after: 1 week	2011-07-05 18:40:37 +00:00
Jonathan Anderson	9acdfe6549	Rework _fget to accept capability parameters. This new version of _fget() requires new parameters: - cap_rights_t needrights the rights that we expect the capability's rights mask to include (e.g. CAP_READ if we are going to read from the file) - cap_rights_t haverights used to return the capability's rights mask (ignored if NULL) - u_char maxprotp the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted (only used if we are going to mmap the file; ignored if NULL) - int fget_flags FGET_GETCAP if we want to return the capability itself, rather than the underlying object which it wraps Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc	2011-07-05 13:45:10 +00:00
Jonathan Anderson	af098ed8e7	Add kernel functions to unwrap capabilities. cap_funwrap() and cap_funwrap_mmap() unwrap capabilities, exposing the underlying object. Attempting to unwrap a capability with an inadequate rights mask (e.g. calling cap_funwrap(fp, CAP_WRITE | CAP_MMAP, &result) on a capability whose rights mask is CAP_READ | CAP_MMAP) will result in ENOTCAPABLE. Unwrapping a non-capability is effectively a no-op. These functions will be used by Capsicum-aware versions of _fget(), etc. Approved by: mentor (rwatson), re (Capsicum blanket) Sponsored by: Google Inc	2011-07-04 14:40:32 +00:00
Attilio Rao	470107b2f1	MFC	2011-07-04 11:13:00 +00:00
Attilio Rao	a2f4e284b0	Completely remove now unused pc_other_cpus, pc_cpumask. Tested by: pluknet	2011-07-04 10:45:54 +00:00
Bjoern A. Zeeb	35fd7bc020	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet	2011-07-03 12:22:02 +00:00
Ed Schouten	18f5477167	Reintroduce the cioctl() hook in the TTY layer for digi(4). The cioctl() hook can be used by drivers to add ioctls to the .init and .lock devices. This commit breaks the ttydevsw ABI, since this structure didn't provide any padding. To prevent ABI breakage in the future, add a tsw_spare. Submitted by: Peter Jeremy <peter jeremy alcatel lucent com> Obtained from: kern/152254 (slightly modified)	2011-07-02 13:54:20 +00:00
Jonathan Anderson	c0467b5e6e	When Capsicum starts creating capabilities to wrap existing file descriptors, we will want to allocate a new descriptor without installing it in the FD array. Split falloc() into falloc_noinstall() and finstall(), and rewrite falloc() to call them with appropriate atomicity. Approved by: mentor (rwatson), re (bz)	2011-06-30 15:22:49 +00:00
Jonathan Anderson	12bc222e57	Add some checks to ensure that Capsicum is behaving correctly, and add some more explicit comments about what's going on and what future maintainers need to do when e.g. adding a new operation to a sys_machdep.c. Approved by: mentor(rwatson), re(bz)	2011-06-30 10:56:02 +00:00
Attilio Rao	7b744f6b01	MFC	2011-06-30 10:19:43 +00:00
Alan Cox	6bbee8e28a	Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib	2011-06-29 16:40:41 +00:00
Jonathan Anderson	24c1c3bf71	We may split today's CAPABILITIES into CAPABILITY_MODE (which has to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way. Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future. Approved by: rwatson Sponsored by: Google UK Ltd	2011-06-29 13:03:05 +00:00
Attilio Rao	40a034576b	MFC	2011-06-28 14:40:17 +00:00
Attilio Rao	c0757daf1f	Fix a mismerge.	2011-06-27 13:02:23 +00:00
Ed Schouten	7c9669276e	Fix whitespace inconsistencies in the TTY layer and its drivers owned by me.	2011-06-26 18:26:20 +00:00
Attilio Rao	cfdfd32d34	MFC	2011-06-26 17:30:46 +00:00
Jonathan Anderson	54350dfa33	Remove redundant Capsicum sysctl. Since we're now declaring FEATURE(security_capabilities), there's no need for an explicit SYSCTL_NODE. Approved by: rwatson	2011-06-25 12:37:06 +00:00
Andriy Gapon	31c5a6e2b8	unconditionally stop other cpus when entering kdb in smp system ... and thus retire debug.kdb.stop_cpus tunable/sysctl. The knob was to work around CPU stopping issues, which since have been either fixed or greatly reduced. kdb should really operate in a special environment with scheduler stopped and interrupts disabled to provide deterministic debugging. Discussed with: attilio, rwatson X-MFC after: 2 months or never	2011-06-25 10:28:16 +00:00
Andriy Gapon	1aac6ac94a	generic_stop_cpus: pull timeout logic from under DIAGNOSTIC ... and also increase the timeout. It's better to try to proceed somehow despite stuck CPUs than to hang indefinitely. Especially so during shutdown and when entering kdb or panic. Timeout value is still an arbitrary value. Timeout diagnostic is just a printf; the work on something more debuggable is planned by attilio. Need to be careful here as stop_cpus_hard is called very early while entering kdb and soon(-ish) it may become called very early when entering panic. Reviewed by: attilio MFC after: 2 months	2011-06-25 10:01:43 +00:00
Attilio Rao	de138ec703	MFC	2011-06-24 16:35:40 +00:00
Jonathan Anderson	5e26234ff1	Tidy up a capabilities-related comment. This comment refers to an #ifdef that hasn't been merged [yet?]; remove it. Approved by: rwatson	2011-06-24 14:40:22 +00:00
Attilio Rao	9b571ec6b3	MFC	2011-06-22 19:42:32 +00:00
Jung-uk Kim	a49399a903	Set negative quality to TSC timecounter when C3 state is enabled for Intel processors unless the invariant TSC bit of CPUID is set. Intel processors may stop incrementing TSC when DPSLP# pin is asserted, according to Intel processor manuals, i. e., TSC timecounter is useless if the processor can enter deep sleep state (C3/C4). This problem was accidentally uncovered by r222869, which increased timecounter quality of P-state invariant TSC, e.g., for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c). Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de) Ian FREISLICH (ianf at clue dot co dot za) Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de) - Core2 Duo T5870 (C3 state available/enabled) jkim - Xeon X5150 (C3 state unavailable)	2011-06-22 16:40:45 +00:00
Attilio Rao	5519971c21	MFC	2011-06-19 14:22:35 +00:00
David E. O'Brien	52e95a64ea	Add comment from CSRG rev 7.27 (1992/06/23 19:56:55; author: mckusick)	2011-06-17 21:44:13 +00:00
Konstantin Belousov	508462ed1b	Do not trash the argv[0] pointer for an a.out process on amd64. Found with the binary provided by joerg.	2011-06-16 22:00:59 +00:00
Konstantin Belousov	5a888c066f	Fix silly typo that resulted in the a.out process stack to end at ~200MB instead of 3GB on amd64.	2011-06-16 21:59:16 +00:00
Marcel Moolenaar	8a71031712	Even if the loaded module has no symbols, we still need to notify MD code about it and update the link map for GDB's use.	2011-06-16 17:41:21 +00:00
Attilio Rao	49f5aeaf41	MFC	2011-06-15 07:20:22 +00:00
Justin T. Gibbs	583bef3863	sys/kern/subr_kdb.c: Modify the "alternate break sequence" detecting state machine so that only a contiguous invocation of the break sequence is accepted. The old implementation did not reset the state machine when detecting an unexpected character. While here, use an enum for the states of the machine instead of magic numbers. Sponsored by: Spectra Logic Corporation	2011-06-14 21:37:25 +00:00
David E. O'Brien	f528c3fdbc	We should not return ECHILD when debugging a child and the parent does a "wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the child does not wish to report status at this time. Reviewed by: jhb	2011-06-14 17:09:30 +00:00
Justin T. Gibbs	aa76615dd1	sys/sys/conf.h: sys/kern/kern_conf.c: Add make_dev_physpath_alias(). This interface takes the parent cdev of the alias, an old alias cdev (if any) to replace with the newly created alias, and the physical path string. The alias is visible as a symlink to the parent, with the same name as the parent, rooted at physpath in devfs. Note: make_dev_physpath_alias() has hard coded knowledge of the Solaris style prefix convention for physical path data, "id1,". In the future, I expect the convention to change to allow "physical path quality" to be reported in the prefix. For example, a physical path based on NewBus topology would be of "lower quality" than a physical path reported by a device enclosure. Sponsored by: Spectra Logic Corporation	2011-06-14 16:29:43 +00:00
Kenneth D. Merry	094efe753d	Instead of using an atomic operation to determine whether the devstat(9) device node has been created, pass MAKEDEV_CHECKNAME in so that the devfs code will do the check. Use a regular static variable as before, that's good enough to keep us from calling into devfs most of the time. Suggested by: kib MFC after: 1 week Sponsored by: Spectra Logic Corporation	2011-06-13 22:08:24 +00:00
Justin T. Gibbs	27c959cf05	Fix a couple of race conditions in devstat(9) initialization. In devstat_new_entry(), there is no need to initialize the queue and the mutex in this function. There are ways to do static initialization on both, so use STAILQ_HEAD_INITIALIZER and MTX_SYSINIT to initialize the queue and the mutex. In devstat_alloc(), use an atomic test and set routine to guard making our entry in /dev. Using just a plain static variable creates a race condition on multiprocessor machines. If you attempt to create a second entry in devfs, the kernel will panic. Submitted by: kdm Reviewed by: gibbs Sponsored by: Spectra Logic Corporation MFC after: 1 week.	2011-06-13 21:21:02 +00:00
Attilio Rao	a38f1f263b	Remove pc_cpumask and pc_other_cpus usage from MI code. Tested by: pluknet	2011-06-13 13:28:31 +00:00
Jeff Roberson	6f59b2bd33	- When printing bufs with show buf the lblkno is often more useful than the blkno. Print them both.	2011-06-10 22:15:36 +00:00
Attilio Rao	e3adb68519	In the current code, a double panic condition may lead to dumps interleaving. Signal dumping to happen only for the first panic which should be the most important. Sponsored by: Sandvine Incorporated Submitted by: Nima Misaghian (nmisaghian AT sandvine DOT com) MFC after: 2 weeks	2011-06-08 19:28:59 +00:00
John Baldwin	c721b93449	Log the socket address passed as the destination to sendto() and sendmsg() via ktrace. MFC after: 1 week	2011-06-07 17:40:33 +00:00
Attilio Rao	5e9857e76b	MFC	2011-06-07 08:24:29 +00:00
Kenneth D. Merry	5e319c480c	Set pca.p_bufr to NULL when we haven't allocated a buffer. Otherwise, p_bufr is set to garbage on the stack, and if that garbage happens to be non-NULL, and the TOLOG or TOCONS flag is set, putbuf() will get called and attempt to fill the non-existent buffer. This is really only relevant for tprintf() (and only when the priority is not -1), but set it in uprintf() and ttyprintf() for completeness. The next step, to avoid log buffer scrambling, would be to add the PRINTF_BUFR_SIZE code to tprintf(), but this should prevent panics. Submitted by: rmacklem Found by: pho	2011-06-07 05:04:37 +00:00
David Xu	a231144921	Use p4prio_to_tsprio to calculate TS priority instead of using p4prio_to_rtpprio which is for RT priority. PR: kern/157657 Submitted by: krivenok.dmitry at gmail dot com MFC after: 3 days	2011-06-07 02:50:14 +00:00
Marcel Moolenaar	299cceef03	Fix making kernel dumps from the debugger by creating a command for it. Do not expect a developer to call doadump(). Calling doadump does not necessarily work when it's declared static. Nor does it necessarily do what was intended in the context of text dumps. The dump command always creates a core dump. Move printing of error messages from doadump to the dump command, now that we don't have to worry about being called from DDB.	2011-06-07 01:28:12 +00:00
Attilio Rao	81c02539f1	MFC	2011-06-06 21:38:39 +00:00
John Baldwin	69b63a9dc7	Clear the device_t pointer in 'struct resource' when releasing a device as otherwise the sysctl to export rman info can dereference a stale pointer. PR: kern/115371 Submitted by: Arthur Hartwig MFC after: 1 week	2011-06-06 13:12:56 +00:00
Attilio Rao	bc6339618e	MFC	2011-06-01 16:54:33 +00:00
Kenneth D. Merry	534917efef	Fix a bug introduced in revision 222537. In msgbuf_reinit() and msgbuf_init(), we weren't initializing the mutex. Depending on the contents of memory, the LO_INITIALIZED flag might be set on the mutex (either due to a warm reboot, and the message buffer remaining in place, or due to garbage in memory) and in that case, with INVARIANTS turned on, we would trigger an assertion that the mutex had already been initialized. Fix this by bzeroing the message buffer mutex for the _init() and _reinit() paths. Reported by: mdf	2011-05-31 22:39:32 +00:00
Attilio Rao	61b926921f	MFC	2011-05-31 21:22:44 +00:00
Attilio Rao	e370959707	Fix KTR_CPUMASK in order to accept a string representing a cpuset_t. This introduces all the underlying support for making this possible (via the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64 implements its own assembly primitives for tracing events and needs to properly check it. Anyway the sparc64 logic is not implemented yet due to lack of knowledge (by me) and time (by marius), but it is just a matter of using ktr_cpumask when possible. Tested and fixed by: pluknet Reviewed by: marius	2011-05-31 20:48:58 +00:00
Attilio Rao	d0984adc98	Revert a change that crept in during MFC.	2011-05-31 20:23:33 +00:00
Kenneth D. Merry	d42a4eb507	Fix apparent garbage in the message buffer. While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix scrambled console output, the message buffer and syslog were still getting log messages one character at a time. While all of the characters still made it into the log (courtesy of atomic operations), they were often interleaved when there were multiple threads writing to the buffer at the same time. This fixes message buffer accesses to use buffering logic as well, so that strings that are less than PRINTF_BUFR_SIZE will be put into the message buffer atomically. So now dmesg output should look the same as console output. subr_msgbuf.c: Convert most message buffer calls to use a new spin lock instead of atomic variables in some places. Add a new routine, msgbuf_addstr(), that adds a NUL-terminated string to a message buffer. This takes a priority argument, which allows us to eliminate some races (at least in the string at a time case) that are present in the implementation of msglogchar(). (dangling and lastpri are static variables, and are subject to races when multiple callers are present.) msgbuf_addstr() also allows the caller to request that carriage returns be stripped out of the string. This matches the behavior of msglogchar(), but in testing so far it doesn't appear that any newlines are being stripped out. So the carriage return removal functionality may be a candidate for removal later on if further analysis shows that it isn't necessary. subr_prf.c: Add a new msglogstr() routine that calls msgbuf_logstr(). Rename putcons() to putbuf(). This now handles buffered output to the message log as well as the console. Also, remove the logic in putcons() (now putbuf()) that added a carriage return before a newline. The console path was the only path that needed it, and cnputc() (called by cnputs()) already adds a carriage return. So this duplication resulted in kernel-generated console output lines ending in '\r''\r''\n'. Refactor putchar() to handle the new buffering scheme. Add buffering to log(). Change log_console() to use msglogstr() instead of msglogchar(). Don't add extra newlines by default in log_console(). Hide that behavior behind a tunable/sysctl (kern.log_console_add_linefeed) for those who would like the old behavior. The old behavior led to the insertion of extra newlines for log output for programs that print out a string, and then a trailing newline on a separate write. (This is visible with dmesg -a.) msgbuf.h: Add a prototype for msgbuf_addstr(). Add three new fields to struct msgbuf, msg_needsnl, msg_lastpri and msg_lock. The first two are needed for log message functionality previously handled by msglogchar(). (Which is still active if buffering isn't enabled.) Include sys/lock.h and sys/mutex.h for the new mutex. Reviewed by: gibbs	2011-05-31 17:29:58 +00:00
Nathan Whitehorn	d098f93019	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
Attilio Rao	5b6ea0b538	MFC	2011-05-31 14:18:10 +00:00
Attilio Rao	da3dd8b7ab	MFC	2011-05-29 18:33:13 +00:00
Mikolaj Golub	3204c8e596	In soreceive_generic(), if MSG_WAITALL is set but the request is larger than the receive buffer, we have to receive in sections. When notifying the protocol that some data has been drained the lock is released for a moment. Returning we block waiting for the rest of data. There is a race, when data could arrive while the lock was released and then the connection stalls in sbwait. Fix this by checking for data before blocking and skip blocking if there are some. PR: kern/154504 Reported by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Tested by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Reviewed by: rwatson Approved by: kib (co-mentor) MFC after: 2 weeks	2011-05-29 18:00:50 +00:00
Attilio Rao	c7df91af4b	MFC	2011-05-29 00:59:38 +00:00
Edward Tomasz Napierala	7e2548ae0a	Remove definitions for RACCT_FSIZE and RACCT_SBSIZE - these two are rather performance-sensitive and not that useful, so I won't be merging them before 9.0.	2011-05-27 19:57:58 +00:00
Attilio Rao	9cb46334ee	MFC	2011-05-27 16:09:10 +00:00
Edward Tomasz Napierala	b8fdb0d94d	Fix support for RACCT_CORE by merging forgotten file.	2011-05-26 18:54:07 +00:00
Attilio Rao	7fcdc9a26f	MFC	2011-05-26 17:38:00 +00:00
John Baldwin	5b41f90fd1	Silly spelling typos. Submitted by: "b. f."	2011-05-24 19:55:57 +00:00
John Baldwin	47ad691f87	Fix an issue with critical sections and SMP rendezvous handlers. Specifically, a critical_exit() call that drops the nesting level to zero has a brief window where the pending preemption flag is set and the nesting level is set to zero. This is done purposefully to avoid races where a preemption scheduled by an interrupt could be lost otherwise (see revision 144777). However, this does mean that if an interrupt fires during this window and enters and exits a critical section, it may preempt from the interrupt context. This is generally fine as the interrupt code is careful to arrange critical sections so that they are not exited until it is safe to preempt (e.g. interrupts EOI'd and masked if necessary). However, the SMP rendezvous IPI handler does not quite follow this rule, and in general a rendezvous can never be preempted. Rendezvous handlers are also not permitted to schedule threads to execute, so they will not typically trigger preemptions. SMP rendezvous handlers may use spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks, but using a spinlock also enters and exits a critical section. If the interrupted top-half code is in the brief window of critical_exit() where the nesting level is zero but a preemption is pending, then releasing the spinlock can trigger a preemption. Because we know that SMP rendezvous handlers can never schedule a thread, we know that a critical_exit() in an SMP rendezvous handler will only preempt in this edge case. We also know that the top-half thread will happily handle the deferred preemption once the SMP rendezvous has completed, so the preemption will not be lost. This makes it safe to employ a workaround where we use a nested critical section in the SMP rendezvous code itself around rendezvous action routines to prevent any preemptions during an SMP rendezvous. The workaround intentionally avoids checking for a deferred preemption when leaving the critical section on the assumption that if there is a pending preemption it will be handled by the interrupted top-half code. Submitted by: mlaier (variation specific to rm_cleanIPI()) Obtained from: Isilon MFC after: 1 week	2011-05-24 13:36:41 +00:00
John Baldwin	af21235ac4	Update comments for DEVICE_PROBE() to reflect that BUS_PROBE_DEFAULT is now the preferred typical return value from a probe routine. Discourage the use of 0 (BUS_PROBE_SPECIFIC) as it should be used very rarely. Point the reader to the DEVICE_PROBE(9) manpage for more detailed notes on possible probe return values. Submitted by: Philip Soeberg philip-dev of soeberg net	2011-05-24 13:22:40 +00:00
John Baldwin	211d4a2c42	Simplify a stale assertion. We have not called mi_switch() from a nested critical section during a preemption for several years. MFC after: 1 week	2011-05-24 13:17:08 +00:00
Attilio Rao	3ac3f6002b	MFC	2011-05-23 23:58:02 +00:00
Attilio Rao	217e1c0ebc	Revert a patch that involuntarily sneaked in while I was MFCing.	2011-05-23 23:50:21 +00:00
Ruslan Ermilov	5e863acb63	BKVASIZE was bumped to 16k more than a decade ago.	2011-05-23 19:59:01 +00:00
Jaakko Heinonen	f53edc909e	In init_dynamic_kenv(), ignore environment strings exceeding the KENV_MNAMELEN + 1 + KENV_MVALLEN + 1 length limit to avoid buffer overflow in getenv(). Currently loader(8) doesn't limit the length of environment strings. PR: kern/132104 MFC after: 1 month	2011-05-23 16:40:44 +00:00
Attilio Rao	a9ff18a210	MFC	2011-05-23 01:17:30 +00:00
Attilio Rao	e3071102d6	Merge r221912 from largeSMP project branch: Fix a long-standing bug in cpuset_thread0() where only the first part of cs_mask is set full. Submitted by: anonymous MFC after: 1 week	2011-05-22 21:35:03 +00:00
Attilio Rao	8c4431d022	MFC	2011-05-22 20:41:10 +00:00
Attilio Rao	34a1e065bd	Make cpusetobj_strprint() prepare the string so that the least significant cpuset_t word is printed at the rightmost part of the string (farthest from its beginning). This follows the natural layout of the bit representation in the words.	2011-05-22 20:29:47 +00:00
Rick Macklem	694a586a43	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib	2011-05-22 01:07:54 +00:00
Attilio Rao	1fff3a5663	MFC	2011-05-19 22:55:37 +00:00
Konstantin Belousov	dbe66680b0	The CDP_ACTIVE flag is cleared at the beginning of destroy_devl(), and destroy_devl() drops dev_mtx. The protection against the race with dev_rel(), introduced in r163328, should be extended to cover destroy_devl() calls for the children of the destroyed dev. Reported and tested by: joerg MFC after: 1 week	2011-05-18 22:36:58 +00:00
Attilio Rao	a8586beeb0	Fix mismerge. Reported by: pluknet	2011-05-18 15:50:12 +00:00
Attilio Rao	a0a43452ae	Merge r221285 from largeSMP project: - Remove the following sysctl: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 Because they are absolutely obsolete. Probably the whole wakeup forward mechanism should be revisited for a better fit with modern hw, in the future. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making the msg more informative and removing the argument passing. Reviewed by: julian Tested by: several	2011-05-17 22:14:00 +00:00
Attilio Rao	fea3a3fa94	MFC	2011-05-17 22:03:01 +00:00
John Baldwin	f83e8b25c1	Fix a race in the SMP rendezvous code. Specifically, the write by the last CPU to finish the rendezvous action may become visible to different CPUs at different times. As a result, the CPU that initiated the rendezvous may exit the rendezvous and drop the lock allowing another rendezvous to be initiated on the same CPU or a different CPU. In that case the exit sentinel may be cleared before all CPUs have noticed causing those CPUs to hang forever. Work around this by using a generation count to notice when this race occurs and to exit the rendezvous in that case. The problem was independently diagnosed by mlaier@ and avg@ as well. Submitted by: neel Reviewed by: avg, mlaier Obtained from: NetApp MFC after: 1 week	2011-05-17 16:39:08 +00:00
Poul-Henning Kamp	384bf94c48	Use memset() instead of bzero() and memcpy() instead of bcopy(), there is no relevant difference for sbufs, and it increases portability of the source code. Split the actual initialization of the sbuf into a separate local function, so that certain static code checkers can understand what sbuf_new() does, thus eliminating one silly annoyance of MISRA compliance testing. Contributed by: An anonymous company in the last business I expected sbufs to invade.	2011-05-17 11:04:50 +00:00
Poul-Henning Kamp	eb05ee7a71	Don't expect PAGE_SIZE to exist on all platforms (It is a pretty arbitrary choice of default size in the first place) Reverse the order of arguments to the internal static sbuf_put_byte() function to match everything else in this file. Move sbuf_putc_func() inside the kernel version of sbuf_vprintf where it belongs. sbuf_putc() incorrectly used sbuf_putc_func() which suppresses NUL characters, it should use sbuf_put_byte(). Make sbuf_finish() return -1 on error. Minor stylistic nits fixed.	2011-05-17 06:36:32 +00:00
Attilio Rao	d59dd76c22	Merge r221278 from largeSMP project: idle_cpus_mask is just used in sched_4bsd, thus make it private for it. Tested by: several	2011-05-16 23:20:12 +00:00
Attilio Rao	7e7a34e520	MFC	2011-05-16 16:34:03 +00:00
Poul-Henning Kamp	71c2bc5c6b	Change the length quantities of sbufs to be ssize_t rather than int. Constify a couple of arguments.	2011-05-16 16:18:40 +00:00
Andriy Gapon	dd7498ae03	better integrate cyclic module with clocksource/eventtimer subsystem Now in the case when one-shot timers are used cyclic events should fire closer to their scheduled times. As the cyclic is currently used only to drive DTrace profile provider, this is the area where the change makes a difference. Reviewed by: mav (earlier version, a while ago) X-MFC after: clocksource/eventtimer subsystem	2011-05-16 15:29:59 +00:00
Attilio Rao	f27aed53d0	Fix a longstanding bug where only the first part of the cpumask was correctly set full. Submitted by: anonymous	2011-05-14 19:36:12 +00:00
Attilio Rao	faa0e911fb	Simplify the code here. Submitted by: jhb	2011-05-14 18:22:08 +00:00
Attilio Rao	739e31f6d7	MFC	2011-05-13 15:20:57 +00:00
Matthew D Fleming	fa2c76c975	Correctly use INOUT for the offset/len parameters to vop_allocate. As far as I can tell this is for documentation only at the moment.	2011-05-13 14:29:28 +00:00
Alexander Motin	167aee3895	Refactor Xen PV code to use new event timers subsystem. That uses one-shot Xen timer and time counter to provide one-shot and periodic time events. In my tests this reduces the idle interrupt rate down to about 30Hz, and according to Xen VM Manager reduces host CPU load by three times compared to the previous periodic 100Hz clock. Also now, when needed, it is possible to increase HZ rate without useless CPU burning during idle periods. Now only ia64 and some ARMs are left not migrated to the new event timers.	2011-05-13 12:39:37 +00:00
Matthew D Fleming	3d08a76bbc	Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >	2011-05-13 05:27:58 +00:00
Attilio Rao	ef607a6aa3	MFC	2011-05-12 14:01:40 +00:00
Stanislav Sedov	ff6f41a472	- Do not try to drop a NULL filedesc pointer.	2011-05-12 10:56:33 +00:00
Stanislav Sedov	0daf62d9f5	- Commit work from libprocstat project. These patches add support for runtime file and processes information retrieval from the running kernel via sysctl in the form of new library, libprocstat. The library also supports KVM backend for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have been modified to take advantage of the library (as the bonus point the fstat(1) utility no longer needs superuser privileges to operate), and the procstat(1) utility is now able to display information from memory dumps as well. The newly introduced fuser(1) utility also uses this library and is able to operate via sysctl and kvm backends. The library is by no means complete (e.g. KVM backend is missing vnode name resolution routines, and there are no manpages for the library itself) so I plan to improve it further. I'm committing it so it will get wider exposure and review. We won't be able to MFC this work as it relies on changes in HEAD, which were introduced some time ago, that break kernel ABI. OTOH we may be able to merge the library with KVM backend if we really need it there. Discussed with: rwatson	2011-05-12 10:11:39 +00:00
Attilio Rao	b9f714be9f	MFC	2011-05-07 23:34:14 +00:00
Jaakko Heinonen	852bee75b7	To avoid duplicated warning, move WITNESS_WARN() added in r221597 to the branch which doesn't call malloc(9). Suggested by: kib	2011-05-07 17:59:07 +00:00
Jaakko Heinonen	816c203937	Add WITNESS_WARN() to getenv() to explicitly note that the function may sleep. This helps to expose bugs when the requested environment variable doesn't exist.	2011-05-07 11:10:58 +00:00
Attilio Rao	71a19bdc64	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t on the other side is implemented as an array of longs, and easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
Attilio Rao	7505ef3a41	MFC	2011-05-04 15:45:23 +00:00
Attilio Rao	94ebcddde3	MFC	2011-05-03 18:57:46 +00:00
Andrey V. Elsukov	b50a7799b8	Add make_dev_alias_p() function. It is similar to make_dev_alias(), but it may return an error like make_dev_p() does. Reviewed by: kib (previous version) MFC after: 2 weeks	2011-05-03 18:54:18 +00:00
Edward Tomasz Napierala	a7ad07bff3	Change the way rctl interfaces with jails by introducing prison_racct structure, which acts as a proxy between them. This makes jail rules persistent, i.e. they can be added before jail gets created, and they don't disappear when the jail gets destroyed.	2011-05-03 07:32:58 +00:00
Attilio Rao	f0283a735c	- Remove the following sysctl: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 Because they are absolutely obsolete. Probably the whole wakeup forward mechanism should be revisited for a better fit on modern hw. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making the msg more informative and removing the argument passing Approved by: julian	2011-04-30 23:28:07 +00:00
Attilio Rao	3121f5347e	idle_cpus_mask is just used in the SMP case and within sched_4BSD. Declare appropriately.	2011-04-30 22:30:18 +00:00
John Baldwin	85ee63c923	Add a new bus method, BUS_ADJUST_RESOURCE() that is intended to be a wrapper around rman_adjust_resource(). Include a generic implementation, bus_generic_adjust_resource() which passes the request up to the parent bus. There is currently no default implementation. A bus_adjust_resource() wrapper is provided for use in drivers.	2011-04-29 21:36:45 +00:00
John Baldwin	bb82622c3e	Extend the rman(9) API to support altering an existing resource. Specifically, these changes allow a resource to back a relocatable and resizable resource such as the I/O window decoders in PCI-PCI bridges. - rman_adjust_resource() can adjust the start and end address of an existing resource. It only succeeds if the newly requested address space is already free. It also supports shrinking a resource in which case the freed space will be marked unallocated in the rman. - rman_first_free_region() and rman_last_free_region() return the start and end addresses for the first or last unallocated region in an rman, respectively. This can be used to determine by how much the resource backing an rman must be adjusted to accommodate an allocation request that does not fit into the existing rman. While here, document the rm_start and rm_end fields in struct rman, rman_is_region_manager(), the bound argument to rman_reserve_resource_bound(), and rman_init_from_resource().	2011-04-29 20:05:19 +00:00
John Baldwin	b67d11bbcc	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.	2011-04-29 18:41:21 +00:00
Attilio Rao	2be767e069	Add watchdog patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let the watchdog fire if they take too long. I implemented the stubs this way because I really want the wdog_kern_* KPI not to be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also to avoid calling them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
Ryan Stone	60dd73b78b	If the 4BSD scheduler tries to schedule a thread that has been pinned or bound to an AP before SMP has started, the system will panic when we try to touch per-CPU state for that AP because that state has not been initialized yet. Fix this in the same way as ULE: place all threads in the global run queue before SMP has started. Reviewed by: jhb MFC after: 1 month	2011-04-26 20:34:30 +00:00
Konstantin Belousov	b2ad91f26b	Implement the delayed task execution extension to the taskqueue mechanism. The caller may specify a timeout in ticks after which the task will be scheduled. Sponsored by: The FreeBSD Foundation Reviewed by: jeff, jhb MFC after: 1 month	2011-04-26 11:39:56 +00:00
Jeff Roberson	5bd186a65a	- Catch up to falloc() changes. - PHOLD() before using a task structure on the stack. - Fix a LOR between the sleepq lock and thread lock in _intr_drain().	2011-04-26 07:30:52 +00:00
Rick Macklem	c65c068a5f	Fix a LOR in vfs_busy() where, after msleeping, it would lock the mutexes in the wrong order for the case where the MBF_MNTLSTLOCK is set. I believe this did have the potential for deadlock. For example, if multiple nfsd threads called vfs_busyfs(), which calls vfs_busy() with MBF_MNTLSTLOCK. Thanks go to pho for catching this during his testing. Tested by: pho Submitted by: kib MFC after: 2 weeks	2011-04-23 11:22:48 +00:00
Jaakko Heinonen	1b0fe69dc9	Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is no more need to add "noro" in vfs_donmount() to cancel "ro". This also fixes a problem of canceling options beginning with "no". For example, "noatime" didn't cancel "nonoatime". Thus it was possible that both "noatime" and "nonoatime" were active at the same time. Reviewed by: bde	2011-04-22 07:26:09 +00:00
Matthew D Fleming	1ce4508f6d	Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9) drive looping and potentially yielding. Requested by: kib	2011-04-19 16:36:24 +00:00
Matthew D Fleming	5d253e418f	Fix a copy/paste whitespace error.	2011-04-18 16:40:47 +00:00
Matthew D Fleming	7323776b01	Regen.	2011-04-18 16:32:47 +00:00
Matthew D Fleming	d91f88f7f3	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)	2011-04-18 16:32:22 +00:00
Jilles Tjoelker	6100955206	ktrace: Log the code for all signals (PSIG events). The code provides information on how the signal was generated. Formerly, the code was only logged for traps, much like only signal handlers for traps received a meaningful si_code before FreeBSD 7.0. In rare cases, no information is available and 0 is still logged. MFC after: 1 week	2011-04-17 14:38:11 +00:00
Dmitry Chagin	fa2835d296	Remove malloc(9) return value checks when M_WAITOK is used. MFC after: 2 weeks	2011-04-16 16:20:51 +00:00
Gleb Smirnoff	443301e296	Revert r194662, since it breaks ng_ksocket(4) and may break other socket consumers with alternate sb_upcall. PR: kern/154676 Submitted by: Arnaud Lacombe <lacombar gmail.com> MFC after: 7 days	2011-04-14 14:54:22 +00:00
Sergey Kandaurov	ced9253e4e	Remove stale M_ZOMBIE malloc type. This type is unused since embedding p_ru into struct proc. MFC after: 1 week	2011-04-14 14:25:47 +00:00
Gavin Atkinson	0f4d3c921d	Add a new DDB command, "show rmans", which will show the address and brief details of each rman header, but not the contents of all rman structures in the system. This is especially useful on platforms where some rmans have many thousands of entries, making scrolling through the output of "show all rman" impractical. Individual rmans can then be viewed including their contents with "show rman 0xaddr" as usual. Reviewed by: jhb	2011-04-13 19:10:56 +00:00
Sergey Kandaurov	6bed196c35	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
Lawrence Stewart	891b8ed467	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
Edward Tomasz Napierala	415896e3b1	Rename a misnamed structure field (hr_loginclass), and reorder priv(9) constants to match the order and naming of syscalls. No functional changes.	2011-04-10 18:35:43 +00:00
Konstantin Belousov	2cfd5d3c91	Some callers of proc_reparent() already have the parent process locked. Detect the situation and avoid process lock recursion. Reported by: Fabian Keil <freebsd-listen fabiankeil de>	2011-04-10 17:07:02 +00:00
Attilio Rao	1283e9cd60	Reintroduce the fix already discussed in r216805 (please check its history for a detailed explanation of the problems). The only difference with the previous fix is in Solution2: CPUBLOCK is no longer set when exiting from callout_reset_*() functions, which avoids the deadlock (leading to r217161). There is no need to CPUBLOCK there because the running-and-migrating assumption is strong enough to avoid problems there. Furthermore add better !SMP compliance (leading to shrunk code and structures) and facility macros/functions. Tested by: gianni, pho, dim MFC after: 3 weeks	2011-04-08 18:48:57 +00:00
Edward Tomasz Napierala	722581d9e6	Add RACCT_NOFILE accounting. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 19:13:04 +00:00
Edward Tomasz Napierala	b1fb5f9c8d	Style fix. Submitted by: jhb@	2011-04-06 19:08:50 +00:00
Edward Tomasz Napierala	3bcf74459f	Add accounting for SysV-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 18:11:24 +00:00
John Baldwin	e806d352d2	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
Edward Tomasz Napierala	8caddd81e2	Add ucred pointer to the SysV-related memory structures. This is required for racct. Note that after this commit, ipcs(1) needs to be rebuilt. Otherwise, it will fail with "ipcs: sysctlbyname: kern.ipc.msqids: Cannot allocate memory". Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 16:59:54 +00:00
Edward Tomasz Napierala	1ba5ad4210	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
Edward Tomasz Napierala	c98fe0a557	Add missing stubs.	2011-04-05 19:50:34 +00:00
Sergey Kandaurov	443db69597	Remove malloc type M_NETADDR unused since splitting into vfs_subr.c and vfs_export.c. MFC after: 1 week	2011-04-04 16:23:01 +00:00
Edward Tomasz Napierala	6fd8c2bd8a	Add accounting for RACCT_NPTS.	2011-04-02 15:02:42 +00:00
Konstantin Belousov	1fe80828e7	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not	2011-04-01 13:28:34 +00:00
Konstantin Belousov	7332c129e0	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
Edward Tomasz Napierala	58c77a9d53	Enable accounting for RACCT_NPROC and RACCT_NTHR. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-31 19:22:11 +00:00
Edward Tomasz Napierala	e4dcb7046a	Notify racct when process credentials change. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-31 18:12:04 +00:00
Fabien Thomas	586cb6ec77	Clearing the flag when preempting will let the preempted thread run for too long. This can end in a scheduler deadlock with ping-pong between two threads. One sample of this is: - device lapic (to have a preemption point on critical_exit()) - options DEVICE_POLLING with HZ>1499 (to have lapic freq = hardclock freq) - running a cpu intensive task (that does not enter the kernel) - only one CPU on SMP or no SMP. As requested by jhb@ 4BSD has received the same type of fix instead of propagating the flag to the new thread. Reviewed by: jhb, jeff MFC after: 1 month	2011-03-31 13:59:47 +00:00
Edward Tomasz Napierala	66db16fc49	Regenerate.	2011-03-30 17:59:54 +00:00
Edward Tomasz Napierala	ec125fbbc5	Add rctl. It's used by racct to take user-configurable actions based on the set of rules it maintains and the current resource usage. It also provides a userland API to manage that ruleset. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-30 17:48:15 +00:00
Konstantin Belousov	8666550972	Provide compat32 shims for kldstat(2). Requested and tested by: jpaetzel MFC after: 1 week	2011-03-30 14:46:12 +00:00
Edward Tomasz Napierala	d31b45e164	Remove pointless (always true) KASSERTs. Submitted by: pjd	2011-03-29 19:19:10 +00:00
Edward Tomasz Napierala	097055e26d	Add racct. It's an API to keep per-process, per-jail and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
Konstantin Belousov	cea8f30a54	Fix the check for vm_map_remove() error. Pointed out by: alc MFC after: 2 weeks	2011-03-28 19:44:54 +00:00
Konstantin Belousov	cce6e354aa	Trim white spaces, adjust style. MFC after: 2 weeks	2011-03-28 13:28:23 +00:00
Konstantin Belousov	937060a843	Handle zero length in copyout_unmap(). Submitted by: John Wehle <john feith com> MFC after: 2 weeks	2011-03-28 13:21:26 +00:00
Konstantin Belousov	0f502d1c4e	Promote ksyms_map() and ksyms_unmap() to general facility copyout_map() and copyout_unmap() interfaces. Submitted by: John Wehle <john feith com>, nox MFC after: 2 weeks	2011-03-28 12:48:33 +00:00

12330 Commits