freebsd-skq

Author	SHA1	Message	Date
attilio	bc4d32e80b	MFC	2011-05-31 21:22:44 +00:00
attilio	a924571ff7	Fix KTR_CPUMASK in order to accept a string representing a cpuset_t. This introduces all the underlying support for making this possible (via the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64 implements its own assembly primitives for tracing events and needs to properly check it. Anyway the sparc64 logic is not implemented yet due to lack of knowledge (by me) and time (by marius), but it is just a matter of using ktr_cpumask when possible. Tested and fixed by: pluknet Reviewed by: marius	2011-05-31 20:48:58 +00:00
attilio	066c7ac96c	Revert a change that crept in during MFC.	2011-05-31 20:23:33 +00:00
ken	0febb6df5e	Fix apparent garbage in the message buffer. While we have had a fix in place (options PRINTF_BUFR_SIZE=128) to fix scrambled console output, the message buffer and syslog were still getting log messages one character at a time. While all of the characters still made it into the log (courtesy of atomic operations), they were often interleaved when there were multiple threads writing to the buffer at the same time. This fixes message buffer accesses to use buffering logic as well, so that strings that are less than PRINTF_BUFR_SIZE will be put into the message buffer atomically. So now dmesg output should look the same as console output. subr_msgbuf.c: Convert most message buffer calls to use a new spin lock instead of atomic variables in some places. Add a new routine, msgbuf_addstr(), that adds a NUL-terminated string to a message buffer. This takes a priority argument, which allows us to eliminate some races (at least in the string-at-a-time case) that are present in the implementation of msglogchar(). (dangling and lastpri are static variables, and are subject to races when multiple callers are present.) msgbuf_addstr() also allows the caller to request that carriage returns be stripped out of the string. This matches the behavior of msglogchar(), but in testing so far it doesn't appear that any newlines are being stripped out. So the carriage return removal functionality may be a candidate for removal later on if further analysis shows that it isn't necessary. subr_prf.c: Add a new msglogstr() routine that calls msgbuf_logstr(). Rename putcons() to putbuf(). This now handles buffered output to the message log as well as the console. Also, remove the logic in putcons() (now putbuf()) that added a carriage return before a newline. The console path was the only path that needed it, and cnputc() (called by cnputs()) already adds a carriage return. So this duplication resulted in kernel-generated console output lines ending in '\r''\r''\n'. 
Refactor putchar() to handle the new buffering scheme. Add buffering to log(). Change log_console() to use msglogstr() instead of msglogchar(). Don't add extra newlines by default in log_console(). Hide that behavior behind a tunable/sysctl (kern.log_console_add_linefeed) for those who would like the old behavior. The old behavior led to the insertion of extra newlines for log output for programs that print out a string, and then a trailing newline on a separate write. (This is visible with dmesg -a.) msgbuf.h: Add a prototype for msgbuf_addstr(). Add three new fields to struct msgbuf, msg_needsnl, msg_lastpri and msg_lock. The first two are needed for log message functionality previously handled by msglogchar(). (Which is still active if buffering isn't enabled.) Include sys/lock.h and sys/mutex.h for the new mutex. Reviewed by: gibbs	2011-05-31 17:29:58 +00:00
nwhitehorn	a69e106b2f	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb	2011-05-31 15:11:43 +00:00
attilio	b1bf71d3c5	MFC	2011-05-31 14:18:10 +00:00
attilio	8dd6262cd3	MFC	2011-05-29 18:33:13 +00:00
trociny	1dfa9ab873	In soreceive_generic(), if MSG_WAITALL is set but the request is larger than the receive buffer, we have to receive in sections. When notifying the protocol that some data has been drained the lock is released for a moment. On return we block waiting for the rest of the data. There is a race: data could arrive while the lock was released, and then the connection stalls in sbwait. Fix this by checking for data before blocking and skipping the block if some is present. PR: kern/154504 Reported by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Tested by: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua> Reviewed by: rwatson Approved by: kib (co-mentor) MFC after: 2 weeks	2011-05-29 18:00:50 +00:00
attilio	55a3bf38a5	MFC	2011-05-29 00:59:38 +00:00
trasz	5499b0b9d5	Remove definitions for RACCT_FSIZE and RACCT_SBSIZE - these two are rather performance-sensitive and not that useful, so I won't be merging them before 9.0.	2011-05-27 19:57:58 +00:00
attilio	eefddaeed6	MFC	2011-05-27 16:09:10 +00:00
trasz	6a13eaa4d1	Fix support for RACCT_CORE by merging forgotten file.	2011-05-26 18:54:07 +00:00
attilio	867c6223e7	MFC	2011-05-26 17:38:00 +00:00
jhb	3c1a24d701	Silly spelling typos. Submitted by: "b. f."	2011-05-24 19:55:57 +00:00
jhb	7028e129fd	Fix an issue with critical sections and SMP rendezvous handlers. Specifically, a critical_exit() call that drops the nesting level to zero has a brief window where the pending preemption flag is set and the nesting level is set to zero. This is done purposefully to avoid races where a preemption scheduled by an interrupt could be lost otherwise (see revision 144777). However, this does mean that if an interrupt fires during this window and enters and exits a critical section, it may preempt from the interrupt context. This is generally fine as the interrupt code is careful to arrange critical sections so that they are not exited until it is safe to preempt (e.g. interrupts EOI'd and masked if necessary). However, the SMP rendezvous IPI handler does not quite follow this rule, and in general a rendezvous can never be preempted. Rendezvous handlers are also not permitted to schedule threads to execute, so they will not typically trigger preemptions. SMP rendezvous handlers may use spinlocks (carefully) such as the rm_cleanIPI() handler used in rmlocks, but using a spinlock also enters and exits a critical section. If the interrupted top-half code is in the brief window of critical_exit() where the nesting level is zero but a preemption is pending, then releasing the spinlock can trigger a preemption. Because we know that SMP rendezvous handlers can never schedule a thread, we know that a critical_exit() in an SMP rendezvous handler will only preempt in this edge case. We also know that the top-half thread will happily handle the deferred preemption once the SMP rendezvous has completed, so the preemption will not be lost. This makes it safe to employ a workaround where we use a nested critical section in the SMP rendezvous code itself around rendezvous action routines to prevent any preemptions during an SMP rendezvous. 
The workaround intentionally avoids checking for a deferred preemption when leaving the critical section on the assumption that if there is a pending preemption it will be handled by the interrupted top-half code. Submitted by: mlaier (variation specific to rm_cleanIPI()) Obtained from: Isilon MFC after: 1 week	2011-05-24 13:36:41 +00:00
jhb	4d0fe668f7	Update comments for DEVICE_PROBE() to reflect that BUS_PROBE_DEFAULT is now the preferred typical return value from a probe routine. Discourage the use of 0 (BUS_PROBE_SPECIFIC) as it should be used very rarely. Point the reader to the DEVICE_PROBE(9) manpage for more detailed notes on possible probe return values. Submitted by: Philip Soeberg philip-dev of soeberg net	2011-05-24 13:22:40 +00:00
jhb	d73862793b	Simplify a stale assertion. We have not called mi_switch() from a nested critical section during a preemption for several years. MFC after: 1 week	2011-05-24 13:17:08 +00:00
attilio	9879530ca1	MFC	2011-05-23 23:58:02 +00:00
attilio	66305282ac	Revert a patch that involuntarily sneaked in while I was MFCing.	2011-05-23 23:50:21 +00:00
ru	5a5a985b61	BKVASIZE was bumped to 16k more than a decade ago.	2011-05-23 19:59:01 +00:00
jh	fbe30c6e5c	In init_dynamic_kenv(), ignore environment strings exceeding the KENV_MNAMELEN + 1 + KENV_MVALLEN + 1 length limit to avoid buffer overflow in getenv(). Currently loader(8) doesn't limit the length of environment strings. PR: kern/132104 MFC after: 1 month	2011-05-23 16:40:44 +00:00
attilio	6d7371f950	MFC	2011-05-23 01:17:30 +00:00
attilio	a8b367d89d	Merge r221912 from largeSMP project branch: Fix a long-standing bug in cpuset_thread0() where only the first part of cs_mask is set full. Submitted by: anonymous MFC after: 1 week	2011-05-22 21:35:03 +00:00
attilio	627bd73cdb	MFC	2011-05-22 20:41:10 +00:00
attilio	08bcb681d2	Make cpusetobj_strprint() prepare the string so that the least significant cpuset_t word is printed at the rightmost part of the string (farthest from its beginning). This follows the natural layout of the bit representation in the words.	2011-05-22 20:29:47 +00:00
rmacklem	fbb8a5e8ec	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib	2011-05-22 01:07:54 +00:00
attilio	0372174d48	MFC	2011-05-19 22:55:37 +00:00
kib	8407c0a698	The CDP_ACTIVE flag is cleared at the beginning of destroy_devl(), and destroy_devl() drops dev_mtx. The protection against the race with dev_rel(), introduced in r163328, should be extended to cover destroy_devl() calls for the children of the destroyed dev. Reported and tested by: joerg MFC after: 1 week	2011-05-18 22:36:58 +00:00
attilio	0828d417d4	Fix mismerge. Reported by: pluknet	2011-05-18 15:50:12 +00:00
attilio	01e90e3193	Merge r221285 from largeSMP project: - Remove the following sysctls: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 because they are absolutely obsolete. Probably the whole wakeup forwarding mechanism should be revisited to better fit modern hw, in the future. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making the msg more informative and removing the argument passing. Reviewed by: julian Tested by: several	2011-05-17 22:14:00 +00:00
attilio	2cdf500faf	MFC	2011-05-17 22:03:01 +00:00
jhb	8d84cd707e	Fix a race in the SMP rendezvous code. Specifically, the write by the last CPU to finish the rendezvous action may become visible to different CPUs at different times. As a result, the CPU that initiated the rendezvous may exit the rendezvous and drop the lock allowing another rendezvous to be initiated on the same CPU or a different CPU. In that case the exit sentinel may be cleared before all CPUs have noticed causing those CPUs to hang forever. Work around this by using a generation count to notice when this race occurs and to exit the rendezvous in that case. The problem was independently diagnosed by mlaier@ and avg@ as well. Submitted by: neel Reviewed by: avg, mlaier Obtained from: NetApp MFC after: 1 week	2011-05-17 16:39:08 +00:00
phk	c0026c6642	Use memset() instead of bzero() and memcpy() instead of bcopy(), there is no relevant difference for sbufs, and it increases portability of the source code. Split the actual initialization of the sbuf into a separate local function, so that certain static code checkers can understand what sbuf_new() does, thus eliminating one silly annoyance of MISRA compliance testing. Contributed by: An anonymous company in the last business I expected sbufs to invade.	2011-05-17 11:04:50 +00:00
phk	99b5f98226	Don't expect PAGE_SIZE to exist on all platforms (It is a pretty arbitrary choice of default size in the first place) Reverse the order of arguments to the internal static sbuf_put_byte() function to match everything else in this file. Move sbuf_putc_func() inside the kernel version of sbuf_vprintf where it belongs. sbuf_putc() incorrectly used sbuf_putc_func() which suppresses NUL characters, it should use sbuf_put_byte(). Make sbuf_finish() return -1 on error. Minor stylistic nits fixed.	2011-05-17 06:36:32 +00:00
attilio	fd96a5afd1	Merge r221278 from largeSMP project: idle_cpus_mask is just used in sched_4bsd, thus make it private for it. Tested by: several	2011-05-16 23:20:12 +00:00
attilio	d57a3c7c06	MFC	2011-05-16 16:34:03 +00:00
phk	9ed2621ed9	Change the length quantities of sbufs to be ssize_t rather than int. Constify a couple of arguments.	2011-05-16 16:18:40 +00:00
avg	576b51ab8f	better integrate cyclic module with clocksource/eventtimer subsystem. Now in the case when one-shot timers are used cyclic events should fire closer to their scheduled times. As the cyclic is currently used only to drive DTrace profile provider, this is the area where the change makes a difference. Reviewed by: mav (earlier version, a while ago) X-MFC after: clocksource/eventtimer subsystem	2011-05-16 15:29:59 +00:00
attilio	c5a5c48e70	Fix a longstanding bug where only the first part of the cpumask was correctly set full. Submitted by: anonymous	2011-05-14 19:36:12 +00:00
attilio	9309cc63ed	Simplify the code here. Submitted by: jhb	2011-05-14 18:22:08 +00:00
attilio	d62a193525	MFC	2011-05-13 15:20:57 +00:00
mdf	5ffb572962	Correctly use INOUT for the offset/len parameters to vop_allocate. As far as I can tell this is for documentation only at the moment.	2011-05-13 14:29:28 +00:00
mav	1881f29e6e	Refactor Xen PV code to use new event timers subsystem. That uses one-shot Xen timer and time counter to provide one-shot and periodic time events. On my tests this reduces idle interrupts rate down to about 30Hz, and according to Xen VM Manager reduces host CPU load by three times compared to the previous periodic 100Hz clock. Also now, when needed, it is possible to increase HZ rate without useless CPU burning during idle periods. Now only ia64 and some ARMs are left not migrated to the new event timers.	2011-05-13 12:39:37 +00:00
mdf	bbbc4c5455	Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >	2011-05-13 05:27:58 +00:00
attilio	99e65551b9	MFC	2011-05-12 14:01:40 +00:00
stas	fc10099a3d	- Do no try to drop a NULL filedesc pointer.	2011-05-12 10:56:33 +00:00
stas	5f9f795476	- Commit work from libprocstat project. These patches add support for runtime file and processes information retrieval from the running kernel via sysctl in the form of new library, libprocstat. The library also supports KVM backend for analyzing memory crash dumps. Both procstat(1) and fstat(1) utilities have been modified to take advantage of the library (as the bonus point the fstat(1) utility no longer needs superuser privileges to operate), and the procstat(1) utility is now able to display information from memory dumps as well. The newly introduced fuser(1) utility also uses this library and is able to operate via sysctl and kvm backends. The library is by no means complete (e.g. KVM backend is missing vnode name resolution routines, and there're no manpages for the library itself) so I plan to improve it further. I'm committing it so it will get wider exposure and review. We won't be able to MFC this work as it relies on changes in HEAD, which were introduced some time ago, that break kernel ABI. OTOH we may be able to merge the library with KVM backend if we really need it there. Discussed with: rwatson	2011-05-12 10:11:39 +00:00
attilio	cae315a375	MFC	2011-05-07 23:34:14 +00:00
jh	264f4e742a	To avoid duplicated warning, move WITNESS_WARN() added in r221597 to the branch which doesn't call malloc(9). Suggested by: kib	2011-05-07 17:59:07 +00:00
jh	f428a5f2b3	Add WITNESS_WARN() to getenv() to explicitly note that the function may sleep. This helps to expose bugs when the requested environment variable doesn't exist.	2011-05-07 11:10:58 +00:00
attilio	fe4de567b5	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and is easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects is different from kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
attilio	0987be4d6b	MFC	2011-05-04 15:45:23 +00:00
attilio	b29cc3952a	MFC	2011-05-03 18:57:46 +00:00
ae	c39b1f9995	Add make_dev_alias_p() function. It is similar to make_dev_alias(), but it may return an error like make_dev_p() does. Reviewed by: kib (previous version) MFC after: 2 weeks	2011-05-03 18:54:18 +00:00
trasz	752ffacc69	Change the way rctl interfaces with jails by introducing prison_racct structure, which acts as a proxy between them. This makes jail rules persistent, i.e. they can be added before jail gets created, and they don't disappear when the jail gets destroyed.	2011-05-03 07:32:58 +00:00
attilio	7ac8b4739c	- Remove the following sysctls: kern.sched.ipiwakeup.onecpu kern.sched.ipiwakeup.htt2 because they are absolutely obsolete. Probably the whole wakeup forwarding mechanism should be revisited to better fit modern hw. - As map2 variable is no longer used rename map3 to map2 - Fix a string by making the msg more informative and removing the argument passing Approved by: julian	2011-04-30 23:28:07 +00:00
attilio	d0e06d02bc	idle_cpus_mask is just used in the SMP case and within sched_4BSD. Declare appropriately.	2011-04-30 22:30:18 +00:00
jhb	deafe4e593	Add a new bus method, BUS_ADJUST_RESOURCE() that is intended to be a wrapper around rman_adjust_resource(). Include a generic implementation, bus_generic_adjust_resource() which passes the request up to the parent bus. There is currently no default implementation. A bus_adjust_resource() wrapper is provided for use in drivers.	2011-04-29 21:36:45 +00:00
jhb	22b381b721	Extend the rman(9) API to support altering an existing resource. Specifically, these changes allow a resource to back a relocatable and resizable resource such as the I/O window decoders in PCI-PCI bridges. - rman_adjust_resource() can adjust the start and end address of an existing resource. It only succeeds if the newly requested address space is already free. It also supports shrinking a resource in which case the freed space will be marked unallocated in the rman. - rman_first_free_region() and rman_last_free_region() return the start and end addresses for the first or last unallocated region in an rman, respectively. This can be used to determine by how much the resource backing an rman must be adjusted to accommodate an allocation request that does not fit into the existing rman. While here, document the rm_start and rm_end fields in struct rman, rman_is_region_manager(), the bound argument to rman_reserve_resource_bound(), and rman_init_from_resource().	2011-04-29 20:05:19 +00:00
jhb	08955ceac0	Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.	2011-04-29 18:41:21 +00:00
attilio	d685681d59	Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, if they take too long. I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependent on SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also to avoid calling them when not necessary (avoiding involuntary watchdog activations). Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks	2011-04-28 16:02:05 +00:00
rstone	8a5b424a2c	If the 4BSD scheduler tries to schedule a thread that has been pinned or bound to an AP before SMP has started, the system will panic when we try to touch per-CPU state for that AP because that state has not been initialized yet. Fix this in the same way as ULE: place all threads in the global run queue before SMP has started. Reviewed by: jhb MFC after: 1 month	2011-04-26 20:34:30 +00:00
kib	cff99f5d10	Implement the delayed task execution extension to the taskqueue mechanism. The caller may specify a timeout in ticks after which the task will be scheduled. Sponsored by: The FreeBSD Foundation Reviewed by: jeff, jhb MFC after: 1 month	2011-04-26 11:39:56 +00:00
jeff	14f1aafe26	- Catch up to falloc() changes. - PHOLD() before using a task structure on the stack. - Fix a LOR between the sleepq lock and thread lock in _intr_drain().	2011-04-26 07:30:52 +00:00
rmacklem	872195caf9	Fix a LOR in vfs_busy() where, after msleeping, it would lock the mutexes in the wrong order for the case where the MBF_MNTLSTLOCK is set. I believe this did have the potential for deadlock. For example, if multiple nfsd threads called vfs_busyfs(), which calls vfs_busy() with MBF_MNTLSTLOCK. Thanks go to pho for catching this during his testing. Tested by: pho Submitted by: kib MFC after: 2 weeks	2011-04-23 11:22:48 +00:00
jh	91fe746b18	Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is no more need to add "noro" in vfs_donmount() to cancel "ro". This also fixes a problem of canceling options beginning with "no". For example, "noatime" didn't cancel "nonoatime". Thus it was possible that both "noatime" and "nonoatime" were active at the same time. Reviewed by: bde	2011-04-22 07:26:09 +00:00
mdf	597ae9f19b	Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9) drive looping and potentially yielding. Requested by: kib	2011-04-19 16:36:24 +00:00
mdf	45c5f27863	Fix a copy/paste whitespace error.	2011-04-18 16:40:47 +00:00
mdf	b0f8474766	Regen.	2011-04-18 16:32:47 +00:00
mdf	9c9a32d97b	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)	2011-04-18 16:32:22 +00:00
jilles	a860eeb5be	ktrace: Log the code for all signals (PSIG events). The code provides information on how the signal was generated. Formerly, the code was only logged for traps, much like only signal handlers for traps received a meaningful si_code before FreeBSD 7.0. In rare cases, no information is available and 0 is still logged. MFC after: 1 week	2011-04-17 14:38:11 +00:00
dchagin	95da67cdb8	Remove malloc(9) return value checks when M_WAITOK is used. MFC after: 2 Week	2011-04-16 16:20:51 +00:00
glebius	3a11690b71	Revert r194662, since it breaks ng_ksocket(4) and may break other socket consumers with alternate sb_upcall. PR: kern/154676 Submitted by: Arnaud Lacombe <lacombar gmail.com> MFC after: 7 days	2011-04-14 14:54:22 +00:00
pluknet	b1187e01aa	Remove stale M_ZOMBIE malloc type. This type is unused since embedding p_ru into struct proc. MFC after: 1 week	2011-04-14 14:25:47 +00:00
gavin	0b7551a5c9	Add a new DDB command, "show rmans", which will show the address and brief details of each rman header, but not the contents of all rman structures in the system. This is especially useful on platforms where some rmans have many thousands of entries in rmans, making scrolling through the output of "show all rman" impractical. Individual rmans can then be viewed including their contents with "show rman 0xaddr" as usual. Reviewed by: jhb	2011-04-13 19:10:56 +00:00
pluknet	e48732e3d4	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
lstewart	545f7c0ca7	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
trasz	e80aa2a5fe	Rename a misnamed structure field (hr_loginclass), and reorder priv(9) constants to match the order and naming of syscalls. No functional changes.	2011-04-10 18:35:43 +00:00
kib	1b5526fe98	Some callers of proc_reparent() already have the parent process locked. Detect the situation and avoid process lock recursion. Reported by: Fabian Keil <freebsd-listen fabiankeil de>	2011-04-10 17:07:02 +00:00
attilio	bacffe590f	Reintroduce the fix already discussed in r216805 (please check its history for a detailed explanation of the problems). The only difference with the previous fix is in Solution2: CPUBLOCK is no longer set when exiting from callout_reset_*() functions, which avoids the deadlock (leading to r217161). There is no need to CPUBLOCK there because the running-and-migrating assumption is strong enough to avoid problems there. Furthermore add better !SMP compliance (leading to smaller code and structures) and facility macros/functions. Tested by: gianni, pho, dim MFC after: 3 weeks	2011-04-08 18:48:57 +00:00
trasz	a0192d37e6	Add RACCT_NOFILE accounting. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 19:13:04 +00:00
trasz	97c31dedde	Style fix. Submitted by: jhb@	2011-04-06 19:08:50 +00:00
trasz	d3c78eed8e	Add accounting for SysV-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 18:11:24 +00:00
jhb	96b1d8b6d7	Fix several places to ignore processes that are not yet fully constructed. MFC after: 1 week	2011-04-06 17:47:22 +00:00
trasz	d6f4192036	Add ucred pointer to the SysV-related memory structures. This is required for racct. Note that after this commit, ipcs(1) needs to be rebuilt. Otherwise, it will fail with "ipcs: sysctlbyname: kern.ipc.msqids: Cannot allocate memory". Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-06 16:59:54 +00:00
trasz	92bec9b84c	Add accounting for most of the memory-related resources. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-04-05 20:23:59 +00:00
trasz	fffd1b22a5	Add missing stubs.	2011-04-05 19:50:34 +00:00
pluknet	6d33997006	Remove malloc type M_NETADDR unused since splitting into vfs_subr.c and vfs_export.c. MFC after: 1 week	2011-04-04 16:23:01 +00:00
trasz	400f21cacb	Add accounting for RACCT_NPTS.	2011-04-02 15:02:42 +00:00
kib	eb730d92e4	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not	2011-04-01 13:28:34 +00:00
kib	7c2eaa21fe	Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month	2011-04-01 11:16:29 +00:00
trasz	4c83b1bba4	Enable accounting for RACCT_NPROC and RACCT_NTHR. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-31 19:22:11 +00:00
trasz	596c078ed8	Notify racct when process credentials change. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-31 18:12:04 +00:00
fabient	e0588db8d2	Clearing the flag when preempting will let the preempted thread run for too much time. This can end in a scheduler deadlock with ping-pong between two threads. One sample of this is: - device lapic (to have a preemption point on critical_exit()) - options DEVICE_POLLING with HZ>1499 (to have lapic freq = hardclock freq) - running a cpu intensive task (that does not enter the kernel) - only one CPU on SMP or no SMP. As requested by jhb@ 4BSD has received the same type of fix instead of propagating the flag to the new thread. Reviewed by: jhb, jeff MFC after: 1 month	2011-03-31 13:59:47 +00:00
trasz	3adbd8337d	Regenerate.	2011-03-30 17:59:54 +00:00
trasz	2f99052d80	Add rctl. It's used by racct to take user-configurable actions based on the set of rules it maintains and the current resource usage. It also provides userland API to manage that ruleset. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-30 17:48:15 +00:00
kib	5c02dd1e9a	Provide compat32 shims for kldstat(2). Requested and tested by: jpaetzel MFC after: 1 week	2011-03-30 14:46:12 +00:00
trasz	8d3dbe6760	Remove pointless (always true) KASSERTs. Submitted by: pjd	2011-03-29 19:19:10 +00:00
trasz	b8d3e8755d	Add racct. It's an API to keep per-process, per-user, per-jail and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)	2011-03-29 17:47:25 +00:00
kib	7b2d837e2c	Fix the check for vm_map_remove() error. Pointed out by: alc MFC after: 2 weeks	2011-03-28 19:44:54 +00:00
kib	0529b2424a	Trim white spaces, adjust style. MFC after: 2 weeks	2011-03-28 13:28:23 +00:00
kib	52182b9c4a	Handle zero length in copyout_unmap(). Submitted by: John Wehle <john feith com> MFC after: 2 weeks	2011-03-28 13:21:26 +00:00
kib	7b58e8da9b	Promote ksyms_map() and ksyms_unmap() to general facility copyout_map() and copyout_unmap() interfaces. Submitted by: John Wehle <john feith com>, nox MFC after: 2 weeks	2011-03-28 12:48:33 +00:00
jh	b6e26fa641	Fix some style issues in r219925. Reported by: bde MFC after: 1 month	2011-03-26 17:17:24 +00:00
kib	fc2bd01611	Add O_CLOEXEC flag to open(2) and fhopen(2). The new function fallocf(9), that is renamed falloc(9) with added flag argument, is provided to facilitate the merge to stable branch. Reviewed by: jhb MFC after: 1 week	2011-03-25 14:00:36 +00:00
jhb	c7ac62aecd	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week	2011-03-24 18:40:11 +00:00
jh	fc3bd08464	Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant occurrences of these options. It was possible that for example both "ro" and "rw" options became active concurrently. PR: kern/133614 Discussed on: freebsd-hackers MFC after: 1 month	2011-03-23 17:56:38 +00:00
alc	c84b8f6e0c	Modestly increase the maximum allowed size of the kmem map on i386. Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used.	2011-03-23 16:38:29 +00:00
jhb	df83f2ffe5	Small style fix.	2011-03-23 13:44:32 +00:00
trasz	359437c29d	Make UFS use PSARC/2010/029 NFSv4 ACL semantics by default, bringing it in line with ZFSv28. X-MFC-After: ZFSv28.	2011-03-22 19:52:29 +00:00
trasz	eb401e64c1	Move the code around so that libc behaviour does not depend on a variable that was supposed to be kernel-only. There should be no functional changes.	2011-03-22 17:44:07 +00:00
jeff	2d7d8c05e7	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
alc	e498ba0188	Update a comment. The sending process has not mapped the buffer pages since before r127501. Strictly speaking, the buffer pages are not "wired". They remain in the paging queues. However, they are pinned in memory using vm_page_hold().	2011-03-20 15:04:43 +00:00
ivoras	dc9c77949c	The hardware has caught up; improvements are now observed even at 128, but stay conservative and bump read_max to "only" 64 (it will probably be a good idea to increase this to 128 after the next major release).	2011-03-16 16:22:59 +00:00
avg	666906fcd7	Add DTrace systrace support for linux32 and freebsd32 syscalls on amd64. This commit makes the necessary changes in the syscall/sysent generation infrastructure. PR: kern/152822 Submitted by: Artem Belevich <fbsdlist@src.cx> Reviewed by: jhb (earlier version) MFC after: 3 weeks	2011-03-12 08:51:43 +00:00
dchagin	69b8756d3d	Extend struct sysvec with a new method, sv_schedtail, which is used for an explicit process at fork trampoline path instead of eventhandler(schedtail) invocation for each child process. Remove the eventhandler(schedtail) code and change the linux ABI to use the newly added sysvec method. While here, replace the explicit comparison of the module sysentvec structure with the newly created process sysentvec to detect the linux ABI. Discussed with: kib MFC after: 2 weeks	2011-03-08 19:01:45 +00:00
jhb	269e1daa8f	When constructing a new cpuset, apply the parent cpuset's mask to the new set's mask rather than the root mask. This was causing the root mask to be modified incorrectly. Reviewed by: jeff MFC after: 1 week	2011-03-08 14:18:21 +00:00
kib	4d0733e0f8	Do not assert the buffer lock in VFS_STRATEGY() when the kernel has already panicked. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2011-03-08 11:50:59 +00:00
kib	f3c6442b93	The execution of a shebang script requires putting the interpreter path, a possible option and the script path in the place of the argv[0] supplied to execve(2). It is possible and valid for the substitution to be shorter than the argv[0]. Avoid signed underflow in this case. Submitted by: Devon H. O'Dell <devon.odell gmail com> PR: kern/155321 MFC after: 1 week	2011-03-06 22:59:30 +00:00
trasz	0d90d0ceb0	Temporarily revert r219272; it breaks acl_is_trivial_np(3).	2011-03-06 20:12:09 +00:00
dchagin	de259f37cb	Style(9) fix. Fix indentation in comment, double ';' in variable declaration. MFC after: 1 Week	2011-03-05 20:54:17 +00:00
dchagin	99b120c8f2	Partially reworked r219042. The reason for this is a bug in ktrops() where the process is dereferenced without holding a lock. This might cause a panic if ktrace was run with the -p flag and the specified process exited between dropping the lock and writing sv_flags. Since it is impossible to acquire an sx lock while holding a mtx, switch to using the asynchronous enqueuerequest() instead of writerequest(). Rename ktr_getrequest_ne() to a more understandable name [1]. Requested by: jhb [1] MFC after: 1 Week	2011-03-05 20:36:42 +00:00
trasz	1618438630	Export login class information via kinfo and make it possible to view it using "ps -o class".	2011-03-05 14:41:49 +00:00
trasz	0525662d59	Regenerate.	2011-03-05 12:46:24 +00:00
trasz	62f6a13e39	Add two new system calls, setloginclass(2) and getloginclass(2). This makes it possible for the kernel to track the login class the process is assigned to, which is required for RCTL. This change also makes setusercontext(3) call setloginclass(2) and makes it possible to retrieve the current login class using id(1). Reviewed by: kib (as part of a larger patch)	2011-03-05 12:40:35 +00:00
trasz	e78a2a669a	Make UFS use PSARC/2010/029 NFSv4 ACL semantics by default, just like ZFSv28 does. MFC after: 2 months	2011-03-04 19:53:07 +00:00
netchild	b66d64d436	- Add a FEATURE for capsicum (security_capabilities). - Rename mac FEATURE to security_mac. Discussed with: rwatson	2011-03-04 09:03:54 +00:00
trasz	758a56db95	Make "struct pts_softc" point to ucred instead of uidinfo. This is a no-op, required for resource containers. Reviewed by: kib (as part of a larger patch), ed	2011-03-03 17:33:22 +00:00
jhb	d64a5d112c	Similar to 189574, properly handle subclasses of bus drivers when deleting a driver during kldunload. Specifically, recursively walk the tree of subclasses of a given driver attachment's bus device class detaching all instances of that driver for each class and its subclasses. Reported by: bschmidt Reviewed by: imp MFC after: 1 week	2011-03-01 14:43:37 +00:00
rwatson	f1981d366a	Continue introducing Capsicum capability mode support: If a system call wasn't listed in capabilities.conf, return ECAPMODE at syscall entry. Reviewed by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:32:07 +00:00
rwatson	9c02915234	Regenerate system call files following addition of cap_enter(2), cap_getmode(2), and capabilities.conf. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project Sponsored by: Google, Inc. MFC after: 3 months	2011-03-01 13:30:23 +00:00
rwatson	fa27828ce8	Continue to introduce Capsicum Capability Mode support: Add a new system call flag, SYF_CAPENABLED, which indicates that a particular system call is available in capability mode. Add a new configuration file, kern/capabilities.conf (similar files may be introduced for other ABIs in the future), which enumerates system calls that are available in capability mode. When a new system call is added to syscalls.master, it will also need to be added here (if needed). Teach sysent parts to use this file to set values for SYF_CAPENABLED for the native ABI. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:28:27 +00:00
rwatson	6894aabcb5	Add initial support for Capsicum's Capability Mode to the FreeBSD kernel, compiled conditionally on options CAPABILITIES: Add a new credential flag, CRED_FLAG_CAPMODE, which indicates that a subject (typically a process) is in capability mode. Add two new system calls, cap_enter(2) and cap_getmode(2), which allow setting and querying (but never clearing) the flag. Export the capability mode flag via process information sysctls. Sponsored by: Google, Inc. Reviewed by: anderson Discussed with: benl, kris, pjd Obtained from: Capsicum Project MFC after: 3 months	2011-03-01 13:23:37 +00:00
dchagin	44c082b543	Introduce preliminary support for showing a description of the ABI of the traced process by adding two new events which record the value of the process sv_flags to the trace file at process creation/exec/exit time. MFC after: 1 Month.	2011-02-25 22:05:33 +00:00
dchagin	049c9dd26f	ktrace_resize_pool() locking slightly reworked: 1) do not take a lock around the single atomic operation. 2) do not lose the invariant of lock by dropping/acquiring ktrace_mtx around free() or malloc(). MFC after: 1 Month.	2011-02-25 22:03:28 +00:00
netchild	b8896acc71	Make the description of the feature consistent with another similar description for another feature. Noticed by: trasz	2011-02-25 12:46:43 +00:00
netchild	cc4128c6b1	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availability if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project	2011-02-25 10:11:01 +00:00
pluknet	b7b945fd78	Clean up the now unused #include statement. Approved by: kib (mentor) MFC after: 1 week X-MFC with: r218972	2011-02-23 18:22:40 +00:00
kib	b083c8ec96	Move the max_threads_per_proc and max_threads_hits variables to the file where they are used. Declare the kern.threads sysctl node at the same location. Since no external use for the variables exists, make them static. Discussed with: dchagin MFC after: 1 week	2011-02-23 13:50:24 +00:00
jhb	5d87a422b4	Revert previous change, the existing check was correct. Pointy hat to: jhb	2011-02-23 13:25:42 +00:00
jhb	3eb951ea57	Expose the umtx_key structure and API to the rest of the kernel. MFC after: 3 days	2011-02-23 13:19:14 +00:00
jhb	902a44bea0	Fix off-by-one error in check against max_threads_per_proc. Submitted by: arundel MFC after: 1 week	2011-02-23 12:56:25 +00:00
brucec	6d9b42b486	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
jh	d4c0c6675e	Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but vfs_export() fails. Restoring old options and flags after successful VFS_MOUNT(9) call may cause the file system internal state to become inconsistent with mount options and flags. Specifically the FFS super block fs_ronly field and the MNT_RDONLY flag may get out of sync. PR: kern/133614 Discussed on: freebsd-hackers	2011-02-19 14:27:14 +00:00
mdf	baf9dec697	Modify kdb_trap() so that it re-calls the dbbe_trap function as long as the debugger back-end has changed. This means that switching from ddb to gdb no longer requires a "step" which can be dangerous on an already-crashed kernel. Also add a capability to get from the gdb back-end back to ddb, by typing ^C in the console window. While here, simplify kdb_sysctl_available() by using sbuf_new_for_sysctl(), and use strlcpy() instead of strncpy() since the strlcpy semantic is desired. MFC after: 1 month	2011-02-18 22:25:11 +00:00
bz	b9b7d3e93a	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
bz	51beb8d124	Mfp4 CH=177256: Catch a set vnet upon return to user space. This usually means return paths with CURVNET_RESTORE() missing. If VNET_DEBUG is turned on we can even tell the function that did the CURVNET_SET() which is really helpful; else we print "N/A". Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 11 days	2011-02-14 20:49:37 +00:00
deischen	cc0a83ec21	Allow the SO_SETFIB socket option to select the default (0) routing table. Reviewed by: julian	2011-02-13 00:14:13 +00:00
alc	060dcf42aa	Retire VFS_BIO_DEBUG. Convert those checks that were still valid into KASSERT()s and eliminate the rest. Replace excessive printf()s and a panic() in bufdone_finish() with a KASSERT() in vm_page_io_finish(). Reviewed by: kib	2011-02-12 01:00:00 +00:00
jmallett	96da23f472	With smp_topo_none, set cg_mask to all_cpus rather than setting the mp_ncpus low bits. Submitted by: Bhanu Prakash Reviewed by: jeffr	2011-02-11 22:43:10 +00:00

12291 Commits