freebsd-skq

Author	SHA1	Message	Date
jh	c2cb836190	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for a user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) agree better with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
kib	8505815b26	Regen	2010-08-30 14:26:02 +00:00
kib	0a6b8011f4	Make the syscalls reserved for AFS usable by OpenAFS port. Submitted by: Benjamin Kaduk <kaduk mit edu> MFC after: 2 weeks	2010-08-30 14:24:44 +00:00
kib	9d21e17f07	For some file types, select code registers two selfd structures. E.g., for a socket, when POLLIN|POLLOUT is specified in events, you would have one selfd registered for the receiving socket buffer, and one for the sending one. Now, if both events are not ready to fire at the time of the initial scan, but are simultaneously ready after the sleep, pollrescan() would iterate over the pollfd struct twice. Since both times revents is not zero, the returned value would be off by one. Fix this by recalculating the return value in pollout(). PR: kern/143029 MFC after: 2 weeks	2010-08-28 17:42:08 +00:00
pjd	bc73fabf27	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in the MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for the current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn frees us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month	2010-08-28 08:57:15 +00:00
pjd	43af1b0877	Run all tasks from a proper context, with proper priority, etc. Reviewed by: jhb MFC after: 1 month	2010-08-28 08:38:03 +00:00
kib	65295a82b8	Fix typo. Submitted by: Ben Kaduk <minimarmot gmail com>	2010-08-26 11:20:57 +00:00
brian	e098f7b033	If we read zero bytes from the directory, early out with ENOENT rather than forging ahead and interpreting garbage buffer content and dirent structures. This change backs out r211684 which was essentially a no-op. MFC after: 1 week	2010-08-25 18:09:51 +00:00
davidxu	86cb0861ef	If a thread is removed from umtxq while sleeping, reset the error code to zero; this gives userland a better indication that the thread needn't be cancelled.	2010-08-25 03:14:32 +00:00
davidxu	22bc7d14ad	Optimize thr_suspend: if timeout is zero, don't call msleep, just return immediately.	2010-08-24 07:29:55 +00:00
davidxu	6616b254f2	- According to specification, SI_USER code should only be generated by standard kill(). On other systems, SI_LWP is generated by lwp_kill(). This will allow conforming applications to differentiate between signals generated by standard events and those generated by other implementation events in a manner compatible with existing practice. - Bump __FreeBSD_version	2010-08-24 07:22:24 +00:00
imp	50d4a3193c	This should really be MACHINE not MACHINE_ARCH, and is this Makefile even used?	2010-08-23 06:22:35 +00:00
brian	89b2d8bbb4	uio_resid isn't updated by VOP_READDIR for nfs filesystems. Use the uio_offset adjustment instead to calculate a correct *len. Without this change, we run off the end of the directory data we're reading and panic horribly for nfs filesystems. MFC after: 1 week	2010-08-23 05:33:31 +00:00
rpaulo	dda24289cb	Call the systrace_probe_func() with the error value. Sponsored by: The FreeBSD Foundation	2010-08-22 11:30:49 +00:00
rpaulo	ea11ba6788	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]	2010-08-22 11:18:57 +00:00
rpaulo	6f62630bc2	Bump KDTRACE_THREAD_ZERO and use M_ZERO as a malloc flag instead of calling bzero. Sponsored by: The FreeBSD Foundation	2010-08-22 11:09:53 +00:00
rpaulo	a34abf7c98	Fix style issues. Sponsored by: The FreeBSD Foundation	2010-08-22 11:08:18 +00:00
davidxu	84d25462c9	make sure thread lock is locked.	2010-08-20 23:51:34 +00:00
jhb	d4890c88b0	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days	2010-08-20 19:46:50 +00:00
davidxu	89f466d2b2	If a thread has set TDP_WAKEUP for itself, clear the flag and return EINTR immediately; this is used for implementing reliable pthread cancellation.	2010-08-20 04:28:30 +00:00
jhb	d02cab2556	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
jhb	2f662f7a9c	There isn't really a need to hold the ktrace mutex just to read the value of p_traceflag that is stored in the kinfo_proc structure. It is still racey even with the lock and the code will read a consistent snapshot of the flag without the lock.	2010-08-19 16:40:30 +00:00
jhb	faa167a723	Fix a whitespace nit and remove a questioning comment. STAILQ_CONCAT() does require the STAILQ the existing list is being added to to already be initialized (it is CONCAT() vs MOVE()).	2010-08-19 16:38:58 +00:00
jhb	d64a4df941	Keep the process locked when calling ktrops() or ktrsetchildren() instead of dropping the lock only to immediately reacquire it.	2010-08-17 21:34:19 +00:00
kib	d9f088a03e	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
pjd	120209c66c	Simplify taskqueue_drain() by using proved macros.	2010-08-13 19:20:35 +00:00
gibbs	f5039e4d7d	Allow interrupt driven config hooks to be registered from config hook callbacks. Interrupt driven configuration hooks serve two purposes: they are a mechanism for registering for a callback that is invoked once interrupt services are available, and they hold off root device selection so long as any configuration hooks are still active. Before this change, it was not possible to safely register additional hooks from the context of a configuration hook callback. The need for this feature arises when interrupts are required to discover new devices (e.g. access to the XenStore to find para-virtualized devices) which in turn also require the ability to hold off root device selection until some lengthy, interrupt driven, configuration task has completed (e.g. Xen front/back device driver negotiation). More specifically, the mutex protecting the list of active configuration hooks is never held during a callback, and static information is used to ensure proper ordering and only a single callback to each hook even when faced with registration or removal of a hook during an active run. Sponsored by: Spectra Logic Corporation MFC after: 1 week.	2010-08-12 19:50:40 +00:00
gibbs	6b6ab892f9	Properly indent a continue statement. No functional changes.	2010-08-12 19:26:27 +00:00
jkim	f9341f06d7	Add half of the time-of-day clock resolution when we adjust system time from the time-of-day clock or vice versa. For x86 systems, RTC resolution is one second and we used to lose up to one second whenever we initialize system time from RTC or write system time back to RTC. With this change, the margin of error per conversion is roughly between -0.5 and +0.5 second rather than between -1 and 0 second. Note that it does not take care of errors from getnanotime(9) (which is up to 1/hz second) or CLOCK_GETTIME() latency. These are just too expensive to correct and it is not worth the cost.	2010-08-12 17:17:05 +00:00
jkim	1974c6514b	Provide description for 'machdep.disable_rtc_set' sysctl. Clean up style(9) nits. Remove a redundant return statement and an unnecessary variable.	2010-08-12 16:13:24 +00:00
kib	ade28bdd40	The buffer's b_vflags field is not always properly protected by the bufobj lock. If b_bufobj is not NULL, then the bufobj lock should be held when manipulating the flags. Not doing this sometimes leaves BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to get stuck indefinitely in "getbuf" sleep, waiting for a background write to finish which is not actually performed. Add BO_LOCK() in the cases where it was missed. In collaboration with: pho Tested by: bz Reviewed by: jeff MFC after: 1 month	2010-08-12 08:36:23 +00:00
mdf	0737955344	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month	2010-08-11 22:10:37 +00:00
ivoras	9cecaf1c60	Fix (hopefully) the spelling of "queuing." Submitted by: bf1783 at gmail com	2010-08-09 23:32:37 +00:00
ivoras	191f678b27	Bumping the read-ahead count once more, to a value equivalent to 512 KiB on most systems, based on benchmark results on a low-end fibre channel SAN under VMWare (vfs.read_max: read performance): 8 (historical default): 83 MB/s; 16 (recent bump): 131 MB/s; 32 (this version): 152 MB/s; 64: 157 MB/s (results are +/- 3 MB/s). As read-ahead is heuristic, based on past IO requests, it shouldn't be problematic. The new default is still smaller than in other OSes.	2010-08-09 22:56:10 +00:00
ivoras	fa067e3c30	Elaborate on how hirunningspace was chosen.	2010-08-09 22:22:46 +00:00
gavin	dbc7cd5ae9	Add descriptions to a handful of sysctl nodes. PR: kern/148580 Submitted by: Galimov Albert <wtfcrap mail.ru> MFC after: 1 week	2010-08-09 14:48:31 +00:00
attilio	307b2c04a2	r208165 fixed a bug related to unsigned integer overflow in the detection of the number of CPUs. However, that was not mentioned at all, the problem was not reported, the patch has not been MFCed and the fix is mostly improper. Fix the original overflow (caused when 32 CPUs must be detected) by just using a different mathematical computation (it also makes the size of the operands involved more explicit, which is good while waiting for more complete support for a large number of CPUs). PR: kern/148698 Submitted by: Joe Landers <jlanders at vmware dot com> Tested by: gianni MFC after: 10 days	2010-08-09 00:23:57 +00:00
jamie	4e0690ba81	Back out r210974. Any convenience of not typing "persist" is outweighed by the possibility of unintended partially-formed jails.	2010-08-08 23:22:55 +00:00
ivoras	252207bbfb	To help with sequential read UFS performance on modern systems, increase the vfs.read_max default. For most systems this means going from 128 KiB to 256 KiB, which is still very conservative and lower than what most other operating systems use, but as a sane default should not interfere much with existing systems. For systems with RAID volumes and/or virtualization environments, where read performance is very important, increasing this sysctl tunable to 32 or even more will demonstrably yield additional performance benefits. If MAXPHYS ever gets bumped up, it will probably be a good idea to slave read_max to it.	2010-08-07 18:30:10 +00:00
tuexen	542f657a7f	Fix a bug where MSG_TRUNC was not returned in all necessary cases for SOCK_DGRAM sockets. MSG_TRUNC was only returned when some mbufs could not be copied to the application. If some data was left in the last mbuf, it was correctly discarded, but MSG_TRUNC was not set. Reviewed by: bz MFC after: 3 weeks	2010-08-07 17:57:58 +00:00
jamie	37e8c8fb79	Implicitly make a new jail persistent if it's set not to attach. MFC after: 3 days	2010-08-06 22:04:18 +00:00
jhb	19ddbf5c38	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
csjp	1e529a8eb9	Add Xen to the list of virtual vendors. In the non-PV (HVM) case this fixes virtualization detection, so the clflush instruction is successfully disabled. This fixes insta-panics for Xen HVM users when the hw.clflush_disable tunable is -1 or 0 (-1 by default). Discussed with: jhb	2010-08-06 15:04:40 +00:00
kib	7c864123d4	Add "show cdev" ddb command. In collaboration with: pho MFC after: 1 month	2010-08-06 09:44:01 +00:00
kib	ba7ee96f4a	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that the created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month	2010-08-06 09:42:15 +00:00
alc	b6ec5a5f0a	In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must cast PAGE_SIZE to an "int". (Powerpc and sparc, unlike the other architectures, define PAGE_SIZE as a "long".) Submitted by: Andreas Tobler	2010-08-04 05:09:02 +00:00
alc	329f9f0435	Update the "desiredvnodes" calculation. In particular, make the part of the calculation that is based on the kernel's heap size more conservative. Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the time being set MAXVNODES_MAX to a large value. Reviewed by: jhb@ MFC after: 6 weeks	2010-08-02 21:33:36 +00:00
rpaulo	1c3476a3fa	Bump the witness pendlist to 768 to accommodate the increased number of spinlocks.	2010-07-29 16:13:26 +00:00
mdf	6857471cf3	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)	2010-07-28 15:36:12 +00:00
alc	55426fcc55	The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.) Submitted by: kib	2010-07-28 04:47:40 +00:00
alc	256c63de28	Introduce exec_alloc_args(). The objective is to encapsulate the details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib	2010-07-27 17:31:03 +00:00
alc	02c0473d35	Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib	2010-07-25 17:43:38 +00:00
alc	0c709bf109	Eliminate a little bit of duplicated code.	2010-07-23 18:58:27 +00:00
avg	b44b5ccee0	completely ignore zero-sized elf sections in modules of elf object type (amd64) Current code doesn't check size of elf sections and may perform needless actions of zero-sized memory allocation and similar. The bigger issue is that alignment requirement of a zero-sized section gets effectively applied to the next section if it has smaller alignment requirement. But other tools, like gdb and consequently kgdb, completely ignore zero-sized sections and thus may map symbols to addresses differently. Zero-sized sections are not typical in general. Their typical (only, even) cause in FreeBSD modules is inline assembly that creates custom sections which is found in pcpu.h and vnet.h. Mere inclusion of one of those header files produces a custom section in elf output. If there is no actual use for the section in a given module, then the section remains empty. Better solution is to avoid creating zero-sized sections altogether, which is in plans. Preloaded modules are handled in boot code (load_elf_obj.c), while dynamically loaded modules are handled by kernel (link_elf_obj.c). Based on code by: np MFC after: 3 weeks	2010-07-23 17:07:51 +00:00
avg	0152f7748b	cpufreq: allocate long-lived buffer for handling of sysctl requests At present the cpufreq sysctl handler for current level setting would allocate and deallocate a temporary buffer of 24KB even to handle a read-only query. This puts unnecessary load on memory subsystem when current level is checked frequently, e.g. when the likes of powerd and system monitoring software are running. Change the strategy to allocating a long-lived buffer for handling the requests. Reviewed by: njl MFC after: 2 weeks	2010-07-23 16:46:42 +00:00
ivoras	dd4be6368d	Make lorunningspace catch up with hirunningspace. While there, add comment about the magic numbers. Prodded by: alc	2010-07-23 12:30:29 +00:00
mdf	e8106ea76c	Remove unused variable that snuck in during development. Approved by: zml (mentor)	2010-07-22 17:23:43 +00:00
mdf	fa23fa820a	Fix taskqueue_drain(9) to not have false negatives. For threaded taskqueues, more than one task can be running simultaneously. Also make taskqueue_run(9) static to the file, since there are no consumers in the base kernel and the function signature needs to change with this fix. Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the taskqueue(9) man page. Reviewed by: jhb Approved by: zml (mentor)	2010-07-22 16:41:09 +00:00
kib	9ac2754b6d	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
trasz	a5239fb269	Remove spurious '/*-' marks and fix some other style problems. Submitted by: bde@	2010-07-22 05:42:29 +00:00
mav	f7b270cbd0	Use proper sysctl type (quad) for et_frequency. It fixes output on sparc64.	2010-07-21 12:23:49 +00:00
attilio	800d46f6e4	Probably defaulting to KTR_GEN is not the right decision when KTR_MASK is not defined at all, because KTR_GEN is still a valid class and some traces may fit in. Default to 0 instead, and block any tracing. As this is a POLA violation (some third-party code, even if that may be a questionable choice, could rely on that feature), the possibility of an MFC should be carefully evaluated. Sponsored by: Sandvine Incorporated	2010-07-21 10:14:04 +00:00
mav	0ea74c96a2	Fix several un-/signedness bugs of r210290 and r210293. Add one more check.	2010-07-20 15:48:29 +00:00
ivoras	d9b793e64d	Fix expression style. Prodded by: jhb	2010-07-20 13:59:51 +00:00
mav	1021ed9c1f	Extend timer driver API to report also minimal and maximal supported period lengths. Make MI wrapper code to validate periods in request. Make kernel clock management code to honor these hardware limitations while choosing hz, stathz and profhz values.	2010-07-20 10:58:56 +00:00
davidxu	cdb7adc908	Fix function name in error messages.	2010-07-20 02:23:12 +00:00
trasz	3e54021797	Revert r210225 - turns out I was wrong; the "/*-" is not license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. Explained by: cperciva@	2010-07-18 20:57:53 +00:00
trasz	935237a66a	The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurrences from sys/sys/ and sys/kern/.	2010-07-18 20:23:10 +00:00
trasz	dd1ffe6ba1	Remove outdated comment and move part of it into more applicable place.	2010-07-18 19:29:12 +00:00
ivoras	56cd1257b0	In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory. This is a somewhat conservative version of the patch - more fine tuning may be necessary. Idea from: Thread on hackers@ Discussed with: alc	2010-07-18 10:15:33 +00:00
jhb	96d598c33f	Retire td_syscalls now that it is no longer needed.	2010-07-15 20:24:37 +00:00
ivoras	3fb9f87a34	A cosmetic change - don't output empty <flags>.	2010-07-15 13:46:30 +00:00
mav	bd622e7c20	Rename timeevents.c to kern_clocksource.c. Suggested by: jhb@	2010-07-14 18:43:27 +00:00
jhb	fb1e0aa66f	- Document layout of KTR_STRUCT payload in a comment. - Simplify ktrstruct() calling convention by having ktrstruct() use strlen() rather than requiring the caller to hand-code the length of constant strings. MFC after: 1 month	2010-07-14 17:38:01 +00:00
mav	b8b00841c9	Move timeevents.c to MI code, as it is not x86-specific. I already have it working on Marvell ARM SoCs, and it would be nice to unify timer code between more platforms.	2010-07-14 13:31:27 +00:00
cperciva	14d1adbf2c	Correctly copy the M_RDONLY flag when duplicating a reference to an mbuf external buffer. Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-10:07.mbuf	2010-07-13 02:45:17 +00:00
jkim	06b6c2769b	Use type-specific inline function imax() instead of deprecated macro MAX(). Prodded by: bde	2010-07-12 15:32:45 +00:00
alc	db4ca9f5c2	Change the implementation of vm_hold_free_pages() so that it performs at most one call to pmap_qremove(), and thus one TLB shootdown, instead of one call and TLB shootdown per page. Simplify the interface to vm_hold_free_pages(). MFC after: 3 weeks	2010-07-11 20:11:44 +00:00
mav	d760bd51fb	Remove interval validation from cpu_tick_calibrate(). As I found, the check was needed in a preliminary version of the patch, where the number of CPU ticks was divided strictly by 16 seconds. The final code instead uses the real interval duration, so a precise interval should not be important. At the same time, aliasing issues around the second boundary cause false positives, periodically logging useless "t_delta ... too long/short" messages when HZ is set below 256.	2010-07-11 16:47:45 +00:00
alc	7c09dc242c	Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan(). This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one. Reviewed by: kib	2010-07-09 19:38:30 +00:00
jhb	f338f6d0f8	Accidentally committed an older version of this comment rather than the final one.	2010-07-09 13:59:53 +00:00
jhb	7e3b216a37	Refine a comment. Reviewed by: bde	2010-07-09 13:53:25 +00:00
jh	d171161918	Remove redundant high >= 0. Reported by: rstone	2010-07-09 10:57:55 +00:00
jkim	93b88a93da	Implement optional 'precision' for numbers. Previously, it was parsed but ignored. Some third-party modules (e.g., ACPICA) prefer this format over the zero padding flag '0'.	2010-07-08 22:13:23 +00:00
jhb	1f4cf66ed2	- Various style and whitespace fixes. - Make sugid_coredump and kern_logsigexit private to kern_sig.c. Submitted by: bde (partially) MFC after: 1 month	2010-07-08 19:15:26 +00:00
jh	f673b7098a	Assert that low and high are >= 0. The allocator doesn't support the negative range.	2010-07-08 16:53:19 +00:00
attilio	865de58a04	- Simplify logic in handling ticks wrap-up - Fix a bug where thread may be in sleeping state but the wchan won't be set, leading to an empty container for sleepq_type(). [0] Sponsored by: Sandvine Incorporated [0] Submitted by: Bryan Venteicher <bryanv at daemoninthecloset dot org> MFC after: 3 days X-MFC: 209577	2010-07-07 12:00:11 +00:00
kib	15d16124c2	In revoke(), verify that VCHR vnode indeed belongs to devfs. Found and tested by: pho MFC after: 1 week	2010-07-06 18:20:49 +00:00
ed	1075ceb3e2	Fix a race condition, where a TTY could be destroyed twice. There are special cases where tty_rel_free() can be called twice in a row, namely when closing and revoking the TTY at the same moment. Only call destroy_dev_sched_cb() once. Reported by: Jeremie Le Hen MFC after: 1 week	2010-07-06 08:56:34 +00:00
kib	15a394fbba	Add the ability for the allocflag argument of the vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending vmio buffer can be replaced by a call to vm_page_grab(). Suggested and reviewed by: alc MFC after: 2 weeks	2010-07-05 21:13:32 +00:00
jh	b0744cfb8d	Extend the kernel unit number allocator for allocating specific unit numbers. This change adds a new function alloc_unr_specific() which returns the requested unit number if it is free. If the number is already allocated or out of the range, -1 is returned. Update alloc_unr(9) manual page accordingly and add a MLINK for alloc_unr_specific(9). Discussed on: freebsd-hackers	2010-07-05 16:23:55 +00:00
kib	4de7ec3dbb	Obey sv_syscallnames bounds in syscallname(). Reported and tested by: pho	2010-07-04 18:16:17 +00:00
kib	22a31bdc6e	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused debugee stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks	2010-07-04 11:48:30 +00:00
alc	afd002fb75	Use vm_page_next() instead of vm_page_lookup() in exec_map_first_page() because vm_page_next() is faster.	2010-07-02 15:50:30 +00:00
jhb	de324e256c	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
jhb	738cd61a3d	Update comment for tdsignal() -> tdsendsignal() rename. Forgot to include this in 209592.	2010-06-30 18:00:45 +00:00
alc	df23299909	Improve bufdone_finish()'s handling of the bogus page. Specifically, if one or more mappings to the bogus page must be replaced, call pmap_qenter() just once. Previously, pmap_qenter() was called for each mapping to the bogus page. MFC after: 3 weeks	2010-06-30 04:52:42 +00:00
jhb	44b49a3eaa	Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week	2010-06-29 20:44:19 +00:00
jhb	df7979cf76	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib	2010-06-29 20:41:52 +00:00
dougb	ebed8715b6	If i is going to be used in the loop unconditionally the declaration has to be unconditional as well. Conical head covering to: kib	2010-06-29 01:04:24 +00:00
kib	180cca1c2d	Regenerate	2010-06-28 18:17:21 +00:00
kib	2ab2a361d3	Although system call deregistration drains the threads executing System V shm syscalls, and the module deinitialization code initially checks the number of allocated segments, the following might happen: after the check for active segments, while waiting for threads to leave some other syscall, shmget(2) is called. Then, we can end up with a shared segment that cannot be detached since the sysvshm module is unloaded. Prevent the leak by rechecking and disclaiming a reference to the vm object owned by the sysvshm module, that might have grown during the drain. Tested by: pho Reviewed by: jhb MFC after: 1 month	2010-06-28 18:12:42 +00:00
kib	b6d8416eac	Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month	2010-06-28 18:06:46 +00:00
attilio	f818dc9368	Fix a lock leak in the deadlock resolver in case the ticks counter wrapped up. Sponsored by: Sandvine Incorporated Submitted by: pluknet <pluknet at gmail dot com> Reported by: Anton Yuzhaninov <citrin at citrin dot ru> Reviewed by: jhb MFC after: 3 days	2010-06-28 17:45:00 +00:00
jh	0a8e6bb738	Correct a comment typo.	2010-06-27 12:19:09 +00:00
pjd	6ff3cc04b0	Correct arguments order.	2010-06-26 21:44:45 +00:00
tuexen	d27c0f60a0	* Do not dereference a NULL pointer when calling an SCTP send syscall not providing a destination address and using ktrace. * Do not copy out kernel memory when providing sinfo for sctp_recvmsg(). Both bugs were reported by Valentin Nechayev. The first bug results in a kernel panic. MFC after: 3 days.	2010-06-26 19:26:20 +00:00
nwhitehorn	ecf1995ac7	Reverse the logic of the if statement that sets the default value of HZ; the list of 1000 Hz platforms was getting unwieldy. Suggested by: marcel	2010-06-24 00:27:20 +00:00
nwhitehorn	da5a28c706	Move default HZ from 100 to 1000 on powerpc. Reviewed by: marcel MFC after: 2 weeks	2010-06-23 23:26:14 +00:00
kib	6375d4e4db	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable of running on either have no FPU and cannot use an external coprocessor, or have an FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 11:12:58 +00:00
mav	a21b0b9d72	"time lock" is no longer a spin-lock since r209371. Reported by: kib@	2010-06-21 21:15:51 +00:00
ed	76489ac1ea	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
kib	107ec73aad	Do not report stack garbage as the old value for the debug.ncores sysctl. Reported by: brucec	2010-06-21 09:51:25 +00:00
mav	d1175426d7	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing the best possible drivers by machine-independent code and for operating them to supply the kernel with hardclock(), statclock() and profclock() events in a unified fashion on various hardware. The infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but use of one-shot mode is planned for later, as part of the tickless kernel project. For the moment the infrastructure is used on the i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support to the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may not be exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because the same hardware is also used as a time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited to powers of 2. Depending on hardware capabilities, drivers are preferred in one of the following orders: either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. The user may explicitly specify the wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If the requested driver is unavailable or not operational, the system will try to replace it. If no more timers are available or "NONE" is specified for the second, the system will operate using only one timer, multiplying its frequency by a few times and using respective dividers to honor the hz, stathz and profhz values set during initial setup.	2010-06-20 21:33:29 +00:00
pjd	b3024a4af9	Backout r207970 for now, it can lead to deadlocks. Reported by: kan MFC after: 3 days	2010-06-17 17:39:51 +00:00
rpaulo	a8c5bafed5	Make DTrace syscall provider work again by including opt_kdtrace.h here.	2010-06-17 17:34:45 +00:00
jh	8a203f841c	- Fix compilation of the subr_unit.c user space test program. - Use %zu for size_t in a few format strings.	2010-06-17 16:12:06 +00:00
avg	9f2d4c3357	lock_profile_release_lock: do not compare unsigned with zero Found by: Coverity Prevent CID: 3660 Reviewed by: jhb MFC after: 2 weeks	2010-06-17 10:15:13 +00:00
ed	70171ee94e	Remove the unit argument from the recently added make_dev_p(). New code that creates character devices shouldn't use device unit numbers, but only si_drv[12] to hold pointers to per-device data. Make this function more future proof by removing the unit number argument. Discussed with: kib	2010-06-17 08:49:31 +00:00
jh	1c0174e29a	Correct the function name in a KASSERT.	2010-06-16 16:02:17 +00:00
jkim	14f08fd627	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped in "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format, as returned by binuptime(9) and getbinuptime(9). Reviewed by: net@	2010-06-15 19:28:44 +00:00
mav	ea954fa396	Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.	2010-06-14 07:10:37 +00:00
kib	bbe91d0e0f	Add another variation of make_dev(9), make_dev_p(9), that is allowed to fail and can return a useful error code. Requested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:22:39 +00:00
kib	9e98593ebc	When make_dev_credf(MAKEDEV_WAITOK) is called, use devctl_notify_f(M_WAITOK) for devfs notifications. Suggested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:21:25 +00:00
kib	2605a178f6	Add modifications of devctl_notify(9) functions that take flags. Use flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for the memory allocation. As Warner noted, allowing the functions to sleep might cause reordering of the queued notifications. Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:20:38 +00:00
avg	324886002f	fix a few cases where a string is passed via format argument instead of via %s Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed dropping an intermediate buffer. Found by: clang MFC after: 2 weeks	2010-06-11 19:27:21 +00:00
jhb	9b74a62d73	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
mdf	09830f0c6f	Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state. Reviewed by: jeff Approved by: zml (mentor)	2010-06-11 17:03:26 +00:00
ivoras	5a89fd1114	In another move to join with the age of the Fruitbat, increase SYSV shared resources defaults beyond absolute minimums. The new values are chosen mostly by magic. They are still fairly small and will need increasing for large installations (especially SHMMAX). However, they are now enough to e.g. start PostgreSQL installations with ~300 users and nearly 512 MB of shared buffers. Reviewed by: A short discussion on hackers@	2010-06-11 09:27:33 +00:00
mav	b8bbab8130	Store interrupt trap frame into struct thread. It allows an interrupt handler to obtain both the trap frame and the opaque argument submitted on registration. After the kernel and all drivers get used to it, the legacy hack can be removed. Reviewed by: jhb@	2010-06-10 16:14:05 +00:00
ivoras	04624ee0ea	Unconfuse THREAD and SMT flags	2010-06-10 11:48:14 +00:00
ivoras	7937017072	Cosmetic change to XML - less ugly newlines	2010-06-10 11:01:17 +00:00
kib	317abde372	Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once. Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy. In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf(). Reported and tested by: pho Suggested and reviewed by: alc MFC after: 2 weeks	2010-06-08 17:54:28 +00:00
jhb	72cdd6ef99	Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days	2010-06-08 16:17:47 +00:00
mav	4363e5b2ce	Call BUS_PROBE_NOMATCH() when device detached due to driver unload. This allows the bus to power down the device when the driver is unloaded on the fly.	2010-06-07 18:47:53 +00:00
cperciva	4adc6d09d8	Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer. Found with: Coverity Prevent(tm) CID: 4319 MFC after: 1 month	2010-06-04 14:38:24 +00:00
jhb	16dab63fe9	Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock. MFC after: 1 month	2010-06-03 16:02:11 +00:00
trasz	253bf0319d	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3688	2010-06-03 13:45:27 +00:00
trasz	9985f972fd	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3684	2010-06-03 13:43:58 +00:00
trasz	cbfca8b888	The acl_cnt field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3683	2010-06-03 13:41:55 +00:00
kib	2ba33ab98e	Sometimes vnodes share the lock despite being different vnodes on different mount points, e.g. the nullfs vnode and the covered vnode from the lower filesystem. In this case, existing assertion in vop_rename_pre() may be triggered. Check for vnode lock equality instead of the vnodes themselves to not trip over the situation. Submitted by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: pho MFC after: 2 weeks	2010-06-03 10:20:08 +00:00
alc	24ac89cf14	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
kib	5e1e617f5e	Add a facility to dynamically adjust or unconfigure p1003_1b mib. Use it to allow tuning sem_nsem_max at runtime, only when the sem.ko module is present in the kernel. Requested and tested by: amdmi3 Reviewed by: jhb MFC after: 3 days	2010-06-02 09:59:05 +00:00
zml	7f5d6a35d6	Revert taskqueue(9) related commits until mdf@ is approved and can resolve issues. This reverts commits r207439, r208623, r208624	2010-06-01 16:04:01 +00:00
zml	cadeb05108	Avoid a wakeup(9) if we can be sure no one is waiting on the task. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:34 +00:00
zml	f1e0737c28	Revert r207439 and solve the problem differently. The task handler ta_func may free the task structure, so no references to its members are valid after the handler has been called. Using a per-queue member and having waits longer than strictly necessary was suggested by jhb. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:28 +00:00
rwatson	c7e8976175	When close() is called on a connected socket pair, SO_ISCONNECTED might be set but be cleared before the call to sodisconnect(). In this case, ENOTCONN is returned: suppress this error rather than returning it to userspace so that close() doesn't report an error improperly. PR: kern/144061 Reported by: Matt Reimer <mreimer at vpop.net>, Nikolay Denev <ndenev at gmail.com>, Mikolaj Golub <to.my.trociny at gmail.com> MFC after: 3 days	2010-05-27 15:27:31 +00:00
attilio	e56433dd50	Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap(). Sponsored by: Sandvine Incorporated Reviewed by: kib, emaste MFC after: 1 week	2010-05-27 08:10:12 +00:00
kib	4f460f2f9a	Allow syscallname(9) to be used outside subr_trap.c. MFC after: 1 month	2010-05-26 15:39:43 +00:00
jhb	6caceffefa	Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH. PR: kern/146167 MFC after: 1 week	2010-05-25 21:32:37 +00:00
alc	54739180f5	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
alc	32b13ee957	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
mav	48198e3ddd	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code, with minor variations, was duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
kib	70f08890fc	Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit(). Diagnosed and tested by: neel MFC after: 3 days	2010-05-24 10:23:49 +00:00
kib	4208ccbe79	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
jhb	cf780ce267	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
jhb	ce208e1f41	Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently. MFC after: 1 month	2010-05-21 17:15:56 +00:00
jhb	b7fc8e97f1	Allow a const char * to be passed as the process name to kproc_kthread_add() without generating a warning. MFC after: 1 month	2010-05-21 17:14:36 +00:00
kib	890c865dcf	Remove POLLHUP from the flags used to test whether to set exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken. Reported by: Yoshihiko Sarumaru <ysarumaru gmail com> Discussed with: bde PR: ports/140934 MFC after: 2 weeks	2010-05-21 10:36:29 +00:00
alc	f8bed5b288	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
rrs	8ea4ab29a0	This pushes all of JC's patches that I have in place. I am now able to run 32 cores OK, but it still hangs on buildworld with an NFS problem. I suspect I am missing a patch for the netlogic rge driver. JC, check and see if I am missing anything except your core-mask changes. Obtained from: JC	2010-05-16 19:43:48 +00:00
bz	c9d1ca826b	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
zml	773cda6040	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr	2010-05-12 21:24:46 +00:00
pjd	05f836c1c3	When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics. No objections from: kib Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua> MFC after: 1 week	2010-05-12 16:42:28 +00:00
pjd	f1b200bbcc	I added vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it. MFC after: 3 days	2010-05-11 22:46:36 +00:00
attilio	4d95c325dd	Right now, WITNESS just blindly pipes all the output to the (TOCONS | TOLOG) mask even when called from DDB points. That breaks several outputs, the most notable being textdump output. Fix this by having configurable callbacks passed to witness_list_locks() and witness_display_spinlock() for printing out data. Reported by: several broken textdump outputs Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC after: 7 days X-MFC: r207922	2010-05-11 18:24:22 +00:00
attilio	a6a1f012b7	There is not a good reason to have a different prototype for db_printf() when compared to printf(). Unify it by returning the number of characters displayed for db_printf() as well. MFC after: 7 days	2010-05-11 17:01:14 +00:00
attilio	31c196b3b9	Fix a hang introduced in r206878 for kernels compiled with SMP support but not running on actual SMP hardware, and similar situations, by always initializing the smp ipi mutex. Reported by: marius MFC after: 3 days X-MFC: r206878	2010-05-11 15:36:16 +00:00
alc	bc80981f79	Update a comment: It no longer makes sense to talk about the page queues lock here.	2010-05-08 23:01:47 +00:00
alc	40b44f9713	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
kib	77dcee6926	Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node in a no-sleep context. If resource allocation cannot be done without sleep, make_dev_credf() fails and returns NULL. Reviewed by: jh MFC after: 2 weeks	2010-05-06 19:22:50 +00:00
alc	fecc56fac1	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
trasz	f26ccb52af	Avoid overflow. Submitted by: bde@	2010-05-06 18:52:41 +00:00
trasz	e6f92048fa	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
alc	e6c77ecaea	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
alc	a4eb017f2a	Eliminate a small bit of unneeded code from kern_sendfile(): While kern_sendfile() is running, the file's vm object can't be destroyed because kern_sendfile() increments the vm object's reference count. (Once kern_sendfile() decrements the reference count and returns, the vm object can, however, be destroyed. So, sf_buf_mext() must handle the case where the vm object is destroyed.) Reviewed by: kib	2010-05-06 15:52:08 +00:00
joel	c8dfd5c0cb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
alc	5c7ca3ee73	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
trasz	402e3baade	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
kib	a3da7d7e69	Fix a mistake in r207603. td_rux.rux_runtime still needs conversion. Reported and tested by: nwhitehorn Pointy hat to: kib MFC after: 6 days	2010-05-05 16:05:51 +00:00
alc	ea7b6345be	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
alc	c9aaa1e2a2	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
kib	26be0345aa	Fix typo in comment. MFC after: 3 days	2010-05-04 06:06:01 +00:00
kib	e5f4727bbf	Remove a comment that merely repeats code. Submitted by: bde MFC after: 1 week	2010-05-04 06:04:33 +00:00
kib	7ef4b25b49	Use td_rux.rux_runtime for ki_runtime instead of redoing calculation. Submitted by: bde MFC after: 1 week	2010-05-04 06:00:39 +00:00
kib	b13e838a49	Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks information for thread to allow calcru1() (re)use. Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1]. The ruxagg_locked() function no longer clears thread ticks nor td_incruntime. Requested by: attilio [1] Discussed with: attilio, bde Reviewed by: bde Based on submission by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week X-MFC-Note: td_rux shall be moved to the end of struct thread	2010-05-04 05:55:37 +00:00
alc	1923b6ded3	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
alc	387e15c45a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
kib	9c4f2e9ab2	Lock the page around hold_count access. Reviewed by: alc	2010-05-02 19:25:22 +00:00
alc	299c89c6fb	Properly synchronize access to the page's hold_count in vfs_vmio_release(). Reviewed by: kib	2010-05-02 19:10:27 +00:00
alc	f35e97166b	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
zec	139551016d	Remove a redundant variable assignment. Reviewed by: bz, rwatson MFC after: 3 days	2010-05-01 18:34:50 +00:00
kib	64dab823a0	Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility function ruxagg_tlock(). Convert the definition of kern_getrusage() to ANSI C. Submitted by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week	2010-05-01 14:46:17 +00:00
zml	3eac0000f0	Handle taskqueue_drain(9) correctly on a threaded taskqueue: taskqueue_drain(9) will not correctly detect whether a task is currently running. The check is against a field in the taskqueue struct, but for a threaded queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: jhb Approved by: dfr (mentor)	2010-04-30 16:29:05 +00:00
alfred	12d5232340	Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(), use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:15:00 +00:00
alfred	993bf6ff36	Don't leak core_buf or gzfile if doing a compressed core file and we hit an error condition. Obtained from: Juniper Networks	2010-04-30 03:13:24 +00:00
alfred	20fdc94b9e	Do not set IO_NODELOCKED while writing to vnodes as our consumers do not lock the vnodes. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:10:53 +00:00
kmacy	1dc1263413	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
kib	a22b32df4a	Remove caddr_t casts. Requested by: bde MFC after: 10 days	2010-04-29 09:55:51 +00:00
avg	2cfe78bdd9	kern_ntptime: drop a comment that became stale after r207359 MFC after: 1 week X-MFC after: r207359	2010-04-29 09:18:36 +00:00
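Several commits above (avg 9f2d4c3357; trasz 253bf0319d, 9985f972fd, cbfca8b888) drop comparisons of an unsigned value against zero. The minimal C sketch below shows why such a check is dead code. `struct acl_like` and both helper functions are hypothetical stand-ins, not the kernel's ACL code.

```c
/* Hypothetical stand-in for a structure with an unsigned count field. */
struct acl_like {
	unsigned int acl_cnt;
};

/*
 * Before: "acl_cnt >= 0" can never be false for an unsigned type, so
 * static analyzers such as Coverity flag it as a dead comparison.
 */
static int
acl_cnt_ok_before(const struct acl_like *a, unsigned int max)
{
	return (a->acl_cnt >= 0 && a->acl_cnt <= max);
}

/* After: only the upper bound carries any information. */
static int
acl_cnt_ok_after(const struct acl_like *a, unsigned int max)
{
	return (a->acl_cnt <= max);
}
```

Both functions accept and reject exactly the same values. A "negative" count stored into the unsigned field wraps to a large value, and the upper-bound check alone rejects it.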
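Two commits above deal with format-string correctness. avg 324886002f passes strings through "%s" instead of using them as the format, and jh 8a203f841c uses %zu for size_t. The sketch below shows both patterns with standard snprintf(3). The helper names `copy_msg` and `format_size` are hypothetical.

```c
#include <stdio.h>

/*
 * Risky pattern: snprintf(buf, len, msg) treats msg itself as the format,
 * so any '%' in msg is parsed as a conversion specifier and snprintf reads
 * arguments that were never passed (undefined behavior).
 *
 * Fixed pattern: pass the string as an argument to a constant "%s" format.
 */
static int
copy_msg(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "%s", msg);
}

/* size_t is printed with the "z" length modifier rather than %d or %u. */
static int
format_size(char *buf, size_t len, size_t n)
{
	return snprintf(buf, len, "%zu bytes", n);
}
```

With the fixed pattern, a message such as "100% done" is copied verbatim instead of being interpreted as a format.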
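The event timer commits (mav d1175426d7, 48198e3ddd) describe dividing a single timer interrupt into hardclock(), statclock() and profclock() calls using dividers. The sketch below shows one simple way to do this. It assumes one periodic timer whose frequency is an exact multiple of hz, stathz and profhz. The structures and functions are hypothetical, and the real MI code may use a different scheme (it also handles two timers).

```c
#include <assert.h>

/* Hypothetical counters standing in for hardclock()/statclock()/profclock(). */
struct clock_counts {
	unsigned long hard, stat, prof;
};

struct tick_divider {
	unsigned int hard_div, stat_div, prof_div;	/* timer ticks per event */
	unsigned int hard_cnt, stat_cnt, prof_cnt;	/* ticks since last event */
};

static void
divider_init(struct tick_divider *d, unsigned int base, unsigned int hz,
    unsigned int stathz, unsigned int profhz)
{
	/* This sketch requires exact multiples; no fractional dividers. */
	assert(base % hz == 0 && base % stathz == 0 && base % profhz == 0);
	d->hard_div = base / hz;
	d->stat_div = base / stathz;
	d->prof_div = base / profhz;
	d->hard_cnt = d->stat_cnt = d->prof_cnt = 0;
}

/* Called once per timer interrupt; fires each event every N-th tick. */
static void
divider_tick(struct tick_divider *d, struct clock_counts *c)
{
	if (++d->hard_cnt == d->hard_div) {
		d->hard_cnt = 0;
		c->hard++;
	}
	if (++d->stat_cnt == d->stat_div) {
		d->stat_cnt = 0;
		c->stat++;
	}
	if (++d->prof_cnt == d->prof_div) {
		d->prof_cnt = 0;
		c->prof++;
	}
}
```

For example, a 4000 Hz timer with hz=1000, stathz=125 and profhz=2000 uses dividers of 4, 32 and 2. One second of ticks then yields 1000, 125 and 2000 events respectively.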
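The BPF timestamping commit (jkim 14f08fd627) introduces a time stamp with a 64-bit fractional part and refers to 'struct bintime', whose fraction counts units of 2^-64 seconds. The sketch below converts such a binary fraction to microseconds and nanoseconds by keeping only its top 32 bits. To our understanding, this mirrors the arithmetic in FreeBSD's bintime2timeval(). The function names here are hypothetical.

```c
#include <stdint.h>

/* Binary fraction of a second (units of 2^-64 s) -> microseconds. */
static uint32_t
frac_to_usec(uint64_t frac)
{
	/* Keep the top 32 bits so the 64-bit product cannot overflow. */
	return (uint32_t)(((uint64_t)1000000 * (uint32_t)(frac >> 32)) >> 32);
}

/* Same scheme for nanoseconds; 10^9 * (2^32 - 1) still fits in 64 bits. */
static uint32_t
frac_to_nsec(uint64_t frac)
{
	return (uint32_t)(((uint64_t)1000000000 * (uint32_t)(frac >> 32)) >> 32);
}
```

A fraction of 2^63 is half a second, so it converts to 500000 microseconds. The largest fraction truncates to 999999 microseconds rather than rounding up into the next second.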
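kib 890c865dcf restores the historical select(2) behaviour of not reporting EOF through exceptfds. The userspace sketch below observes this on a pipe whose writer has been closed. It assumes a system following that behaviour: the read end should be reported as readable (a read would return 0), but not as exceptional. The helper name is hypothetical.

```c
#include <sys/select.h>
#include <unistd.h>

/* Returns a bitmask: bit 0 = readable, bit 1 = exceptional, or -1 on error. */
static int
pipe_eof_select_bits(void)
{
	int fds[2], bits = 0;
	fd_set rset, eset;
	struct timeval tv = { 0, 0 };		/* poll, do not block */

	if (pipe(fds) != 0)
		return (-1);
	close(fds[1]);				/* writer gone: reader sees EOF */

	FD_ZERO(&rset);
	FD_ZERO(&eset);
	FD_SET(fds[0], &rset);
	FD_SET(fds[0], &eset);
	if (select(fds[0] + 1, &rset, NULL, &eset, &tv) < 0) {
		close(fds[0]);
		return (-1);
	}
	if (FD_ISSET(fds[0], &rset))
		bits |= 1;
	if (FD_ISSET(fds[0], &eset))
		bits |= 2;
	close(fds[0]);
	return (bits);
}
```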
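zml f1e0737c28 notes that a task handler may free its own task structure, so the runner must not touch the task after the call. It also notes that the fix relies on a per-queue member, at the cost of waits longer than strictly necessary. The sketch below shows that pattern with hypothetical simplified structures, not the real taskqueue(9) code. A later commit (zml 7f5d6a35d6) reverted a set of taskqueue(9) changes pending review.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct task and struct taskqueue. */
struct task_like {
	void	(*ta_func)(void *context, int pending);
	void	*ta_context;
	int	ta_pending;
};

struct queue_like {
	int		tq_callouts;	/* per-queue: handlers currently running */
	unsigned long	tq_completed;	/* per-queue: handlers finished */
};

static void
run_task(struct queue_like *q, struct task_like *t)
{
	/* Copy everything needed out of the task before calling the handler. */
	void (*fn)(void *, int) = t->ta_func;
	void *ctx = t->ta_context;
	int pending = t->ta_pending;

	t->ta_pending = 0;
	q->tq_callouts++;
	fn(ctx, pending);		/* the handler may free t */
	/* t must not be dereferenced past this point; only queue state is used. */
	q->tq_callouts--;
	q->tq_completed++;
}

/* Example handler that frees its own task, which run_task() must tolerate. */
static void
free_self_handler(void *context, int pending)
{
	(void)pending;
	free(context);
}
```

In this sketch, a drain routine could wait until tq_callouts drops to zero. That may wait for unrelated tasks too, which is the trade-off the commit message mentions.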
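kib 4208ccbe79 moves the machine-independent part of syscall handling behind per-ABI methods in struct sysvec (sv_fetch_syscall_args, sv_set_syscall_retval, sv_syscallnames). The sketch below shows that dispatch shape with a toy "ABI" that keeps its registers in an array. Every type and function here is hypothetical and heavily simplified, not the kernel's syscallenter(9).

```c
#include <errno.h>

/* Hypothetical stand-ins for struct syscall_args and struct sysvec. */
struct sc_args {
	long	code;
	long	args[2];
};

typedef int (*sc_handler_t)(const long *args, long *rv);

struct abi_vec {
	int	(*sv_fetch_syscall_args)(const long *regs, struct sc_args *sa);
	void	(*sv_set_syscall_retval)(long *regs, int error, long rv);
	const sc_handler_t *sv_table;
	const char *const *sv_syscallnames;
	long	sv_size;
};

/* MI path shared by every ABI: fetch args, dispatch, set return value. */
static void
syscall_common(const struct abi_vec *sv, long *regs)
{
	struct sc_args sa;
	long rv = 0;
	int error;

	error = sv->sv_fetch_syscall_args(regs, &sa);
	if (error == 0 && (sa.code < 0 || sa.code >= sv->sv_size))
		error = ENOSYS;
	if (error == 0)
		error = sv->sv_table[sa.code](sa.args, &rv);
	sv->sv_set_syscall_retval(regs, error, rv);
}

/* Toy ABI: regs[0] = code and return value, regs[1..2] = args, regs[3] = error. */
static int
toy_fetch(const long *regs, struct sc_args *sa)
{
	sa->code = regs[0];
	sa->args[0] = regs[1];
	sa->args[1] = regs[2];
	return (0);
}

static void
toy_set_retval(long *regs, int error, long rv)
{
	regs[0] = (error == 0) ? rv : -1;
	regs[3] = error;
}

static int
toy_sys_add(const long *args, long *rv)
{
	*rv = args[0] + args[1];
	return (0);
}

static const sc_handler_t toy_table[] = { toy_sys_add };
static const char *const toy_names[] = { "add" };

static const struct abi_vec toy_abi = {
	.sv_fetch_syscall_args = toy_fetch,
	.sv_set_syscall_retval = toy_set_retval,
	.sv_table = toy_table,
	.sv_syscallnames = toy_names,
	.sv_size = 1,
};
```

The point of the design is that only toy_fetch and toy_set_retval know the register layout. Adding another ABI means supplying another abi_vec, while syscall_common stays unchanged.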

11964 Commits