freebsd-nq

Author	SHA1	Message	Date
Alexander Motin	95d23438dd	Until hardclock(), and with it tc_windup(), is called for the first time, the system runs on a "dummy" time counter. But to function properly in one-shot mode, the event timer management code requires a working time counter. The slow-moving "dummy" time counter delays the first hardclock() call by a few seconds on my systems, even though timer interrupts were correctly kicking the kernel. That causes a delay of a few seconds during boot with one-shot mode enabled. To break this loop, explicitly call tc_windup() once during initialization to let it switch to a real time counter.	2010-09-21 08:02:02 +00:00
Edward Tomasz Napierala	4089cc8aa1	First step at adapting FreeBSD to support PSARC/2010/029. This makes acl_is_trivial_np(3) properly recognize the new trivial ACLs. From the user point of view, that means "ls -l" no longer shows plus signs for all the files when running ZFS v28.	2010-09-20 17:10:06 +00:00
Ed Schouten	d1817ed7f3	Just make callout devices and /dev/console force CLOCAL on open(). Instead of adding custom checks to wait for DCD on open(), just modify the termios structure to set CLOCAL. This means SIGHUP is no longer generated when losing DCD as well. Reviewed by: kib@ MFC after: 1 week	2010-09-19 16:35:42 +00:00
Ed Schouten	4b5d5046ab	Ignore DCD handling on /dev/console entirely. This makes /dev/console more fail-safe and prevents a potential console lock-up during boot. Discussed on: stable@ Tested by: koitsu@ MFC after: 1 week	2010-09-19 14:21:39 +00:00
Robert Watson	adb6aa9ab9	With reworking of the socket life cycle in 7.x, the need for a "sotryfree()" was eliminated: all references to sockets are explicitly managed by sorele() and the protocols. As such, garbage collect sotryfree(), and update sofree() comments to make the new world order more clear. MFC after: 3 days Reported by: Anuranjan Shukla <anshukla at juniper dot net>	2010-09-18 11:18:42 +00:00
Andriy Gapon	19b8a6dbc1	kern.sched.topology_spec sysctl: use step of 1 for group levels numeration This is just a cosmetic change for prettier output. 'indent' variable/parameter serves two purposes: it specifies whitespace indentation level and also implies cpu group level/depth. It would have been better to split those two uses, but for now just a simple change. MFC after: 1 week	2010-09-18 11:16:43 +00:00
Alexander Motin	8e860de4bf	When a global timer is used on an SMP system, update the nextevent field on the BSP before sending IPIs to other CPUs. Otherwise, other CPUs will try to honor the stale value, programming the timer for a zero interval. If the timer is fast enough, this caused an extra interrupt before the timer was correctly reprogrammed by the BSP.	2010-09-18 07:18:30 +00:00
Warner Losh	5ff4999243	By popular demand, kill all the non GIANT related interrupt messages. They are confusing and add little value. Reviewed by: jhb@	2010-09-17 16:05:25 +00:00
Matthew D Fleming	4e6571599b	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)	2010-09-16 16:13:12 +00:00
Alexander Motin	9aff0c8ff7	Fix panic on NULL dereference possible after r212541.	2010-09-14 10:26:49 +00:00
Alexander Motin	0e18987383	Make kern_tc.c provide the minimum frequency of tc_ticktock() calls required to handle current timecounter wraps. Make kern_clocksource.c honor that requirement, scheduling sleeps on the first CPU for no more than the specified period. Allow other CPUs to sleep up to 1/4 second (for any case).	2010-09-14 08:48:06 +00:00
Alexander Motin	4763a8b8c1	Replace spin lock with the set of atomics. It is impractical for one tc_ticktock() call to wait for another's completion -- just skip it.	2010-09-14 04:57:30 +00:00
Alexander Motin	dd9595e7fa	Add some foot shooting protection by checking singlemul value correctness. Rephrase sysctls descriptions. Suggested by: edmaste	2010-09-14 04:48:04 +00:00
Matthew D Fleming	404a593e28	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn	2010-09-13 18:48:23 +00:00
Andriy Gapon	b7d28b2e0b	bus_add_child: add specialized default implementation that calls panic If a kobj method doesn't have any explicitly provided default implementation, then it is auto-assigned kobj_error_method. kobj_error_method is proper only for methods that return an error code, because it just returns ENXIO. So, in the case of an unimplemented bus_add_child, the caller would get (device_t)ENXIO as a return value, which would cause the mistake to go unnoticed, because the return value is typically checked for NULL. Thus, a specialized null_add_child is added. It would have sufficed for correctness to return NULL, but this type of mistake was deemed to be rare and serious enough to call panic instead. Watch out for this kind of problem with other kobj methods. Suggested by: jhb, imp MFC after: 2 weeks	2010-09-13 08:34:20 +00:00
Alexander Motin	a157e42516	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When the CPU is busy, interrupts are generated at the full rate of hz + stathz to fulfill scheduler and timekeeping requirements. But when the CPU is idle, only the minimum set of interrupts (down to 8 interrupts per second per CPU now) needed to handle scheduled callouts is executed. This significantly increases idle CPU sleep time, increasing the effect of static power-saving technologies. It should also reduce host CPU load on virtualized systems when the guest system is idle. There is a set of tunables, also available as writable sysctls, to control the event timer subsystem behavior: kern.eventtimer.timer - chooses the event timer hardware to use. On x86 there are up to 4 different kinds of timers. Depending on whether the chosen timer is per-CPU, the behavior of other options differs slightly. kern.eventtimer.periodic - chooses between periodic and one-shot operation mode. In periodic mode, the current timer hardware is taken as the only source of time for time events. This mode is quite similar to the previous kernel behavior. One-shot mode instead uses the currently selected time counter hardware to schedule all needed events one by one and programs the timer to generate an interrupt exactly at the specified time. The default value depends on the chosen timer's capabilities, but one-shot mode is preferred unless the other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode, specifies how many times higher the timer frequency should be, so as not to strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU receive every timer interrupt regardless of whether it is busy or not. By default this option is disabled. 
If the chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generated. Since this patch modifies cpu_idle() on some platforms, I have also refactored the one on x86. It now makes use of MONITOR/MWAIT instructions (if supported) under high sleep/wakeup rates, as a fast alternative to other methods. It allows the SMP scheduler to wake up sleeping CPUs much faster without using IPIs, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerpc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.	2010-09-13 07:25:35 +00:00
Alexander Motin	90baf564d2	Do not print "frequency 0 Hz", when frequency is unknown.	2010-09-11 20:18:15 +00:00
Alexander Kabaev	eb262be333	Add missing pointer increment to sbuf_cat.	2010-09-11 19:42:50 +00:00
Konstantin Belousov	9a24dc0760	Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak when mount and update are executed in parallel. Encapsulate syncer vnode deallocation into the helper function vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c. Found and reviewed by: jh (previous version of the patch) Tested by: pho MFC after: 3 weeks	2010-09-11 13:06:06 +00:00
Alexander Motin	b722ad008b	Merge some SCHED_ULE features to SCHED_4BSD: - Teach SCHED_4BSD to inform cpu_idle() about high sleep/wakeup rate to choose an optimized handler. In the case of x86 it is MONITOR/MWAIT. It will also be needed to bypass the forthcoming idle tick skipping logic, so as not to consume resources on event rescheduling when it won't give any benefit. - Teach SCHED_4BSD to wake up idle CPUs without using IPIs. In the case of x86, when MONITOR/MWAIT is active, it requires just a single memory write. This doubles performance on some heavily switching test loads.	2010-09-11 07:08:22 +00:00
Jamie Gritton	f337198db0	Don't exit kern_jail_set without freeing options when enforce_statfs has an illegal value. MFC after: 3 days	2010-09-10 21:45:42 +00:00
Matthew D Fleming	4d369413e1	Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information, in the form of an errno, than sbuf_overflowed() did.	2010-09-10 16:42:16 +00:00
Alexander Motin	9f9ad565a1	Do not IPI a CPU that is already spinning for load. It doubles the effect of spinning (compared to MWAIT) on some heavily switching test loads.	2010-09-10 13:24:47 +00:00
Andriy Gapon	3d844eddb7	bus_add_child: change type of order parameter to u_int This reflects actual type used to store and compare child device orders. Change is mostly done via a Coccinelle (soon to be devel/coccinelle) semantic patch. Verified by LINT+modules kernel builds. Followup to: r212213 MFC after: 10 days	2010-09-10 11:19:03 +00:00
Matthew D Fleming	dd67e2103c	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk	2010-09-09 18:33:46 +00:00
Matthew D Fleming	4351ba272c	Add drain functionality to sbufs. The drain is a function that is called when the sbuf internal buffer is filled. For kernel sbufs with a drain, the internal buffer will never be expanded. For userland sbufs with a drain, the internal buffer may still be expanded by sbuf_[v]printf(3). Sbufs now have three basic uses: 1) static string manipulation. Overflow is marked. 2) dynamic string manipulation. Overflow triggers string growth. 3) drained string manipulation. Overflow triggers draining. In all cases the manipulation is 'safe' in that overflow is detected and managed. Reviewed by: phk (the previous version)	2010-09-09 17:49:18 +00:00
Matthew D Fleming	01f6f5fcd4	Refactor sbuf code so that most uses of sbuf_extend() are in a new sbuf_put_byte(). This makes it easier to add drain functionality when a buffer would overflow as there are fewer code points. Reviewed by: phk	2010-09-09 16:51:52 +00:00
Rui Paulo	d3555b6fc2	Fix two bugs in DTrace: * when the process exits, remove the associated USDT probes * when the process forks, duplicate the USDT probes. Sponsored by: The FreeBSD Foundation	2010-09-09 09:58:05 +00:00
Pawel Jakub Dawidek	4946fa6791	Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.	2010-09-09 07:55:13 +00:00
Pawel Jakub Dawidek	7443b79b81	Doing first mount and updating mount points are both handled by the same syscall and the same function, but are very different and share almost no code. To make it easier to read and analyze, split vfs_domount() into vfs_domount_first() and vfs_domount_update(). Reviewed by: kib	2010-09-08 21:00:53 +00:00
Pawel Jakub Dawidek	a34512e3f0	- Log all the problems in devfs_fixup(). - Correct error paths. The system will be useless on devfs_fixup() failure, so why bother? Maybe for the same reason why a dead body is washed and dressed in a nice suit before it is put into a coffin? Maybe system's last will is to panic without any locks held? Reviewed by: kib	2010-09-08 20:56:18 +00:00
Andriy Gapon	3b0620e06c	subr_bus: use hexadecimal representation for bit flags It seems that this format is more customary in our code, and it is more convenient too. Suggested by: jhb No objection: imp MFC after: 1 week	2010-09-08 17:35:06 +00:00
Michael Tuexen	049640c1f0	Implement correct handling of address parameter and sendinfo for SCTP send calls. MFC after: 4 weeks.	2010-09-05 20:13:07 +00:00
Alexander Motin	d89be9509f	Initialize buffer for case of empty string. Happens only on non-refactored platforms.	2010-09-05 06:16:04 +00:00
Andriy Gapon	ef3b7ba04f	struct device: widen type of flags and order fields to u_int Also change int -> u_int for order parameter in device_add_child_ordered. There should not be any ABI change as struct device is private to subr_bus.c and the API change should be compatible. To do: change int -> u_int for order parameter of bus_add_child method and its implementations. The change should also be API compatible, but is a bit more churn. Suggested by: imp, jhb MFC after: 1 week	2010-09-04 17:28:29 +00:00
Matthew D Fleming	181ff3d503	Use a better #if guard. Suggested by pluknet <pluknet at gmail dot com>.	2010-09-03 17:42:17 +00:00
Matthew D Fleming	c05dbe7a54	Style(9) fixes and eliminate the use of min().	2010-09-03 17:42:12 +00:00
Matthew D Fleming	969292fb1b	Fix user-space libsbuf build. Why isn't CTASSERT available to user-space?	2010-09-03 17:23:26 +00:00
Matthew D Fleming	f5a5dc5da8	Fix brain fart when converting an if statement into a KASSERT.	2010-09-03 16:12:39 +00:00
Matthew D Fleming	f4bafab8da	Use math rather than iteration when the desired sbuf size is larger than SBUF_MAXEXTENDSIZE.	2010-09-03 16:09:17 +00:00
Justin T. Gibbs	f03f7a0ca3	Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to its next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one no-op loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. 
sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation MFC after: 1 month	2010-09-02 19:40:28 +00:00
Matthew D Fleming	ba4932b5a2	Fix UP build. MFC after: 2 weeks	2010-09-02 16:23:05 +00:00
Matthew D Fleming	0f7a0ebd59	Fix a bug with sched_affinity() where it checks td_pinned of another thread in a racy manner, which can lead to attempting to migrate a thread that is pinned to a CPU. Instead, have sched_switch() determine which CPU a thread should run on if the current one is not allowed. KASSERT in sched_bind() that the thread is not yet pinned to a CPU. KASSERT in sched_switch() that only migratable threads or those moving due to a sched_bind() are changing CPUs. sched_affinity code came from jhb@. MFC after: 2 weeks	2010-09-01 20:32:47 +00:00
Max Laier	36058c09e4	rmlock(9) two additions and one change/fix: - add rm_try_rlock(). - add RM_SLEEPABLE to use sx(9) as the back-end lock in order to sleep while holding the write lock. - change rm_noreadtoken to a cpu bitmask to indicate which CPUs need to go through the lock/unlock in order to synchronize. As a side effect, this also avoids IPI to CPUs without any readers during rm_wlock. Discussed with: ups@, rwatson@ on arch@ Sponsored by: Isilon Systems, Inc.	2010-09-01 19:50:03 +00:00
Ed Maste	e5ddf11581	As long as we are going to panic anyway, there's no need to hide additional information behind DIAGNOSTIC.	2010-09-01 13:47:11 +00:00
David Xu	137cf33d5e	Rescue comments from RELENG_4.	2010-09-01 01:26:07 +00:00
Matthew D Fleming	6d3ed393d6	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks	2010-08-31 16:57:58 +00:00
David Xu	83b718eb07	If a process is being debugged, skip job control caused by SIGSTOP/SIGCONT signals, because it is managed by the debugger; however, a normal signal sent to an interruptibly sleeping thread wakes up the thread so it will handle the signal when the process leaves the stopped state. PR: 150138 MFC after: 1 week	2010-08-31 07:15:50 +00:00
Jaakko Heinonen	de478dd4b4	execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for a user with PRIV_VFS_EXEC privilege. Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) agree better with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately. PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)	2010-08-30 16:30:18 +00:00
Konstantin Belousov	e7fb66340e	Regen	2010-08-30 14:26:02 +00:00
Konstantin Belousov	8d19559bde	Make the syscalls reserved for AFS usable by OpenAFS port. Submitted by: Benjamin Kaduk <kaduk mit edu> MFC after: 2 weeks	2010-08-30 14:24:44 +00:00
Konstantin Belousov	6d8fedda2c	For some file types, select code registers two selfd structures. E.g., for socket, when specified POLLIN|POLLOUT in events, you would have one selfd registered for receiving socket buffer, and one for sending. Now, if both events are not ready to fire at the time of the initial scan, but are simultaneously ready after the sleep, pollrescan() would iterate over the pollfd struct twice. Since both times revents is not zero, returned value would be off by one. Fix this by recalculating the return value in pollout(). PR: kern/143029 MFC after: 2 weeks	2010-08-28 17:42:08 +00:00
Pawel Jakub Dawidek	c87f1ad43c	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in the MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for the current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn frees us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month	2010-08-28 08:57:15 +00:00
Pawel Jakub Dawidek	646c3b21ae	Run all tasks from a proper context, with proper priority, etc. Reviewed by: jhb MFC after: 1 month	2010-08-28 08:38:03 +00:00
Konstantin Belousov	13561ed4ed	Fix typo. Submitted by: Ben Kaduk <minimarmot gmail com>	2010-08-26 11:20:57 +00:00
Brian Somers	c2d844d814	If we read zero bytes from the directory, early out with ENOENT rather than forging ahead and interpreting garbage buffer content and dirent structures. This change backs out r211684 which was essentially a no-op. MFC after: 1 week	2010-08-25 18:09:51 +00:00
David Xu	df7442533c	If a thread is removed from umtxq while sleeping, reset the error code to zero; this gives userland a better indication that the thread need not be cancelled.	2010-08-25 03:14:32 +00:00
David Xu	2961a78226	Optimize thr_suspend, if timeout is zero, don't call msleep, just return immediately.	2010-08-24 07:29:55 +00:00
David Xu	baf28b69f4	- According to specification, SI_USER code should only be generated by standard kill(). On other systems, SI_LWP is generated by lwp_kill(). This will allow conforming applications to differentiate between signals generated by standard events and those generated by other implementation events in a manner compatible with existing practice. - Bump __FreeBSD_version	2010-08-24 07:22:24 +00:00
Warner Losh	b3cdb67393	This should really be MACHINE not MACHINE_ARCH, and is this Makefile even used?	2010-08-23 06:22:35 +00:00
Brian Somers	90db41b62b	uio_resid isn't updated by VOP_READDIR for nfs filesystems. Use the uio_offset adjustment instead to calculate a correct *len. Without this change, we run off the end of the directory data we're reading and panic horribly for nfs filesystems. MFC after: 1 week	2010-08-23 05:33:31 +00:00
Rui Paulo	b3d354c9ce	Call the systrace_probe_func() when the error value. Sponsored by: The FreeBSD Foundation	2010-08-22 11:30:49 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]	2010-08-22 11:18:57 +00:00
Rui Paulo	258f5a255e	Bump KDTRACE_THREAD_ZERO and use M_ZERO as a malloc flag instead of calling bzero. Sponsored by: The FreeBSD Foundation	2010-08-22 11:09:53 +00:00
Rui Paulo	9f3a1843ed	Fix style issues. Sponsored by: The FreeBSD Foundation	2010-08-22 11:08:18 +00:00
David Xu	b274870405	make sure thread lock is locked.	2010-08-20 23:51:34 +00:00
John Baldwin	3634d5b241	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days	2010-08-20 19:46:50 +00:00
David Xu	c6aa908d9c	If thread set a TDP_WAKEUP for itself, clears the flag and returns EINTR immediately, this is used for implementing reliable pthread cancellation.	2010-08-20 04:28:30 +00:00
John Baldwin	8c7a92bd4a	Remove unused KTRACE includes.	2010-08-19 16:41:27 +00:00
John Baldwin	ba50d5975d	There isn't really a need to hold the ktrace mutex just to read the value of p_traceflag that is stored in the kinfo_proc structure. It is still racey even with the lock and the code will read a consistent snapshot of the flag without the lock.	2010-08-19 16:40:30 +00:00
John Baldwin	2b3fb61569	Fix a whitespace nit and remove a questioning comment. STAILQ_CONCAT() does require the STAILQ the existing list is being added to to already be initialized (it is CONCAT() vs MOVE()).	2010-08-19 16:38:58 +00:00
John Baldwin	fe41d17ab2	Keep the process locked when calling ktrops() or ktrsetchildren() instead of dropping the lock only to immediately reacquire it.	2010-08-17 21:34:19 +00:00
Konstantin Belousov	ee235befcb	Supply some useful information to the started image using ELF aux vectors. In particular, provide pagesize and pagesizes array, the canary value for SSP use, number of host CPUs and osreldate. Tested by: marius (sparc64) MFC after: 1 month	2010-08-17 08:55:45 +00:00
Pawel Jakub Dawidek	3d336cd03b	Simplify taskqueue_drain() by using proved macros.	2010-08-13 19:20:35 +00:00
Justin T. Gibbs	5ef8fb65f9	Allow interrupt driven config hooks to be registered from config hook callbacks. Interrupt driven configuration hooks serve two purposes: they are a mechanism for registering for a callback that is invoked once interrupt services are available, and they hold off root device selection so long as any configuration hooks are still active. Before this change, it was not possible to safely register additional hooks from the context of a configuration hook callback. The need for this feature arises when interrupts are required to discover new devices (e.g. access to the XenStore to find para-virtualized devices) which in turn also require the ability to hold off root device selection until some lengthy, interrupt driven, configuration task has completed (e.g. Xen front/back device driver negotiation). More specifically, the mutex protecting the list of active configuration hooks is never held during a callback, and static information is used to ensure proper ordering and only a single callback to each hook even when faced with registration or removal of a hook during an active run. Sponsored by: Spectra Logic Corporation MFC after: 1 week.	2010-08-12 19:50:40 +00:00
Justin T. Gibbs	74ec46dff4	Properly indent a continue statement. No functional changes.	2010-08-12 19:26:27 +00:00
Jung-uk Kim	116a77bda7	Add the half of time-of-day clock resolution when we adjust system time from time-of-day clock or vice versa. For x86 systems, RTC resolution is one second and we used to lose up to one second whenever we initialize system time from RTC or write system time back to RTC. With this change, margin of error per conversion is roughly between -0.5 and +0.5 second rather than between -1 and 0 second. Note that it does not take care of errors from getnanotime(9) (which is up to 1/hz second) or CLOCK_GETTIME() latency. These are just too expensive to correct and it is not worthy of the cost.	2010-08-12 17:17:05 +00:00
Jung-uk Kim	0674ea6f69	Provide description for 'machdep.disable_rtc_set' sysctl. Clean up style(9) nits. Remove a redundant return statement and an unnecessary variable.	2010-08-12 16:13:24 +00:00
Konstantin Belousov	3beb1b723f	The buffer's b_vflags field is not always properly protected by the bufobj lock. If b_bufobj is not NULL, then the bufobj lock should be held when manipulating the flags. Not doing this sometimes leaves BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to get stuck indefinitely in "getbuf" sleep, waiting for a background write to finish that is not actually being performed. Add BO_LOCK() in the cases where it was missed. In collaboration with: pho Tested by: bz Reviewed by: jeff MFC after: 1 month	2010-08-12 08:36:23 +00:00
Matthew D Fleming	e3813573bd	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month	2010-08-11 22:10:37 +00:00
Ivan Voras	af7326d405	Fix (hopefully) the spelling of "queuing." Submitted by: bf1783 at gmail com	2010-08-09 23:32:37 +00:00
Ivan Voras	e98c5c7813	Bump the read-ahead count once more, to a value equivalent to 512 KiB on most systems, based on benchmark results on a low-end fibre channel SAN under VMware. Read performance by vfs.read_max value: 8 (historical default): 83 MB/s; 16 (recent bump): 131 MB/s; 32 (this version): 152 MB/s; 64: 157 MB/s (results are +/- 3 MB/s). As read-ahead is heuristic, based on past IO requests, it shouldn't be problematic. The new default is still smaller than in other OSes.	2010-08-09 22:56:10 +00:00
Ivan Voras	dd8c13d589	Elaborate on how hirunningspace was chosen.	2010-08-09 22:22:46 +00:00
Gavin Atkinson	a0c87b747c	Add descriptions to a handful of sysctl nodes. PR: kern/148580 Submitted by: Galimov Albert <wtfcrap mail.ru> MFC after: 1 week	2010-08-09 14:48:31 +00:00
Attilio Rao	2d8b420b9f	r208165 fixed a bug related to unsigned integer overflow in the detection of the number of CPUs. However, that was not mentioned at all, the problem was not reported, the patch has not been MFCed and the fix is mostly improper. Fix the original overflow (caused when 32 CPUs must be detected) by just using a different mathematical computation (it also makes the size of the operands involved more explicit, which is good while waiting for more complete support for a large number of CPUs). PR: kern/148698 Submitted by: Joe Landers <jlanders at vmware dot com> Tested by: gianni MFC after: 10 days	2010-08-09 00:23:57 +00:00
Jamie Gritton	4affa14c81	Back out r210974. Any convenience of not typing "persist" is outweighed by the possibility of unintended partially-formed jails.	2010-08-08 23:22:55 +00:00
Ivan Voras	27f11235b9	To help with sequential read UFS performance on modern systems, increase the vfs.read_max default. For most systems this means going from 128 KiB to 256 KiB, which is still very conservative and lower than what most other operating systems use, but as a sane default should not interfere much with existing systems. For systems with RAID volumes and/or virtualization environments, where read performance is very important, increasing this sysctl tunable to 32 or even more will demonstrably yield additional performance benefits. If MAXPHYS ever gets bumped up, it will probably be a good idea to slave read_max to it.	2010-08-07 18:30:10 +00:00
Michael Tuexen	af9ba7d805	Fix a bug where MSG_TRUNC was not returned in all necessary cases for SOCK_DGRAM socket. MSG_TRUNC was only returned when some mbufs could not be copied to the application. If some data was left in the last mbuf, it was correctly discarded, but MSG_TRUNC was not set. Reviewed by: bz MFC after: 3 weeks	2010-08-07 17:57:58 +00:00
Jamie Gritton	f4aad87394	Implicitly make a new jail persistent if it's set not to attach. MFC after: 3 days	2010-08-06 22:04:18 +00:00
John Baldwin	d9d8d1449d	Add a new ipi_cpu() function to the MI IPI API that can be used to send an IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that constructed a mask for a single CPU with calls to ipi_cpu() instead. This will matter more in the future when we transition from cpumask_t to cpuset_t for CPU masks in which case building a CPU mask is more expensive. Submitted by: peter, sbruno Reviewed by: rookie Obtained from: Yahoo! (x86) MFC after: 1 month	2010-08-06 15:36:59 +00:00
Christian S.J. Peron	ea235a1449	Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixes the virtualization detection successfully disabling the clflush instruction. This fixes insta-panics for XEN hvm users when the hw.clflush_disable tunable is -1 or 0 (-1 by default). Discussed with: jhb	2010-08-06 15:04:40 +00:00
Konstantin Belousov	6c5e633cd6	Add "show cdev" ddb command. In collaboration with: pho MFC after: 1 month	2010-08-06 09:44:01 +00:00
Konstantin Belousov	3979450b4c	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month	2010-08-06 09:42:15 +00:00
Alan Cox	3b156706c4	In order for MAXVNODES_MAX to be an "int" on powerpc and sparc, we must cast PAGE_SIZE to an "int". (Powerpc and sparc, unlike the other architectures, define PAGE_SIZE as a "long".) Submitted by: Andreas Tobler	2010-08-04 05:09:02 +00:00
Alan Cox	1d7fe4b515	Update the "desiredvnodes" calculation. In particular, make the part of the calculation that is based on the kernel's heap size more conservative. Hopefully, this will eliminate the need for MAXVNODES_MAX, but for the time being set MAXVNODES_MAX to a large value. Reviewed by: jhb@ MFC after: 6 weeks	2010-08-02 21:33:36 +00:00
Rui Paulo	55820e298a	Bump the witness pendlist to 768 to accommodate the increased number of spinlocks.	2010-07-29 16:13:26 +00:00
Matthew D Fleming	d7854da193	Add MALLOC_DEBUG_MAXZONES debug malloc(9) option to use multiple uma zones for each malloc bucket size. The purpose is to isolate different malloc types into hash classes, so that any buffer overruns or use-after-free will usually only affect memory from malloc types in that hash class. This is purely a debugging tool; by varying the hash function and tracking which hash class was corrupted, the intersection of the hash classes from each instance will point to a single malloc type that is being misused. At this point inspection or memguard(9) can be used to catch the offending code. Add MALLOC_DEBUG_MAXZONES=8 to -current GENERIC configuration files. The suggestion to have this on by default came from Kostik Belousov on -arch. This code is based on work by Ron Steinke at Isilon Systems. Reviewed by: -arch (mostly silence) Reviewed by: zml Approved by: zml (mentor)	2010-07-28 15:36:12 +00:00
Alan Cox	a14a949872	The interpreter name should no longer be treated as a buffer that can be overwritten. (This change should have been included in r210545.) Submitted by: kib	2010-07-28 04:47:40 +00:00
Alan Cox	2af6e14d39	Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib	2010-07-27 17:31:03 +00:00
Alan Cox	9e4e511499	Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib	2010-07-25 17:43:38 +00:00
Alan Cox	69a8f9e3d1	Eliminate a little bit of duplicated code.	2010-07-23 18:58:27 +00:00
Andriy Gapon	676799a00d	completely ignore zero-sized elf sections in modules of elf object type (amd64) Current code doesn't check size of elf sections and may perform needless actions of zero-sized memory allocation and similar. The bigger issue is that alignment requirement of a zero-sized section gets effectively applied to the next section if it has smaller alignment requirement. But other tools, like gdb and consequently kgdb, completely ignore zero-sized sections and thus may map symbols to addresses differently. Zero-sized sections are not typical in general. Their typical (only, even) cause in FreeBSD modules is inline assembly that creates custom sections which is found in pcpu.h and vnet.h. Mere inclusion of one of those header files produces a custom section in elf output. If there is no actual use for the section in a given module, then the section remains empty. Better solution is to avoid creating zero-sized sections altogether, which is in plans. Preloaded modules are handled in boot code (load_elf_obj.c), while dynamically loaded modules are handled by kernel (link_elf_obj.c). Based on code by: np MFC after: 3 weeks	2010-07-23 17:07:51 +00:00
Andriy Gapon	dac509311f	cpufreq: allocate long-lived buffer for handling of sysctl requests At present the cpufreq sysctl handler for current level setting would allocate and deallocate a temporary buffer of 24KB even to handle a read-only query. This puts unnecessary load on memory subsystem when current level is checked frequently, e.g. when the likes of powerd and system monitoring software are running. Change the strategy to allocating a long-lived buffer for handling the requests. Reviewed by: njl MFC after: 2 weeks	2010-07-23 16:46:42 +00:00
Ivan Voras	984c64736c	Make lorunningspace catch up with hirunningspace. While there, add comment about the magic numbers. Prodded by: alc	2010-07-23 12:30:29 +00:00
Matthew D Fleming	033459c8f1	Remove unused variable that snuck in during development. Approved by: zml (mentor)	2010-07-22 17:23:43 +00:00
Matthew D Fleming	242ed5d96c	Fix taskqueue_drain(9) to not have false negatives. For threaded taskqueues, more than one task can be running simultaneously. Also make taskqueue_run(9) static to the file, since there are no consumers in the base kernel and the function signature needs to change with this fix. Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the taskqueue(9) man page. Reviewed by: jhb Approved by: zml (mentor)	2010-07-22 16:41:09 +00:00
Konstantin Belousov	87d45a0392	When compat32 binary asks for the value of hw.machine_arch, report the name of 32bit sibling architecture instead of the host one. Do the same for hw.machine on amd64. Add a safety belt debug.adaptive_machine_arch sysctl, to turn the substitution off. Reviewed by: jhb, nwhitehorn MFC after: 2 weeks	2010-07-22 09:13:49 +00:00
Edward Tomasz Napierala	175389cff2	Remove spurious '/*-' marks and fix some other style problems. Submitted by: bde@	2010-07-22 05:42:29 +00:00
Alexander Motin	e88f9fb47f	Use proper sysctl type (quad) for et_frequency. It fixes output on sparc64.	2010-07-21 12:23:49 +00:00
Attilio Rao	4e55157fa4	Probably defaulting to KTR_GEN is not the right decision when KTR_MASK is not defined at all because KTR_GEN is still a valid class and some traces may fit in. Default to 0, instead, and block any tracing. As long as this is a POLA violation (some third-party code, even if that may be a questionable choice, could rely on that feature) a MFC possibility might be carefully evaluated. Sponsored by: Sandvine Incorporated	2010-07-21 10:14:04 +00:00
Alexander Motin	599cf0f197	Fix several un-/signedness bugs of r210290 and r210293. Add one more check.	2010-07-20 15:48:29 +00:00
Ivan Voras	b089a17737	Fix expression style. Prodded by: jhb	2010-07-20 13:59:51 +00:00
Alexander Motin	51636352b6	Extend timer driver API to report also minimal and maximal supported period lengths. Make MI wrapper code to validate periods in request. Make kernel clock management code to honor these hardware limitations while choosing hz, stathz and profhz values.	2010-07-20 10:58:56 +00:00
David Xu	212bc4b337	Fix function name in error messages.	2010-07-20 02:23:12 +00:00
Edward Tomasz Napierala	1a996ed1d8	Revert r210225 - turns out I was wrong; the "/*-" is not license-only thing; it's also used to indicate that the comment should not be automatically rewrapped. Explained by: cperciva@	2010-07-18 20:57:53 +00:00
Edward Tomasz Napierala	805cc58ac0	The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright occurrences from sys/sys/ and sys/kern/.	2010-07-18 20:23:10 +00:00
Edward Tomasz Napierala	eea4ac8b3f	Remove outdated comment and move part of it into more applicable place.	2010-07-18 19:29:12 +00:00
Ivan Voras	1de98e0687	In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory. This is a somewhat conservative version of the patch - more fine tuning may be necessary. Idea from: Thread on hackers@ Discussed with: alc	2010-07-18 10:15:33 +00:00
John Baldwin	f2a664ac97	Retire td_syscalls now that it is no longer needed.	2010-07-15 20:24:37 +00:00
Ivan Voras	611daf7e62	A cosmetic change - don't output empty <flags>.	2010-07-15 13:46:30 +00:00
Alexander Motin	43fe7d458a	Rename timeevents.c to kern_clocksource.c. Suggested by: jhb@	2010-07-14 18:43:27 +00:00
John Baldwin	a3052d6e08	- Document layout of KTR_STRUCT payload in a comment. - Simplify ktrstruct() calling convention by having ktrstruct() use strlen() rather than requiring the caller to hand-code the length of constant strings. MFC after: 1 month	2010-07-14 17:38:01 +00:00
Alexander Motin	28ab822d8a	Move timeevents.c to MI code, as it is not x86-specific. I already have it working on Marvell ARM SoCs, and it would be nice to unify timer code between more platforms.	2010-07-14 13:31:27 +00:00
Colin Percival	32a8b1d832	Correctly copy the M_RDONLY flag when duplicating a reference to an mbuf external buffer. Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-10:07.mbuf	2010-07-13 02:45:17 +00:00
Jung-uk Kim	4a82f10889	Use type-specific inline function imax() instead of deprecated macro MAX(). Prodded by: bde	2010-07-12 15:32:45 +00:00
Alan Cox	2882388376	Change the implementation of vm_hold_free_pages() so that it performs at most one call to pmap_qremove(), and thus one TLB shootdown, instead of one call and TLB shootdown per page. Simplify the interface to vm_hold_free_pages(). MFC after: 3 weeks	2010-07-11 20:11:44 +00:00
Alexander Motin	3bc5958c0e	Remove interval validation from cpu_tick_calibrate(). As I found, check was needed at preliminary version of the patch, where number of CPU ticks was divided strictly on 16 seconds. Final code instead uses real interval duration, so precise interval should not be important. Same time aliasing issues around second boundary causes false positives, periodically logging useless "t_delta ... too long/short" messages when HZ set below 256.	2010-07-11 16:47:45 +00:00
Alan Cox	b99348e5ea	Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan(). This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one. Reviewed by: kib	2010-07-09 19:38:30 +00:00
John Baldwin	e113db82af	Accidentally committed an older version of this comment rather than the final one.	2010-07-09 13:59:53 +00:00
John Baldwin	07b183388a	Refine a comment. Reviewed by: bde	2010-07-09 13:53:25 +00:00
Jaakko Heinonen	831aa555de	Remove redundant high >= 0. Reported by: rstone	2010-07-09 10:57:55 +00:00
Jung-uk Kim	4624e08a59	Implement optional 'precision' for numbers. Previously, it was parsed but ignored. Some third-party modules (e.g., ACPICA) prefer this format over zero padding flag '0'.	2010-07-08 22:13:23 +00:00
John Baldwin	fc8cca02c7	- Various style and whitespace fixes. - Make sugid_coredump and kern_logsigexit private to kern_sig.c. Submitted by: bde (partially) MFC after: 1 month	2010-07-08 19:15:26 +00:00
Jaakko Heinonen	501812f2c5	Assert that low and high are >= 0. The allocator doesn't support the negative range.	2010-07-08 16:53:19 +00:00
Attilio Rao	631cb86f11	- Simplify logic in handling ticks wrap-up - Fix a bug where thread may be in sleeping state but the wchan won't be set, leading to an empty container for sleepq_type(). [0] Sponsored by: Sandvine Incorporated [0] Submitted by: Bryan Venteicher <bryanv at daemoninthecloset dot org> MFC after: 3 days X-MFC: 209577	2010-07-07 12:00:11 +00:00
Konstantin Belousov	aa81ae08e9	In revoke(), verify that VCHR vnode indeed belongs to devfs. Found and tested by: pho MFC after: 1 week	2010-07-06 18:20:49 +00:00
Ed Schouten	822eb2b050	Fix a race condition, where a TTY could be destroyed twice. There are special cases where tty_rel_free() can be called twice in a row, namely when closing and revoking the TTY at the same moment. Only call destroy_dev_sched_cb() once. Reported by: Jeremie Le Hen MFC after: 1 week	2010-07-06 08:56:34 +00:00
Konstantin Belousov	5f195aa32e	Add the ability for the allocflag argument of the vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending vmio buffer can be replaced by a call to vm_page_grab(). Suggested and reviewed by: alc MFC after: 2 weeks	2010-07-05 21:13:32 +00:00
Jaakko Heinonen	13c02cbb18	Extend the kernel unit number allocator for allocating specific unit numbers. This change adds a new function alloc_unr_specific() which returns the requested unit number if it is free. If the number is already allocated or out of the range, -1 is returned. Update alloc_unr(9) manual page accordingly and add a MLINK for alloc_unr_specific(9). Discussed on: freebsd-hackers	2010-07-05 16:23:55 +00:00
Konstantin Belousov	34a39b7b1f	Obey sv_syscallnames bounds in syscallname(). Reported and tested by: pho	2010-07-04 18:16:17 +00:00
Konstantin Belousov	8a26007903	Extend ptrace(PT_LWPINFO) to report siginfo for the signal that caused debuggee stop. The change should keep the ABI. Take care of compat32. Discussed with: davidxu, jhb MFC after: 2 weeks	2010-07-04 11:48:30 +00:00
Alan Cox	41890423b6	Use vm_page_next() instead of vm_page_lookup() in exec_map_first_page() because vm_page_next() is faster.	2010-07-02 15:50:30 +00:00
John Baldwin	fc0de8f0b6	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.	2010-06-30 18:03:42 +00:00
John Baldwin	418a27e99e	Update comment for tdsignal() -> tdsendsignal() rename. Forgot to include this in 209592.	2010-06-30 18:00:45 +00:00
Alan Cox	f4b9ace4f8	Improve bufdone_finish()'s handling of the bogus page. Specifically, if one or more mappings to the bogus page must be replaced, call pmap_qenter() just once. Previously, pmap_qenter() was called for each mapping to the bogus page. MFC after: 3 weeks	2010-06-30 04:52:42 +00:00
John Baldwin	7a6f3d7890	Send SIGPIPE to the thread that issued the offending system call rather than to the entire process. Reported by: Anit Chakraborty Reviewed by: kib, deischen (concept) MFC after: 1 week	2010-06-29 20:44:19 +00:00
John Baldwin	ad6eec7b9e	Tweak the in-kernel API for sending signals to threads: - Rename tdsignal() to tdsendsignal() and make it private to kern_sig.c. - Add tdsignal() and tdksignal() routines that mirror psignal() and pksignal() except that they accept a thread as an argument instead of a process. They send a signal to a specific thread rather than to an individual process. Reviewed by: kib	2010-06-29 20:41:52 +00:00
Doug Barton	d748aee076	If i is going to be used in the loop unconditionally the declaration has to be unconditional as well. Conical head covering to: kib	2010-06-29 01:04:24 +00:00
Konstantin Belousov	13cedde2cb	Regenerate	2010-06-28 18:17:21 +00:00
Konstantin Belousov	0d9d996d39	Despite system call deregistration drains the threads executing System V shm syscalls, and initial check for the number of allocated segments in the module deinitialization code, the following might happen: after the check for active segment, while waiting for threads to leave some other syscall, shmget(2) is called. Then, we can end up with the shared segment that cannot be detached since sysvshm module is unloaded. Prevent the leak by rechecking and disclaiming a reference to the vm object owned by sysvshm module, that might have grown during the drain. Tested by: pho Reviewed by: jhb MFC after: 1 month	2010-06-28 18:12:42 +00:00
Konstantin Belousov	153ac44cf6	Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month	2010-06-28 18:06:46 +00:00
Attilio Rao	b2488fc159	Fix a lock leak in the deadlock resolver in case the ticks counter wrapped up. Sponsored by: Sandvine Incorporated Submitted by: pluknet <pluknet at gmail dot com> Reported by: Anton Yuzhaninov <citrin at citrin dot ru> Reviewed by: jhb MFC after: 3 days	2010-06-28 17:45:00 +00:00
Jaakko Heinonen	bc96d3d17a	Correct a comment typo.	2010-06-27 12:19:09 +00:00
Pawel Jakub Dawidek	3297cdd096	Correct arguments order.	2010-06-26 21:44:45 +00:00
Michael Tuexen	e1c97831ec	* Do not dereference a NULL pointer when calling an SCTP send syscall not providing a destination address and using ktrace. * Do not copy out kernel memory when providing sinfo for sctp_recvmsg(). Both bugs were reported by Valentin Nechayev. The first bug results in a kernel panic. MFC after: 3 days.	2010-06-26 19:26:20 +00:00
Nathan Whitehorn	bcebf6a165	Reverse the logic of the if statement that sets the default value of HZ; the list of 1000 Hz platforms was getting unwieldy. Suggested by: marcel	2010-06-24 00:27:20 +00:00
Nathan Whitehorn	e864acd42f	Move default HZ from 100 to 1000 on powerpc. Reviewed by: marcel MFC after: 2 weeks	2010-06-23 23:26:14 +00:00
Konstantin Belousov	699d648aab	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)	2010-06-23 11:12:58 +00:00
Alexander Motin	6519968e59	"time lock" is no longer a spin-lock since r209371. Reported by: kib@	2010-06-21 21:15:51 +00:00
Ed Schouten	60ae52f785	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
Konstantin Belousov	c51050129f	Do not report a stack garbage as the old value for debug.ncores sysctl. Reported by: brucec	2010-06-21 09:51:25 +00:00
Alexander Motin	875b8844be	Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying its frequency by few times and using respective dividers to honor hz, stathz and profhz values, set during initial setup.	2010-06-20 21:33:29 +00:00
Pawel Jakub Dawidek	d32ef791eb	Backout r207970 for now, it can lead to deadlocks. Reported by: kan MFC after: 3 days	2010-06-17 17:39:51 +00:00
Rui Paulo	f05a947676	Make DTrace syscall provider work again by including opt_kdtrace.h here.	2010-06-17 17:34:45 +00:00
Jaakko Heinonen	24e8eaf191	- Fix compilation of the subr_unit.c user space test program. - Use %zu for size_t in a few format strings.	2010-06-17 16:12:06 +00:00
Andriy Gapon	e7154e7ef1	lock_profile_release_lock: do not compare unsigned with zero Found by: Coverity Prevent CID: 3660 Reviewed by: jhb MFC after: 2 weeks	2010-06-17 10:15:13 +00:00
Ed Schouten	2e983ace8f	Remove the unit argument from the recently added make_dev_p(). New code that creates character devices shouldn't use device unit numbers, but only si_drv[12] to hold pointer to per-device data. Make this function more future proof by removing the unit number argument. Discussed with: kib	2010-06-17 08:49:31 +00:00
Jaakko Heinonen	8fa17b7953	Correct the function name in a KASSERT.	2010-06-16 16:02:17 +00:00
Jung-uk Kim	547d94bde3	Implement flexible BPF timestamping framework. - Allow setting format, resolution and accuracy of BPF time stamps per listener. Previously, we were only able to use microtime(9). Now we can set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command. Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP command. Document all supported options in bpf(4) and their uses. - Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'. The new time stamp has both 64-bit second and fractional parts. bpf_xhdr has this time stamp instead of 'struct timeval' for bh_tstamp. The new structures let us use bh_tstamp of same size on both 32-bit and 64-bit platforms without adding additional shims for 32-bit binaries. On 64-bit platforms, size of BPF header does not change compared to bpf_hdr as its members are already all 64-bit long. On 32-bit platforms, the size may increase by 8 bytes. For backward compatibility, struct bpf_hdr with struct timeval is still the default header unless new time stamp format is explicitly requested. However, the behaviour may change in the future and all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now. - Add experimental support for tagging mbufs with time stamps from a lower layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs. The time stamps must be uptime in 'struct bintime' format as binuptime(9) and getbinuptime(9) do. Reviewed by: net@	2010-06-15 19:28:44 +00:00
Alexander Motin	93fc07b434	Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.	2010-06-14 07:10:37 +00:00
Konstantin Belousov	f1bb758d4b	Add another variation of make_dev(9), make_dev_p(9), that is allowed to fail and can return useful error code. Requested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:22:39 +00:00
Konstantin Belousov	76d43557d8	When make_dev_credf(MAKEDEV_WAITOK) is called, use devctl_notify_f(M_WAITOK) for devfs notifications. Suggested by: jh Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:21:25 +00:00
Konstantin Belousov	bebc339116	Add modifications of devctl_notify(9) functions that take flags. Use flags to specify M_WAITOK/M_NOWAIT. M_WAITOK allows devctl to sleep for the memory allocation. As Warner noted, allowing the functions to sleep might cause reordering of the queued notifications. Reviewed by: imp, jh MFC after: 3 weeks	2010-06-12 13:20:38 +00:00
Andriy Gapon	1bdfff2252	fix a few cases where a string is passed via format argument instead of via %s Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed to drop an intermediate buffer. Found by: clang MFC after: 2 weeks	2010-06-11 19:27:21 +00:00
John Baldwin	3aa6d94e0c	Update several places that iterate over CPUs to use CPU_FOREACH().	2010-06-11 18:46:34 +00:00
Matthew D Fleming	d19511c357	Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state. Reviewed by: jeff Approved by: zml (mentor)	2010-06-11 17:03:26 +00:00
Ivan Voras	c1e34abff8	In another move to join with the age of the Fruitbat, increase SYSV shared resources defaults beyond absolute minimums. The new values are chosen mostly by magic. They are still fairly small and will need increasing for large installations (especially SHMMAX). However, they are now enough to e.g. start PostgreSQL installations with ~~300 users and nearly 512 MB of shared buffers. Reviewed by: A short discussion on hackers@	2010-06-11 09:27:33 +00:00
Alexander Motin	1f255bd340	Store interrupt trap frame into struct thread. It allows interrupt handler to obtain both trap frame and opaque argument submitted on registration. After kernel and all drivers get used to it, legacy hack can be removed. Reviewed by: jhb@	2010-06-10 16:14:05 +00:00
Ivan Voras	a401f2d098	Unconfuse THREAD and SMT flags	2010-06-10 11:48:14 +00:00
Ivan Voras	5368befb66	Cosmetic change to XML - less ugly newlines	2010-06-10 11:01:17 +00:00
Konstantin Belousov	8d4a7be84d	Reorganize the code in bdwrite() which handles move of dirtiness from the buffer pages to buffer. Combine the code to set buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once. Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy. In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf(). Reported and tested by: pho Suggested and reviewed by: alc MFC after: 2 weeks	2010-06-08 17:54:28 +00:00
John Baldwin	8545538b6a	Fix a sign bug that caused adaptive spinning in sx_xlock() to not work properly. Among other things it did not drop Giant while spinning leading to livelocks. Reviewed by: rookie, kib, jmallett MFC after: 3 days	2010-06-08 16:17:47 +00:00
Alexander Motin	e7d83347c0	Call BUS_PROBE_NOMATCH() when device detached due to driver unload. This allows bus to power-down device when driver unloaded on-flight.	2010-06-07 18:47:53 +00:00
Colin Percival	3beefaed5e	Declare ip6 as (struct in6_addr *) instead of (struct in_addr *). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer. Found with: Coverity Prevent(tm) CID: 4319 MFC after: 1 month	2010-06-04 14:38:24 +00:00
John Baldwin	3da35a0a52	Assert that the thread lock is held in sched_pctcpu() instead of recursively acquiring it. All of the current callers already hold the lock. MFC after: 1 month	2010-06-03 16:02:11 +00:00
Edward Tomasz Napierala	ce9d79aa61	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3688	2010-06-03 13:45:27 +00:00
Edward Tomasz Napierala	019b32dabd	The 'acl_cnt' field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3684	2010-06-03 13:43:58 +00:00
Edward Tomasz Napierala	c977cdf961	The acl_cnt field is unsigned; no point in checking if it's >= 0. Found with: Coverity Prevent CID: 3683	2010-06-03 13:41:55 +00:00
Konstantin Belousov	882da14c3d	Sometimes vnodes share the lock despite being different vnodes on different mount points, e.g. the nullfs vnode and the covered vnode from the lower filesystem. In this case, existing assertion in vop_rename_pre() may be triggered. Check for vnode locks equality instead of the vnodes themselves to not trip over the situation. Submitted by: Mikolaj Golub <to.my.trociny@gmail.com> Tested by: pho MFC after: 2 weeks	2010-06-03 10:20:08 +00:00
Alan Cox	c8fa870982	Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.	2010-06-02 15:46:37 +00:00
Konstantin Belousov	3286375480	Add a facility to dynamically adjust or unconfigure p1003_1b mib. Use it to allow to tune sem_nsem_max at runtime, only when sem.ko module is present in kernel. Requested and tested by: amdmi3 Reviewed by: jhb MFC after: 3 days	2010-06-02 09:59:05 +00:00
Zachary Loafman	121e802b07	Revert taskqueue(9) related commits until mdf@ is approved and can resolve issues. This reverts commits r207439, r208623, r208624	2010-06-01 16:04:01 +00:00
Zachary Loafman	911de7741d	Avoid a wakeup(9) if we can be sure no one is waiting on the task. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:34 +00:00
Zachary Loafman	6e86cdb85c	Revert r207439 and solve the problem differently. The task handler ta_func may free the task structure, so no references to its members are valid after the handler has been called. Using a per-queue member and having waits longer than strictly necessary was suggested by jhb. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, jhb	2010-05-28 18:15:28 +00:00
Robert Watson	e35973e4b8	When close() is called on a connected socket pair, SO_ISCONNECTED might be set but be cleared before the call to sodisconnect(). In this case, ENOTCONN is returned: suppress this error rather than returning it to userspace so that close() doesn't report an error improperly. PR: kern/144061 Reported by: Matt Reimer <mreimer at vpop.net>, Nikolay Denev <ndenev at gmail.com>, Mikolaj Golub <to.my.trociny at gmail.com> MFC after: 3 days	2010-05-27 15:27:31 +00:00
Attilio Rao	937912ea04	Add the support for reporting the NOCOREDUMP flag from sysctl_kern_proc_vmmap(). Sponsored by: Sandvine Incorporated Reviewed by: kib, emaste MFC after: 1 week	2010-05-27 08:10:12 +00:00
Konstantin Belousov	b2318c2860	Allow to use syscallname(9) outside subr_trap.c. MFC after: 1 month	2010-05-26 15:39:43 +00:00
John Baldwin	0bfbf4d220	Ignore the 'addr' argument passed to PT_STEP (it is required to be '1' for PT_STEP which means "ignore") and PT_DETACH. PR: kern/146167 MFC after: 1 week	2010-05-25 21:32:37 +00:00
Alan Cox	e98d019d3c	Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed. Submitted by: kib	2010-05-25 02:26:25 +00:00
Alan Cox	567e51e18c	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)	2010-05-24 14:26:57 +00:00
Alexander Motin	dbd55f3ff0	- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code, with minor variations, was duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to the new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.	2010-05-24 11:40:49 +00:00
Konstantin Belousov	41fd9c6369	Fix the double counting of the last process thread td_incruntime on exit, that is done once in thread_exit() and the second time in proc_reap(), by clearing td_incruntime. Use the opportunity to revert to the pre-RUSAGE_THREAD exporting of ruxagg() instead of ruxagg_locked() and use it from thread_exit(). Diagnosed and tested by: neel MFC after: 3 days	2010-05-24 10:23:49 +00:00
Konstantin Belousov	afe1a68827	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month	2010-05-23 18:32:02 +00:00
John Baldwin	e826ef1ec4	- Adjust the whitespace for the lines that output fields in 'show pcpu' in DDB so that all the fields line up. - Print out the tid of the per-CPU idlethread instead of the pid since the idle process is now shared across all idle threads. MFC after: 1 month	2010-05-21 17:17:56 +00:00
John Baldwin	1d7830edd5	Assert that the thread passed to sched_bind() and sched_unbind() is curthread as those routines are only supported for curthread currently. MFC after: 1 month	2010-05-21 17:15:56 +00:00
John Baldwin	07969f1d4d	Allow a const char * to be passed as the process name to kproc_kthread_add() without generating a warning. MFC after: 1 month	2010-05-21 17:14:36 +00:00
Konstantin Belousov	61e53a389f	Remove POLLHUP from the flags tested when setting exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken. Reported by: Yoshihiko Sarumaru <ysarumaru gmail com> Discussed with: bde PR: ports/140934 MFC after: 2 weeks	2010-05-21 10:36:29 +00:00
Alan Cox	aa12e8b71d	The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it. Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here. Reviewed by: kib	2010-05-18 16:40:29 +00:00
Randall Stewart	4542827d4d	This pushes all of JC's patches that I have in place. I am now able to run 32 cores ok.. but I still will hang on buildworld with a NFS problem. I suspect I am missing a patch for the netlogic rge driver. JC check and see if I am missing anything except your core-mask changes Obtained from: JC	2010-05-16 19:43:48 +00:00
Bjoern A. Zeeb	793f71bf2e	Fix an issue with the dynamic pcpu/vnet data allocators. We cannot expect that modspace is the last entry in the linker set and thus that modspace + possible extra space up to PAGE_SIZE would be contiguous. For the moment do not support more than _MODMIN space and ignore the extra space (*). (*) We know how to get it back but it'll need testing. Discussed with: jeff, rwatson (briefly) Reviewed by: jeff Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 4 days	2010-05-14 21:11:58 +00:00
Zachary Loafman	7fd32ea923	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr	2010-05-12 21:24:46 +00:00
Pawel Jakub Dawidek	408a7c5093	When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics. No objections from: kib Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua> MFC after: 1 week	2010-05-12 16:42:28 +00:00
Pawel Jakub Dawidek	c60c36a745	I added vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it. MFC after: 3 days	2010-05-11 22:46:36 +00:00
Attilio Rao	98332c8c71	Right now, WITNESS just blindly pipes all the output to the (TOCONS | TOLOG) mask even when called from DDB points. That breaks several outputs, the most notable being textdump output. Fix this by having configurable callbacks passed to witness_list_locks() and witness_display_spinlock() for printing out data. Reported by: several broken textdump outputs Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> MFC after: 7 days X-MFC: r207922	2010-05-11 18:24:22 +00:00
Attilio Rao	3caaaae046	There is not a good reason to have a different prototype for db_printf() when compared to printf(). Unify it by returning the number of characters displayed for db_printf() as well. MFC after: 7 days	2010-05-11 17:01:14 +00:00
Attilio Rao	de6648745c	Fix a hang introduced in r206878 for kernels compiled with SMP support but running on a non-SMP system, and similar situations, by always initializing the smp ipi mutex. Reported by: marius MFC after: 3 days X-MFC: r206878	2010-05-11 15:36:16 +00:00
Alan Cox	7d1d2ef60a	Update a comment: It no longer makes sense to talk about the page queues lock here.	2010-05-08 23:01:47 +00:00
Alan Cox	3c4a24406b	Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.	2010-05-08 20:34:01 +00:00
Konstantin Belousov	d2ba618a63	Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node in a no-sleep context. If resource allocation cannot be done without sleep, make_dev_credf() fails and returns NULL. Reviewed by: jh MFC after: 2 weeks	2010-05-06 19:22:50 +00:00
Alan Cox	eb00b276ab	Eliminate page queues locking around most calls to vm_page_free().	2010-05-06 18:58:32 +00:00
Edward Tomasz Napierala	77dda2b96f	Avoid overflow. Submitted by: bde@	2010-05-06 18:52:41 +00:00
Edward Tomasz Napierala	307d88b787	Style fixes and removal of unneeded variable. Submitted by: bde@	2010-05-06 18:43:19 +00:00
Alan Cox	f0c0d3998d	Remove page queues locking from all sf_buf_mext()-like functions. The page lock now suffices. Fix a couple nearby style violations.	2010-05-06 17:43:41 +00:00
Alan Cox	52683078a2	Eliminate a small bit of unneeded code from kern_sendfile(): While kern_sendfile() is running, the file's vm object can't be destroyed because kern_sendfile() increments the vm object's reference count. (Once kern_sendfile() decrements the reference count and returns, the vm object can, however, be destroyed. So, sf_buf_mext() must handle the case where the vm object is destroyed.) Reviewed by: kib	2010-05-06 15:52:08 +00:00
Joel Dahl	8e0ad55abb	Switch to our preferred 2-clause BSD license. Approved by: kmacy	2010-05-05 20:39:02 +00:00
Alan Cox	5ac59343be	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib	2010-05-05 18:16:06 +00:00
Edward Tomasz Napierala	b5f770bd86	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib	2010-05-05 16:44:25 +00:00
Konstantin Belousov	213c077f62	Fix a mistake in r207603. td_rux.rux_runtime still needs conversion. Reported and tested by: nwhitehorn Pointy hat to: kib MFC after: 6 days	2010-05-05 16:05:51 +00:00
Alan Cox	e3ef0d2fcf	Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib	2010-05-05 03:45:46 +00:00
Alan Cox	a7283d3213	Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib	2010-05-04 15:55:41 +00:00
Konstantin Belousov	9182554ae9	Fix typo in comment. MFC after: 3 days	2010-05-04 06:06:01 +00:00
Konstantin Belousov	603a4d7f41	Remove a comment that merely repeats code. Submitted by: bde MFC after: 1 week	2010-05-04 06:04:33 +00:00
Konstantin Belousov	03d13670c2	Use td_rux.rux_runtime for ki_runtime instead of redoing calculation. Submitted by: bde MFC after: 1 week	2010-05-04 06:00:39 +00:00
Konstantin Belousov	bed4c52416	Implement RUSAGE_THREAD. Add td_rux to keep extended runtime and ticks information for thread to allow calcru1() (re)use. Rename ruxagg()->ruxagg_locked(), ruxagg_tlock()->ruxagg() [1]. The ruxagg_locked() function no longer clears thread ticks nor td_incruntime. Requested by: attilio [1] Discussed with: attilio, bde Reviewed by: bde Based on submission by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week X-MFC-Note: td_rux shall be moved to the end of struct thread	2010-05-04 05:55:37 +00:00
Alan Cox	c5a648516e	Acquire the page lock around vm_page_unwire() and vm_page_wire(). Reviewed by: kib	2010-05-03 16:41:11 +00:00
Alan Cox	913814935a	This is the first step in transitioning responsibility for synchronizing access to the page's wire_count from the page queues lock to the page lock. Submitted by: kmacy	2010-05-03 05:41:50 +00:00
Konstantin Belousov	a0b8e597e5	Lock the page around hold_count access. Reviewed by: alc	2010-05-02 19:25:22 +00:00
Alan Cox	139a0de7f1	Properly synchronize access to the page's hold_count in vfs_vmio_release(). Reviewed by: kib	2010-05-02 19:10:27 +00:00
Alan Cox	b88b6c9d80	It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.	2010-05-02 17:33:46 +00:00
Marko Zec	a83baab6e4	Remove a redundant variable assignment. Reviewed by: bz, rwatson MFC after: 3 days	2010-05-01 18:34:50 +00:00
Konstantin Belousov	3087dc40a9	Extract thread_lock()/ruxagg()/thread_unlock() fragment into utility function ruxagg_tlock(). Convert the definition of kern_getrusage() to ANSI C. Submitted by: Alexander Krizhanovsky <ak natsys-lab com> MFC after: 1 week	2010-05-01 14:46:17 +00:00
Zachary Loafman	1dac222419	Handle taskqueue_drain(9) correctly on a threaded taskqueue: taskqueue_drain(9) will not correctly detect whether a task is currently running. The check is against a field in the taskqueue struct, but for a threaded queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: jhb Approved by: dfr (mentor)	2010-04-30 16:29:05 +00:00
Alfred Perlstein	b7402d8269	Avoid allocating MAXHOSTNAMELEN bytes on the stack in expand_name(), use the heap instead. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:15:00 +00:00
Alfred Perlstein	fba6b1af2e	Don't leak core_buf or gzfile if doing a compressed core file and we hit an error condition. Obtained from: Juniper Networks	2010-04-30 03:13:24 +00:00
Alfred Perlstein	feb112c552	Do not set IO_NODELOCKED while writing to vnodes as our consumers do not lock the vnodes. Obtained from: Juniper Networks Reviewed by: jhb	2010-04-30 03:10:53 +00:00
Kip Macy	2965a45315	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib	2010-04-30 00:46:43 +00:00
Konstantin Belousov	e68d26fd5d	Remove caddr_t casts. Requested by: bde MFC after: 10 days	2010-04-29 09:55:51 +00:00
Andriy Gapon	4f27c5edfe	kern_ntptime: drop a comment that became stale after r207359 MFC after: 1 week X-MFC after: r207359	2010-04-29 09:18:36 +00:00
Andriy Gapon	5c7e270fcd	periodically save system time to hardware time-of-day clock This is done in kern_ntptime, perhaps not the best place. This is done using resettodr(). Some features: - make save period configurable via tunable and sysctl - period of zero disables saving, setting a non-zero period re-enables it or reschedules it - do saving only if system clock is ntp-synchronized - save on shutdown Discussed with: des, Peter Jeremy <peterjeremy@acm.org> X-Maybe: save time near seconds boundary for better precision MFC after: 2 weeks	2010-04-29 09:02:46 +00:00
Andriy Gapon	9a9ae42a43	kern_ntptime: abstract time error check into a function ... to avoid code duplication MFC after: 1 week	2010-04-29 09:02:21 +00:00


12054 Commits