freebsd-dev

Author	SHA1	Message	Date
Brooks Davis	30f8de5ad0	MFP4: Change 221669 by bz@bz_zenith on 2013/02/01 12:26:04 Run the initialization for polling earlier along with INTRs so that we can put network interface into polling mode by default if DEVICE_POLLING is compiled in and no interrupts are available. MFC after: 3 days Sponsored by: DARPA/AFRL	2013-10-22 22:03:01 +00:00
Alexander Motin	9e8bd2acf2	Remove global device lock acquisition from dev_relthread(), replacing it with atomics on per-device data.	2013-10-22 10:40:26 +00:00
Alexander Motin	40ea77a036	Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more then twice increases peak block storage performance on systems with manu CPUs, together with earlier CAM locking changes reaching more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months	2013-10-22 08:22:19 +00:00
Alexander Motin	18093155a3	Add comments that taskqueue_enqueue_locked() returns without the lock.	2013-10-21 21:16:50 +00:00
Konstantin Belousov	9110db818a	Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close. Under some specific and not real-world use scenario for kqueue, it is possible for the kqueues to consume memory proportional to the square of the number of the filedescriptors available to the process. Limit allows administrator to prevent the abuse. This is kernel-mode side of the change, with the user-mode enabling commit following. Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-10-21 16:44:53 +00:00
Konstantin Belousov	44f3b9c787	Print more useful information about the transfer that trigger the assertion. Other data is available with ddb command 'show pginfo'. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-10-21 16:17:46 +00:00
Alexander Motin	8160afdab1	MFprojects/camlock r256619: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping temporary mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods). Move mapping removal before calling bio_done() method. I believe that it is very wrong to do anything to BIO after reporting completion. kib@ thinks it was done for some forgotten now case when bio_done() method needed mapped buffer. But 1) if BIO was sent as unmapped, then IMO done() should be called in the same way; 2) IMO there is no guatantee that buffer will be mapped at this point at all, for example, if all underlying stack supports unmapped I/O, so bio_done() handler can not expect that.	2013-10-21 06:44:55 +00:00
Gleb Smirnoff	a9355dbe82	Revert r256587. Requested by: zec	2013-10-18 11:26:40 +00:00
Ed Maste	ec7935bfef	Error out on failure to open specified config file	2013-10-16 17:03:46 +00:00
Alexander Motin	77a30af6f8	MFprojects/camlock r256370: - Take BIO lock in biodone() only when there is no completion callback set and so we should wake up thread waiting in biowait(). - Remove msleep() timeout from biowait(). It was added 11 years ago, when there was no locks used, and it should not be needed any more.	2013-10-16 09:56:40 +00:00
Alexander Motin	6d545f4c8d	MFprojects/camlock r254763: Move tq_enqueue() call out of the queue lock for known handlers (actually I have found no others in the base system). This reduces queue lock hold time and congestion spinning under active multithreaded enqueuing.	2013-10-16 09:52:59 +00:00
Alexander Motin	1d1e92f102	MFprojects/camlock r254685: Remove TQ_FLAGS_PENDING flag, softly duplicating queue emptiness status.	2013-10-16 09:48:23 +00:00
Alexander Motin	e431d66c04	MFprojects/camlock r254905: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used.	2013-10-16 09:12:40 +00:00
Gleb Smirnoff	348298b1a3	For VIMAGE kernels store vnet in the struct task, and set vnet context during task processing. Reported & tested by: mm	2013-10-16 05:02:01 +00:00
Konstantin Belousov	eda6009c04	Add a sysctl kern.disallow_high_osrel which disables executing the images compiled on the world with higher major version number than the high version number of the booted kernel. Default to disable. Sponsored by: The FreeBSD Foundation Discussed with: bapt MFC after: 1 week	2013-10-15 06:38:40 +00:00
Konstantin Belousov	cd4dd444dd	By default, allow up to SSIZE_MAX i/o for non-devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 month X-MFC-note: stable/10 only	2013-10-15 06:35:22 +00:00
Konstantin Belousov	bf3e483b44	Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week	2013-10-15 06:33:10 +00:00
Mark Murray	cc4d059c03	Merge from project branch. Uninteresting commits are trimmed. Refactor of /dev/random device. Main points include: * Userland seeding is no longer used. This auto-seeds at boot time on PC/Desktop setups; this may need some tweeking and intelligence from those folks setting up embedded boxes, but the work is believed to be minimal. * An entropy cache is written to /entropy (even during installation) and the kernel uses this at next boot. * An entropy file written to /boot/entropy can be loaded by loader(8) * Hardware sources such as rdrand are fed into Yarrow, and are no longer available raw. ------------------------------------------------------------------------ r256240 \| des \| 2013-10-09 21:14:16 +0100 (Wed, 09 Oct 2013) \| 4 lines Add a RANDOM_RWFILE option and hide the entropy cache code behind it. Rename YARROW_RNG and FORTUNA_RNG to RANDOM_YARROW and RANDOM_FORTUNA. Add the RANDOM_* options to LINT. ------------------------------------------------------------------------ r256239 \| des \| 2013-10-09 21:12:59 +0100 (Wed, 09 Oct 2013) \| 2 lines Define RANDOM_PURE_RNDTEST for rndtest(4). ------------------------------------------------------------------------ r256204 \| des \| 2013-10-09 18:51:38 +0100 (Wed, 09 Oct 2013) \| 2 lines staticize struct random_hardware_source ------------------------------------------------------------------------ r256203 \| markm \| 2013-10-09 18:50:36 +0100 (Wed, 09 Oct 2013) \| 2 lines Wrap some policy-rich code in 'if NOTYET' until we can thresh out what it really needs to do. ------------------------------------------------------------------------ r256184 \| des \| 2013-10-09 10:13:12 +0100 (Wed, 09 Oct 2013) \| 2 lines Re-add /dev/urandom for compatibility purposes. ------------------------------------------------------------------------ r256182 \| des \| 2013-10-09 10:11:14 +0100 (Wed, 09 Oct 2013) \| 3 lines Add missing include guards and move the existing ones out of the implementation namespace. ------------------------------------------------------------------------ r256168 \| markm \| 2013-10-08 23:14:07 +0100 (Tue, 08 Oct 2013) \| 10 lines Fix some just-noticed problems: o Allow this to work with "nodevice random" by fixing where the MALLOC pool is defined. o Fix the explicit reseed code. This was correct as submitted, but in the project branch doesn't need to set the "seeded" bit as this is done correctly in the "unblock" function. o Remove some debug ifdeffing. o Adjust comments. ------------------------------------------------------------------------ r256159 \| markm \| 2013-10-08 19:48:11 +0100 (Tue, 08 Oct 2013) \| 6 lines Time to eat crow for me. I replaced the sx_* locks that Arthur used with regular mutexes; this turned out the be the wrong thing to do as the locks need to be sleepable. Revert this folly. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> (In original diff) ------------------------------------------------------------------------ r256138 \| des \| 2013-10-08 12:05:26 +0100 (Tue, 08 Oct 2013) \| 10 lines Add YARROW_RNG and FORTUNA_RNG to sys/conf/options. Add a SYSINIT that forces a reseed during proc0 setup, which happens fairly late in the boot process. Add a RANDOM_DEBUG option which enables some debugging printf()s. Add a new RANDOM_ATTACH entropy source which harvests entropy from the get_cyclecount() delta across each call to a device attach method. ------------------------------------------------------------------------ r256135 \| markm \| 2013-10-08 07:54:52 +0100 (Tue, 08 Oct 2013) \| 8 lines Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use EVENTHANDLER(mountroot) instead. This means we can't count on /var being present, so something will need to be done about harvesting /var/db/entropy/... . Some policy now needs to be sorted out, and a pre-sync cache needs to be written, but apart from that we are now ready to go. Over to review. ------------------------------------------------------------------------ r256094 \| markm \| 2013-10-06 23:45:02 +0100 (Sun, 06 Oct 2013) \| 8 lines Snapshot. Looking pretty good; this mostly works now. New code includes: * Read cached entropy at startup, both from files and from loader(8) preloaded entropy. Failures are soft, but announced. Untested. * Use EVENTHANDLER to do above just before we go multiuser. Untested. ------------------------------------------------------------------------ r256088 \| markm \| 2013-10-06 14:01:42 +0100 (Sun, 06 Oct 2013) \| 2 lines Fix up the man page for random(4). This mainly removes no-longer-relevant details about HW RNGs, reseeding explicitly and user-supplied entropy. ------------------------------------------------------------------------ r256087 \| markm \| 2013-10-06 13:43:42 +0100 (Sun, 06 Oct 2013) \| 6 lines As userland writing to /dev/random is no more, remove the "better than nothing" bootstrap mode. Add SWI harvesting to the mix. My box seeds Yarrow by itself in a few seconds! YMMV; more to follow. ------------------------------------------------------------------------ r256086 \| markm \| 2013-10-06 13:40:32 +0100 (Sun, 06 Oct 2013) \| 11 lines Debug run. This now works, except that the "live" sources haven't been tested. With all sources turned on, this unlocks itself in a couple of seconds! That is no my box, and there is no guarantee that this will be the case everywhere. * Cut debug prints. * Use the same locks/mutexes all the way through. * Be a tad more conservative about entropy estimates. ------------------------------------------------------------------------ r256084 \| markm \| 2013-10-06 13:35:29 +0100 (Sun, 06 Oct 2013) \| 5 lines Don't use the "real" assembler mnemonics; older compilers may not understand them (like when building CURRENT on 9.x). # Submitted by: Konstantin Belousov <kostikbel@gmail.com> ------------------------------------------------------------------------ r256081 \| markm \| 2013-10-06 10:55:28 +0100 (Sun, 06 Oct 2013) \| 12 lines SNAPSHOT. Simplify the malloc pools; We only need one for this device. Simplify the harvest queue. Marginally improve the entropy pool hashing, making it a bit faster in the process. Connect up the hardware "live" source harvesting. This is simplistic for now, and will need to be made rate-adaptive. All of the above passes a compile test but needs to be debugged. ------------------------------------------------------------------------ r256042 \| markm \| 2013-10-04 07:55:06 +0100 (Fri, 04 Oct 2013) \| 25 lines Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector, was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some secuity in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address som review concerns while I'm here, like rename the pseudo-rng to 'dummy'. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item) ------------------------------------------------------------------------ r255319 \| markm \| 2013-09-06 18:51:52 +0100 (Fri, 06 Sep 2013) \| 4 lines Yarrow wants entropy estimations to be conservative; the usual idea is that if you are certain you have N bits of entropy, you declare N/2. ------------------------------------------------------------------------ r255075 \| markm \| 2013-08-30 18:47:53 +0100 (Fri, 30 Aug 2013) \| 4 lines Remove short-lived idea; thread to harvest (eg) RDRAND enropy into the usual harvest queues. It was a nifty idea, but too heavyweight. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> ------------------------------------------------------------------------ r255071 \| markm \| 2013-08-30 12:42:57 +0100 (Fri, 30 Aug 2013) \| 4 lines Separate out the Software RNG entropy harvesting queue and thread into its own files. # Submitted by: Arthur Mesh <arthurmesh@gmail.com> ------------------------------------------------------------------------ r254934 \| markm \| 2013-08-26 20:07:03 +0100 (Mon, 26 Aug 2013) \| 2 lines Remove the short-lived namei experiment. ------------------------------------------------------------------------ r254928 \| markm \| 2013-08-26 19:35:21 +0100 (Mon, 26 Aug 2013) \| 2 lines Snapshot; Do some running repairs on entropy harvesting. More needs to follow. ------------------------------------------------------------------------ r254927 \| markm \| 2013-08-26 19:29:51 +0100 (Mon, 26 Aug 2013) \| 15 lines Snapshot of current work; 1) Clean up namespace; only use "Yarrow" where it is Yarrow-specific or close enough to the Yarrow algorithm. For the rest use a neutral name. 2) Tidy up headers; put private stuff in private places. More could be done here. 3) Streamline the hashing/encryption; no need for a 256-bit counter; 128 bits will last for long enough. There are bits of debug code lying around; these will be removed at a later stage. ------------------------------------------------------------------------ r254784 \| markm \| 2013-08-24 14:54:56 +0100 (Sat, 24 Aug 2013) \| 39 lines 1) example (partially humorous random_adaptor, that I call "EXAMPLE") * It's not meant to be used in a real system, it's there to show how the basics of how to create interfaces for random_adaptors. Perhaps it should belong in a manual page 2) Move probe.c's functionality in to random_adaptors.c * rename random_ident_hardware() to random_adaptor_choose() 3) Introduce a new way to choose (or select) random_adaptors via tunable "rngs_want" It's a list of comma separated names of adaptors, ordered by preferences. I.e.: rngs_want="yarrow,rdrand" Such setting would cause yarrow to be preferred to rdrand. If neither of them are available (or registered), then system will default to something reasonable (currently yarrow). If yarrow is not present, then we fall back to the adaptor that's first on the list of registered adaptors. 4) Introduce a way where RNGs can play a role of entropy source. This is mostly useful for HW rngs. The way I envision this is that every HW RNG will use this functionality by default. Functionality to disable this is also present. I have an example of how to use this in random_adaptor_example.c (see modload event, and init function) 5) fix kern.random.adaptors from kern.random.adaptors: yarrowpanicblock to kern.random.adaptors: yarrow,panic,block 6) add kern.random.active_adaptor to indicate currently selected adaptor: root@freebsd04:~ # sysctl kern.random.active_adaptor kern.random.active_adaptor: yarrow # Submitted by: Arthur Mesh <arthurmesh@gmail.com> Submitted by: Dag-Erling Smørgrav <des@FreeBSD.org>, Arthur Mesh <arthurmesh@gmail.com> Reviewed by: des@FreeBSD.org Approved by: re (delphij) Approved by: secteam (des,delphij)	2013-10-12 12:57:57 +00:00
John Baldwin	d251e7006b	Ignore attempts to set the nmbcluster sysctls to their current value rather than failing with an error. Reviewed by: andre Approved by: re (delphij) MFC after: 2 weeks	2013-10-10 16:11:34 +00:00
Mark Murray	72acff0f07	MFC - tracking commit.	2013-10-09 21:03:34 +00:00
Konstantin Belousov	acb9d2c7f0	The device vnodes are often unlocked when bread() or bwrite() is called. This probably should be fixed eventually, but for now it is not needed to try to flush such vnodes from the buffer allocation context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-09 18:45:01 +00:00
Konstantin Belousov	2c1531e746	Do not flush buffers when the v_object of the passed vnode does not really belong to it. Such vnodes, with the pointers to other vnodes v_objects, are typically instantiated by the bypass filesystems. Invalidating mappings of other vnode pages and the pages is wrong, since reclamation of the upper vnode does not imply that lower vnode is reclaimed too. One of the consequences of the improper reclamation was destruction of the wired mappings of the lower vnode pages, triggering miscellaneous assertions in the VM system. Reported by: John Marshall <john.marshall@riverwillow.com.au> Tested by: John Marshall <john.marshall@riverwillow.com.au>, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-09 18:43:29 +00:00
Konstantin Belousov	1744fe5048	When growing the file descriptor table, new larger memory chunk is allocated, but the old table is kept around to handle the case of threads still performing unlocked accesses to it. Grow the table exponentially instead of increasing its size by sizeof(long) * 8 chunks when overflowing. This mode significantly reduces the total memory use for the processes consuming large numbers of the file descriptors which open them one by one. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:41:35 +00:00
Konstantin Belousov	3625bde45d	Reduce code duplication, introduce the getmaxfd() helper to calculate the max filedescriptor index. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-10-09 18:39:44 +00:00
Mark Murray	371cbaafa8	MFC - tracking commit	2013-10-09 17:41:47 +00:00
Gleb Smirnoff	1d2df300e9	- Substitute sbdrop_internal() with sbcut_internal(). The latter doesn't free mbufs, but return chain of free mbufs to a caller. Caller can either reuse them or return to allocator in a batch manner. - Implement sbdrop()/sbdrop_locked() as a wrapper around sbcut_internal(). - Expose sbcut_locked() for outside usage. Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)	2013-10-09 11:57:53 +00:00
Dag-Erling Smørgrav	db3fcaf970	Add YARROW_RNG and FORTUNA_RNG to sys/conf/options. Add a SYSINIT that forces a reseed during proc0 setup, which happens fairly late in the boot process. Add a RANDOM_DEBUG option which enables some debugging printf()s. Add a new RANDOM_ATTACH entropy source which harvests entropy from the get_cyclecount() delta across each call to a device attach method.	2013-10-08 11:05:26 +00:00
Mark Murray	6e818c871f	Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use EVENTHANDLER(mountroot) instead. This means we can't count on /var being present, so something will need to be done about harvesting /var/db/entropy/... . Some policy now needs to be sorted out, and a pre-sync cache needs to be written, but apart from that we are now ready to go. Over to review.	2013-10-08 06:54:52 +00:00
Mark Murray	1a3c1f06dd	Snapshot. Looking pretty good; this mostly works now. New code includes: * Read cached entropy at startup, both from files and from loader(8) preloaded entropy. Failures are soft, but announced. Untested. * Use EVENTHANDLER to do above just before we go multiuser. Untested.	2013-10-06 22:45:02 +00:00
Mark Murray	12babbf219	MFC - tracking commit	2013-10-06 09:37:57 +00:00
Konstantin Belousov	505cdd82bf	Remove the uipc_cow.c file, which is not used since the zero copy sockets removal. Noted by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij)	2013-10-06 06:57:28 +00:00
Alan Cox	61083fcc61	Tidy up kmeminit(): Since r245575, 'nmbclusters' is calculated after kmeminit() runs, so it contributes nothing to 'vm_kmem_size'; update a comment to reflect that r254025 replaced the kmem submap with the kmem arena. Reviewed by: kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division	2013-10-05 18:53:03 +00:00
Mark Murray	3c59587daa	MFC - tracking commit.	2013-10-04 07:00:59 +00:00
Mark Murray	f02e47dc1e	Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector, was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some secuity in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address som review concerns while I'm here, like rename the pseudo-rng to 'dummy'. Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)	2013-10-04 06:55:06 +00:00
Sean Bruno	d3baefa809	Change len checks for fstypelen and fspathlen to be against absolute len not strlen as they are not strings. Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his fuse.glusterfs port to FreeBSD. Final patch from mckusick@ Submitted by: mckusick@ Approved by: re (hrs) MFC after: 2 weeks	2013-10-03 22:52:03 +00:00
Konstantin Belousov	432e79fc33	When helping the bufdaemon from the buffer allocation context, there is no sense to walk the whole dirty buffer queue. We are only interested in, and can operate on, the buffers owned by the current vnode [1]. Instead of calling generic queue flush routine, do VOP_FSYNC() if possible. Holding the dirty buffer queue lock in the bufdaemon, without dropping it, can cause starvation of buffer writes from other threads. This is esp. easy to reproduce on the big memory machines, where large files are written, causing almost all dirty buffers accumulating in several big files, which vnodes are locked by writers. Bufdaemon cannot flush any buffer, but is iterating over the whole dirty queue continuously. Since dirty queue mutex is not dropped, bufdone() in g_up thread is starved, usually deadlocking the machine [2]. Mitigate this by dropping the queue lock after the vnode is locked, allowing other queue lock contenders to make a progress. Discussed with: Jeff [1] Reported by: pho [2] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (hrs)	2013-10-02 06:00:34 +00:00
Konstantin Belousov	d6498b153e	When printing the vnode information from ddb, print the lengths of the dirty and clean buffer queues. Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)	2013-10-01 20:18:33 +00:00
Konstantin Belousov	fe39412e99	For vunref(), try to upgrade the vnode lock if the function was called with the vnode shared-locked. If upgrade succeeded, the inactivation can be done immediately, instead of being postponed. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:07:14 +00:00
Konstantin Belousov	ac34145005	Reimplement r255797 using LK_TRYUPGRADE. The r255797 was: Increase the chance of the buffer write from the bufdaemon helper context to succeed. If the locked vnode which owns the buffer to be written is shared locked, try the non-blocking upgrade of the lock to exclusive. PR: kern/178997 Reported and tested by: Klaus Weber <fbsd-bugs-2013-1@unix-admin.de> Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:04:57 +00:00
Konstantin Belousov	7c6fe80353	Add LK_TRYUPGRADE operation for lockmgr(9), which attempts to atomically upgrade shared lock to exclusive. On failure, error is returned and lock is not dropped in the process. Tested by: pho (previous version) No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)	2013-09-29 18:02:23 +00:00
John-Mark Gurney	da9442ef43	it must be the last member, not might... Reviewed by: attilio Approved by: re (delphij, gjb)	2013-09-26 17:55:04 +00:00
Konstantin Belousov	9d2abcd01a	Do not allow negative timeouts for kqueue timers, check for the negative timeout both before and after the conversion to sbintime_t. For periodic kqueue timer, convert zero timeout into 1ms, to avoid interrupt storm on fast event timers. Reported and tested by: pho Discussed with: mav Reviewed by: davide Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2013-09-26 13:17:31 +00:00
Konstantin Belousov	27884e3bd1	Acquire a hold reference on the vnode when a knote is instantiated. Otherwise, knote keeps a pointer to a vnode which could become invalid any time. Reported by: many Tested by: Patrick Lamaiziere <patfbsd@davenulle.org> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-09-26 13:14:51 +00:00
Davide Italiano	1b0c144fc2	Make the callout arithmetic more robust adding checks for overflow. Without these, if the timeout value passed is "large enough", the value of the sum of it and other factors (e.g. current time as returned by sbinuptime() or 'precision' argument) might result in a negative number. This negative number is then passed to eventtimers(4), which causes et_start() routine to load et_min_period into eventtimer, making the CPU where the thread is stuck forever in timer interrupt handler routine. This is now avoided rounding to INT64_MAX the timeout period in case of overflow. Reported by: kib, pho Discussed with: kib, mav Tested by: pho (stress2 suite, kevent7.sh scenario) Approved by: re (kib)	2013-09-26 10:06:50 +00:00
Attilio Rao	57a9eeb4ed	Avoid memory accesses reordering which can result in fget_unlocked() seeing a stale fd_ofiles table once fd_nfiles is already updated, resulting in OOB accesses. Approved by: re (kib) Sponsored by: EMC / Isilon storage division Reported and tested by: pho Reviewed by: benno	2013-09-25 13:37:52 +00:00
Alexander Motin	ea4af9c09a	Make load average sampling asynchronous to hardclock ticks. This improves measurement of load caused by time-related events still using hardclock. For example, without this change dummynet, scheduling events each hardclock tick, was always miscounted as load of 1. There is still aliasing with events delayed by the new precision mechanism, but it probably can't be avoided without moving this sampling from using callout to some lower-level code or handling it in some other special way. Reviewed by: davide Approved by: re (marius)	2013-09-24 07:03:16 +00:00
Dag-Erling Smørgrav	0f7bc112c0	Always request zeroed memory, in case we're dumb enough to leak it later. Approved by: re (gjb)	2013-09-22 23:47:56 +00:00
Konstantin Belousov	12af71a69f	Revert r255797. The LK_UPGRADE \| LK_NOWAIT drops the lock. Approved by: re (marius, implicit)	2013-09-22 20:29:03 +00:00
Konstantin Belousov	19f6a6a1ca	Pre-acquire the filedesc sx when a possibility exists that the later code could need to remove a kqueue from the filedesc list. Global lock is already locked, which causes sleepable after non-sleepable lock acquisition. Reported and tested by: pho Reviewed by: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2013-09-22 19:54:47 +00:00
Konstantin Belousov	d1f8ca485d	Increase the chance of the buffer write from the bufdaemon helper context to succeed. If the locked vnode which owns the buffer to be written is shared locked, try the non-blocking upgrade of the lock to exclusive. PR: kern/178997 Reported and tested by: Klaus Weber <fbsd-bugs-2013-1@unix-admin.de> Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)	2013-09-22 19:23:48 +00:00

1 2 3 4 5 ...

13493 Commits