freebsd-dev

Author	SHA1	Message	Date
Attilio Rao	e946b94934	On all architectures, avoid preallocating the physical memory for the nodes used in vm_radix. On architectures supporting direct mapping, also avoid preallocating the KVA for such nodes. To do so, allow the operations derived from vm_radix_insert() to fail and handle all the resulting failures. On the vm_radix side, introduce a new function, vm_radix_replace(), which can replace an already-present leaf node with a new one, and take into account the possibility that the operations on the radix trie can recurse during the vm_radix_insert() allocation. This means that if operations in vm_radix_insert() recursed, vm_radix_insert() will start again from scratch. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
Attilio Rao	ac6b769be9	Give mutex(9) the ability to recurse on a per-instance basis. Now the MTX_RECURSE flag can be passed to the mtx_*_flag() calls. This helps in cases where we want to restrict recursion on some locks to specific calls. Sponsored by: EMC / Isilon storage division Reviewed by: jeff, alc Tested by: pho	2013-08-09 11:24:29 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanisms rely on the vm object lock to work. Unify the two concepts into a real, minimal sxlock, where shared acquisition represents the soft busy and exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard path for this new lock, and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the owning thread cannot make any assumption about the busy state unless it is also busying it. Also: - Add a new flag to directly share-busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag. The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Edward Tomasz Napierala	8ddc3590cc	Don't dereference a null pointer if acl_alloc() is passed M_NOWAIT and the allocation fails. Nothing in the tree passed M_NOWAIT. Obtained from: mjg MFC after: 1 month	2013-08-09 08:40:31 +00:00
Scott Long	f510415d84	Add a helpful message that can help point to why a sysctl tree removal failed Obtained from: Netflix MFC after: 3 days	2013-08-09 01:04:44 +00:00
Ryan Stone	08a42caa50	Allow drivers to return BUS_PROBE_NOWILDCARD from their attach routine to match devices where the driver class was fixed but the unit number was wildcarded. This better matches the documented behaviour in DEVICE_PROBE(9). Reviewed by: imp	2013-08-08 19:30:49 +00:00
John Baldwin	5b596f0f5f	Don't emit a spurious EVFILT_PROC event with no fflags set on process exit if NOTE_EXIT is not being monitored. The rationale is that a listener should only get an event for exit() if they registered interest via NOTE_EXIT. This matches the behavior on OS X. - Don't save the exit status on process exit unless NOTE_EXIT is being monitored. - Add an internal EV_DROP flag that requests kqueue_scan() to free the knote without signalling it to userland, and use this when a process exits but the fflags in the knote are zero. Reviewed by: jmg MFC after: 1 month	2013-08-07 19:56:35 +00:00
Kevin Lo	3de1bd9502	Remove unsigned comparison < 0 Found by: LLVM Reviewed by: luigi	2013-08-07 07:22:56 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Konstantin Belousov	456597e7bd	Do not override the ENOENT error for the empty path, or EFAULT errors from copyins, with the relative lookup check. Discussed with: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-08-05 19:42:03 +00:00
Attilio Rao	be99683637	Revert r253939: We cannot busy a page before doing pagefaults. In fact, it can deadlock against the vnode lock, as it tries to vget(). Other functions currently use the opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted, we need to break this ordering. Sponsored by: EMC / Isilon storage division Reported by: kib	2013-08-05 08:55:35 +00:00
Attilio Rao	3b6714cacb	The page hold mechanism is fast but it has a couple of drawbacks: - It does not let pages respect the LRU policy - It bloats the active/inactive queues of a few pages. Try to avoid it as much as possible, with the long-term goal of removing it completely. Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()). After this change, only vm_fault_quick_hold_pages() still uses the hold mechanism for page content access. There is additional complexity there, as the quick path cannot immediately access the page object to busy the page, and the slow path cannot busy more than one page at a time (to avoid deadlocks). Fixing that primitive could lead to the complete removal of the page hold mechanism. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff Tested by: pho	2013-08-04 21:07:24 +00:00
Attilio Rao	878a788734	Remove the unnecessary soft busy of the page before doing vn_rdwr() in kern_sendfile(). The page is already wired, so it will not be subject to pagefaults. The content cannot be effectively protected, as it is full of races already. Multiple accesses to the same indexes are serialized through vn_rdwr(). Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: pho	2013-08-04 15:56:19 +00:00
Marcel Moolenaar	90aa031bf1	Add a tunable for the default timeout.	2013-08-03 04:25:25 +00:00
Gleb Smirnoff	977c7043eb	Remove extra zeroing after M_ZERO allocation.	2013-08-02 13:06:49 +00:00
Konstantin Belousov	1f3ad93be7	Remove unused malloc type. Requested by: alc MFC after: 1 week	2013-08-01 12:55:41 +00:00
Ian Lepore	6cbd933b37	Changes to allow using BOOTP_NFSROOT and mounting an NFS root filesystem other than the one specified by the BOOTP server. This configures NFS using the BOOTP protocol while also respecting other root-path options such as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT boot option. It allows you to override the root path provided by the server, or to supply a root path when the server provides IP configuration but no root path info. This maintains the historical BOOTP_NFSROOT behavior of panicking on a failure to mount the root path provided by the server, unless you've provided an alternative via the ROOTDEVNAME kernel option or by setting vfs.root.mountfrom. The behavior of panicking when given no other options is preserved because it amounts to a bit of a retry loop that could eventually recover from a transient network or server problem. The user can now override the root path from loader(8) even if the kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in the environment it is used unconditionally -- it always overrides the BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it instead of the server-provided info. If it specifies some other filesystem, then the BOOTP code will not panic like it used to, and the code in vfs_mountroot.c will invoke the right filesystem to do the mount. If the kernel is compiled with the ROOTDEVNAME option, then that name is used by the BOOTP code if either (a) the server doesn't provide a pathname, or (b) the boothowto flags include RB_DFLTROOT. The latter allows the user to compile in an alternate path in ROOTDEVNAME such as ufs:/dev/da0s1a and boot from that path by setting boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8). The one thing not provided here is automatic failover from a server-provided path to a compiled-in one without the user manually requesting that. The code just isn't currently structured in a way that makes that possible without a lot of rewriting. I think the ability to set vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server doesn't provide a name covers the most common needs. A set of patches submitted by Lars Eggert provided the part I couldn't figure out by myself when I tried to do this last year; many thanks. Reviewed by: rodrigc	2013-07-31 19:14:00 +00:00
Scott Long	fcd9ff2c67	Another fix for r253823; retain the default of 1 readahead block for sendfile. Submitted by: glebius Obtained from: Netflix MFC after: 3 days	2013-07-31 15:55:01 +00:00
Scott Long	de925dd31f	Fix r253823. Some WIP patches snuck in. Submitted by: zont	2013-07-30 23:50:09 +00:00
Scott Long	fc4a5f052b	Create a knob, kern.ipc.sfreadahead, that allows one to tune the amount of readahead that sendfile() will do. Default remains the same. Obtained from: Netflix MFC after: 3 days	2013-07-30 23:26:05 +00:00
Konstantin Belousov	2c38cc792e	When creation of the v_pollinfo raced and our instance of vpollinfo must be destroyed, the knlist_clear() and seldrain() calls could be avoided, since vpollinfo was not used. Moreover, the knlist_clear() calling protocol requires the knlist to be locked, which is not the case at the call site. Split the destruction into the helper destroy_vpollinfo_free(), and call it when raced, instead of destroy_vpollinfo(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-07-28 06:59:29 +00:00
John Baldwin	5e3a17c0b9	Use VMFS_OPTIMAL_SPACE instead of VMFS_ALIGNED_SPACE in shm_map().	2013-07-24 20:34:25 +00:00
Marcel Moolenaar	ba7f4cdc97	Further restrict the MAC addresses that we use for UUID generation to those that are universally administered. While it is possible to add locally administered MAC addresses, it's unclear whether those are expected to be more unique than random multicast MAC addresses or not. With many U-Boot configurations assigning fixed and non-official MAC addresses to ethernet ports and without setting the 'X' flag, this change may have very little value in the embedded (development) space. Uniqueness of the universally administered addresses is nonexistent on the (H/W) bench and questionable under the (S/W) desk. In short: this change is aimed at production environments...	2013-07-24 18:13:43 +00:00
Marcel Moolenaar	8ff6ca1e08	In uuid_ether_add(), avoid false positives due to the limited type used to hold the sum of the bytes of the MAC address. While here, rename the variable that holds the sum from 'c' to 'sum'. Pointed out by: thompsa@	2013-07-24 16:22:27 +00:00
Andriy Gapon	785797c341	rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST Also directly call swapper() at the end of mi_startup instead of relying on swapper being the last thing in sysinits order. Rationale: - "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage - "scheduler" was misleading, the function swaps in the swapped out processes - another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be invoked, depending on its relative order with scheduler; this was not obvious and the bug actually used to exist Reviewed by: kib (earlier version) MFC after: 14 days	2013-07-24 09:45:31 +00:00
Gleb Smirnoff	9e3cc17647	Remove unused argument from vmem_add1(). Reviewed by: jeff	2013-07-24 08:02:56 +00:00
Marcel Moolenaar	ef1f916971	Decouple the UUID generator from network interfaces by having MAC addresses added to the UUID generator using uuid_ether_add(). The UUID generator keeps an arbitrary number of MAC addresses, under the assumption that they are rarely removed (= uuid_ether_del()). This achieves the following: 1. It brings us closer to having the network stack as a loadable module. 2. It allows the UUID generator to filter MAC addresses for best results (= highest chance of uniqueness). 3. MAC addresses can come from anywhere, irrespective of whether they are used for an interface or not. A side-effect of the change is that when no MAC addresses have been added, a random multicast MAC address is created once and re-used if needed. Previously, when a random MAC address was needed, it was created for every call. Thus, a change in behaviour is introduced for when no MAC addresses exist. Obtained from: Juniper Networks, Inc.	2013-07-24 04:24:21 +00:00
Gleb Smirnoff	e28a647db6	Revert r249590 and, in case mp_ncpus isn't initialized, use MAXCPU. This allows us to initialize the counter zone at an early stage of boot. Reviewed by: kib Tested by: Lytochkin Boris <lytboris gmail.com>	2013-07-23 11:16:40 +00:00
Mateusz Guzik	b1051d92fe	Remove cr_prison NULL check from proc_to_reap. Userspace processes always have a prison. MFC after: 2 weeks	2013-07-22 02:07:15 +00:00
Mateusz Guzik	462314b3f9	Remove duplicate assertion from tdsendsignal. MFC after: 2 weeks	2013-07-22 00:44:37 +00:00
Konstantin Belousov	643ee87175	Implement compat32 wrappers for the ktimer_* syscalls. Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:43:52 +00:00
Konstantin Belousov	058262c69a	Wrap kmq_notify(2) for compat32 to properly consume struct sigevent32 argument. Reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:40:30 +00:00
Konstantin Belousov	9731998997	Move the convert_sigevent32() utility function into freebsd32_misc.c for consumption outside the vfs_aio.c. For SIGEV_THREAD_ID and SIGEV_SIGNAL notification delivery methods, also copy in the sigev_value, since librt event pumping loop compares note generation number with the value passed through sigev_value. Tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-21 19:33:48 +00:00
Konstantin Belousov	d31e4b3a58	id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2). Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation	2013-07-20 13:39:41 +00:00
John Baldwin	ff74a3fa6b	Be more aggressive in using superpages in all mappings of objects: - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for vm_map_find() that will try to alter the alignment of a mapping to match any existing superpage mappings of the object being mapped. If no suitable address range is found with the necessary alignment, vm_map_find() will fall back to using the simple first-fit strategy (VMFS_ANY_SPACE). - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE. Reviewed by: alc (earlier version) MFC after: 2 weeks	2013-07-19 19:06:15 +00:00
Konstantin Belousov	a8b0523ae7	Clear the vnode knotes before destroying vpollinfo. Reported and tested by: Patrick Lamaiziere <patfbsd@davenulle.org> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-07-17 10:56:21 +00:00
Gleb Smirnoff	59b9c4f289	Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5. Now that r253351 moved sendfile() stats to a separate struct, the last field used in mbstat is m_mcfail, which is updated, but never read or obtained from userland.	2013-07-15 12:18:36 +00:00
Andrey V. Elsukov	05d1f5bce0	Introduce a new structure, sfstat, for collecting sendfile's statistics and remove the corresponding fields from struct mbstat. Use PCPU counters and the SFSTAT_INC() macro to update these statistics. Discussed with: glebius	2013-07-15 06:16:57 +00:00
Craig Rodrigues	719fb72517	PR: 168520 170096 Submitted by: adrian, zec Fix multiple kernel panics when VIMAGE is enabled in the kernel. These fixes are based on patches submitted by Adrian Chadd and Marko Zec. (1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling device_attach(). This fixes multiple VIMAGE related kernel panics when trying to attach Bluetooth or USB Ethernet devices because curthread->td_vnet is NULL. (2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking interfaces, especially USB Ethernet devices. (3) Use VNET_DOMAIN_SET() in ng_btsocket.c (4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics when detaching Netgraph nodes.	2013-07-15 01:32:55 +00:00
Konstantin Belousov	982d771242	Assert that runningbufspace does not underflow. Sponsored by: The FreeBSD Foundation	2013-07-13 19:36:18 +00:00
Konstantin Belousov	da4ca6c8ab	There is no need to count waiters for the runningbufspace. Sponsored by: The FreeBSD Foundation	2013-07-13 19:34:34 +00:00
Konstantin Belousov	c7c536c7f5	Allow to call clock_gettime() on the clock id for zombie process. Reported by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180496 Sponsored by: The FreeBSD Foundation	2013-07-13 19:32:50 +00:00
Andre Oppermann	bc4a1b8ccd	Make use of the fact that uma_zone_set_max(9) already returns the rounded limit making a call to uma_zone_get_max(9) unnecessary. MFC after: 1 day	2013-07-11 12:53:13 +00:00
Andre Oppermann	e0c00adda2	Fix style issues and a typo in "kern.ipc.nmbufs", and correctly place and expose the value of the tunable maxmbufmem as "kern.ipc.maxmbufmem" through sysctl. Reported by: smh MFC after: 1 day	2013-07-11 12:46:35 +00:00
Konstantin Belousov	92e5367354	Do not invalidate the page of a B_NOCACHE buffer, or of a buffer after an I/O error, if any user wired mappings exist. Doing the invalidation destroys the user wiring. This change is a temporary measure to close the bug; the more proper fix is to always delegate the invalidation of the page to upper layers. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-07-11 05:36:26 +00:00
Marcel Moolenaar	8939c0693c	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. This change differs from r253224 in the following way: o The vfs_mounted handler is called before mountcheckdirs() and with newdp locked. vp is unlocked. o The event handlers are declared in <sys/eventhandler.h> and not in <sys/mount.h>. The <sys/mount.h> header is used in user land code that pretends to be kernel code and as such creates a very convoluted environment. It's hard to untangle. Submitted by: stevek@juniper.net Discussed with: pjd@ Obtained from: Juniper Networks, Inc.	2013-07-10 15:35:25 +00:00
Konstantin Belousov	cc3d8c35f5	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if another thread starts an unmount between the two calls. The unmount starts a write, while vfs_write_suspend() drains writers. On the other hand, unmount drains busy references, causing a deadlock. Add a flag argument to vfs_write_suspend() and require its callers to specify the VS_SKIP_UNMOUNT flag when the call is not performed in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and an unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2013-07-09 20:49:32 +00:00
Andriy Gapon	d6fc869ebd	should_yield: protect from td_swvoltick being uninitialized or too stale The distance between ticks and td_swvoltick should be calculated as an unsigned number. Previously we could end up comparing a negative number with hogticks, in which case should_yield() would give an incorrect answer. We should probably ensure that td_swvoltick is properly initialized. Sponsored by: HybridCluster MFC after: 5 days	2013-07-09 09:01:44 +00:00
Andriy Gapon	4633a4c379	namecache sdt: freebsd doesn't support structured characters yet :-) MFC after: 7 days	2013-07-09 08:58:34 +00:00
John Baldwin	c64bc3a076	Fix build with INVARIANT_SUPPORT enabled but not INVARIANTS. Reported by: "Matthew D. Fuller" <fullermd@over-yonder.net>	2013-07-08 21:17:20 +00:00
Alfred Perlstein	d7b5c50b92	Make kassert_printf use __printflike. Fix associated errors/warnings while I'm here. Requested by: avg	2013-07-07 21:39:37 +00:00
Jamie Gritton	1e7df84305	Make the comments a little more clear about PRIV_KMEM_*, explicitly referring to /dev/[k]mem and noting it's about opening the files rather than actually reading and writing. Reviewed by: jmallett	2013-07-06 00:10:52 +00:00
Jamie Gritton	c71e336230	Add new privileges, PRIV_KMEM_READ and PRIV_KMEM_WRITE, used in opening /dev/kmem and /dev/mem (in addition to traditional file permission checks). PRIV_KMEM_READ is different from other PRIV_* checks in that it's allowed by default. Reviewed by: kib, mckusick	2013-07-05 21:31:16 +00:00
Alfred Perlstein	3eebd44d0c	The change in r236456 (atomic_store_rel not locked) exposed a bug in the ithread code where we could lose ithread interrupts if intr_event_schedule_thread() is called while the ithread is already running. Effectively, memory writes could be ordered incorrectly such that the non-atomic write of 0 to ithd->it_need (in ithread_loop) could be delayed until after it was set to be triggered again by intr_event_schedule_thread(). This was a particularly big problem for CAM because CAM optimizes scheduling of ithreads such that it only signals camisr() when it queues to an empty queue. This means that additional completion events would not unstick CAM, quickly leading to a complete lockup of the CAM stack. To fix this, use atomics in all places where ithd->it_need is accessed. Submitted by: delphij, mav Obtained from: TrueOS, iXsystems MFC After: 1 week	2013-07-04 05:53:05 +00:00
Mateusz Guzik	a82a370603	Fix receiving fd over unix socket broken in r247740. If n fds were passed, it would receive the first one n times. Reported by: Shawn Webb <lattera@gmail.com>, koobs, gleb Tested by: koobs, gleb Reviewed by: pjd	2013-07-02 07:36:04 +00:00
Mikolaj Golub	9e89077c65	Plug the lock leakage when exporting to a short buffer. Reported by: Alexander Leidinger Submitted by: mjg MFC after: 1 week	2013-07-01 03:27:14 +00:00
Konstantin Belousov	70a7dd5d5b	Fix issues with zeroing and fetching the counters, on x86 and ppc64. Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use an atomic read of the 64bit value, otherwise the carry from an increment on another CPU could be lost for the given fetch, producing an error of 2^32. If a 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP, and it is enough to disable preemption around the read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override a just-zeroed per-cpu slot. The effect would be a counter going off by an arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if cmpxchg8b is not available, the machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes were made only to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using 64bit loads and stores that are atomic WRT preemption (assuming the architectures are also converted from using critical sections to proper asm). If an architecture does not provide the facility, using a global (spin) mutex would be a non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation	2013-07-01 02:48:27 +00:00
Mateusz Guzik	f15ba03632	acct: create a special plimit object and set it for exiting processes instead of allocating a new one each time All limits are set to RLIM_INFINITY, which should be ok (even though we care only about RLIMIT_FSIZE in this case). MFC after: 1 week	2013-06-30 19:08:06 +00:00
Mateusz Guzik	4a3c4f41e9	acct: reduce code duplication by using acct_disable as cleanup for failed kproc_create MFC after: 1 week	2013-06-30 13:17:37 +00:00
Peter Wemm	0a943e59c9	Revert accidental commit. Pointy hat to: peter	2013-06-29 05:05:57 +00:00
Peter Wemm	76665df9bf	Help out gcc. clang understands. sys_generic.c:1510: warning: 'precision' may be used uninitialized *** [sys_generic.o] Error code 1	2013-06-29 04:35:04 +00:00
Davide Italiano	7563068847	Correct the comment above _sleep() function which still mentions 'timo' instead of 'sbintime_t'. Reported by: kan	2013-06-28 21:04:15 +00:00
Davide Italiano	237abf0c56	- Trim an unused and bogus Makefile for mount_smbfs. - Reconnect with some minor modifications, in particular now selsocket() internals are adapted to use sbintime units after recent'ish calloutng switch.	2013-06-28 21:00:08 +00:00
Mateusz Guzik	07bd8bf929	Remove duplicate NULL check in kern_proc_filedesc_out. No functional changes. MFC after: 1 week	2013-06-28 18:32:46 +00:00
Mikolaj Golub	6359d169ef	Rework r252313: The filedesc lock may not be dropped unconditionally before exporting fd to sbuf: fd might go away during execution. While it is ok for DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode here, for other types it is unsafe. Instead, drop the lock in export_fd_to_sb(), after preparing data in memory and before writing to sbuf. Spotted by: mjg Suggested by: kib Reviewed by: kib MFC after: 1 week	2013-06-28 18:07:41 +00:00
Ryan Stone	4a7d0bfcaa	Correct a bug that prevented deadlkres from (almost) ever firing. deadlkres was using a reversed test to check whether ticks had rolled over. This meant that deadlkres could only fire after ticks had rolled over. This test was actually unnecessary, as deadlkres only ever took the difference of ticks values, which is safe even in the presence of ticks rollover. Remove the tests entirely. Now deadlkres will properly fire once a lock has been held longer than the timeout period. MFC after: 1 month	2013-06-28 15:55:30 +00:00
Jeff Roberson	5f51836645	- Add a general purpose resource allocator, vmem, from NetBSD. It was originally inspired by the Solaris vmem detailed in the proceedings of usenix 2001. The NetBSD version was heavily refactored for bugs and simplicity. - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines. Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-06-28 03:51:20 +00:00
John Baldwin	e35ce1f271	Make detaching drivers from PCI devices more robust. While here, fix a bug where a PCI device would be powered down if it failed to probe, but not when its driver was detached (e.g. via kldunload). - Add a new helper method resource_list_release_active() which forcefully releases any active resources of a specified type from a resource list. - Add a bus_child_detached method for the PCI bus driver which forces any active resources to be released (and whines to the console if it finds any) and then powers the device down. - Call pci_child_detached() if we fail to probe a device when a driver is kldloaded. This isn't perfect but can avoid leaking resources from a probe() routine in the kldload case. Reviewed by: imp, brooks MFC after: 1 month	2013-06-27 20:21:54 +00:00
Mikolaj Golub	bd973910c8	To avoid LOR, always drop the filedesc lock before exporting fd to sbuf. Reviewed by: kib MFC after: 3 days	2013-06-27 19:14:03 +00:00
John Baldwin	b5fb43e572	A few mostly cosmetic nits to aid in debugging: - Call lock_init() first before setting any lock_object fields in lock init routines. This way if the machine panics due to a duplicate init the lock's original state is preserved. - Somewhat similarly, don't decrement td_locks and td_slocks until after an unlock operation has completed successfully.	2013-06-25 20:23:08 +00:00
John Baldwin	cd32bd7ad1	Several improvements to rmlock(9). Many of these are based on patches provided by Isilon. - Add an rm_assert() supporting various lock assertions similar to other locking primitives. Because rmlocks track readers the assertions are always fully accurate unlike rw_assert() and sx_assert(). - Flesh out the lock class methods for rmlocks to support sleeping via condvars and rm_sleep() (but only while holding write locks), rmlock details in 'show lock' in DDB, and the lc_owner method used by dtrace. - Add an internal destroyed cookie so that API functions can assert that an rmlock is not destroyed. - Make use of rm_assert() to add various assertions to the API (e.g. to assert locks are held when an unlock routine is called). - Give RM_SLEEPABLE locks their own lock class and always use the rmlock's own lock_object with WITNESS. - Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping while holding a read lock on an rmlock. Submitted by: andre Obtained from: EMC/Isilon	2013-06-25 18:44:15 +00:00
Lawrence Stewart	0963c8e431	When a previous call to sbsndptr() leaves sb->sb_sndptroff at the start of an mbuf that was fully consumed by the previous call, the mbuf ptr returned by the current call ends up being the mbuf preceding the one in the sb chain that contains the data we want. This does not cause any observable issues because the mbuf copy routines happily walk the mbuf chain to get to the data at the moff offset, which in this case means they effectively skip over the mbuf returned by sbsndptr(). We can't adjust sb->sb_sndptr during the previous call for this case because the next mbuf in the chain may not exist yet. We therefore need to detect the condition and make the adjustment during the current call. Fix by detecting the special case of moff being at the start of the next mbuf in the chain and adjusting the required accounting variables accordingly. Reviewed by: andre MFC after: 2 weeks	2013-06-19 03:08:01 +00:00
Lawrence Stewart	ec41a9a1bd	The fix committed in r250951 replaced the reported panic with a deadlock... gold star for me. EVENTHANDLER_DEREGISTER() attempts to acquire the lock which is held by the event handler framework while executing event handler functions, leading to deadlock. Move EVENTHANDLER_DEREGISTER() to alq_load_handler() and thus deregister the ALQ shutdown_pre_sync handler at module unload time, which takes care of the originally reported panic and fixes the deadlock introduced in r250951. Reported by: Luiz Otavio O Souza MFC after: 3 days X-MFC with: 250951	2013-06-17 09:49:07 +00:00
Ed Schouten	2381f6ef8c	Change callout use counter to use C11 atomics. In order to get some coverage of C11 atomics in kernelspace, switch at least one piece of code in kernelspace to use C11 atomics instead of <machine/atomic.h>. While there, slightly improve the code by adding an assertion to prevent the use count from going negative.	2013-06-16 09:30:35 +00:00
Lawrence Stewart	38f080cb04	Move hhook's per-vnet initialisation to an earlier SYSINIT SI_SUB stage to ensure all per-vnet related hhook initialisation is completed prior to any virtualised hhook points attempting registration. vnet_register_sysinit() requires that a stage later than SI_SUB_VNET be chosen. There are no per-vnet initialisors in the source tree at this time which run earlier than SI_SUB_INIT_IF. A quick audit of non-virtualised SYSINITs indicates there are no subsystems pre SI_SUB_MBUF that would likely be interested in registering a virtualised hhook point. Settle on SI_SUB_MBUF as hhook's per-vnet initialisation stage as it's the first overtly network-related initialisation stage to run after SI_SUB_VNET. If a subsystem that initialises earlier than SI_SUB_MBUF ends up wanting to register virtualised hhook points in future, hhook's use of SI_SUB_MBUF will need to be revisited and would probably warrant creating a dedicated SI_SUB_HHOOK which runs immediately after SI_SUB_VNET. MFC after: 1 week	2013-06-15 10:08:34 +00:00
Lawrence Stewart	933a8bff73	Cleanup and simplification in khelp_{register|deregister}_helper(). No functional changes. MFC after: 1 week	2013-06-15 06:45:17 +00:00
Lawrence Stewart	58261d30e1	Add a private KPI between hhook and khelp that allows khelp modules to insert hook functions into hhook points which register after the modules were loaded - potentially useful during boot or if hhook points are dynamically registered. MFC after: 1 week	2013-06-15 05:57:29 +00:00
Lawrence Stewart	b1f53277ec	Internalise handling of virtualised hook points inside hhook_{add|remove}_hook_lookup() so that khelp (and other potential API consumers) do not have to care when they attempt to (un)hook a particular hook point identified by id and type. Reviewed by: scottl MFC after: 1 week	2013-06-15 04:03:40 +00:00
Lawrence Stewart	bfe72a58e2	Fix a major oversight in r251732 which causes non-VIMAGE kernels to trigger a KASSERT during TCP hhook registration at boot. Virtualised hook points only require extra housekeeping and sanity checking when "options VIMAGE" is present. Reported by: bdrewery,jh,dhw Tested by: dhw MFC after: 1 week X-MFC with: 251732	2013-06-14 18:11:21 +00:00
Lawrence Stewart	601d4c7543	Add support for non-virtualised hhook points, which are uniquely identified by type and id, as compared to virtualised hook points which are now uniquely identified by type, id and a vid (which for vimage is the pointer to the vnet that the hhook resides in). All hhook_head structs for both virtualised and non-virtualised hook points coexist in hhook_head_list, and a separate list is maintained for hhook points within each vnet to simplify some vimage-related housekeeping. Reviewed by: scottl MFC after: 1 week	2013-06-14 04:10:34 +00:00
Lawrence Stewart	86241d89a9	Fix a potential NULL-pointer dereference that would trigger if the hhook registration site did not provide storage for a copy of the hhook_head struct. MFC after: 3 days	2013-06-14 02:25:40 +00:00
Jeff Roberson	17a2737732	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
Konstantin Belousov	1d7466bca4	Fix two issues with the spin loops in the umtx(2) implementation. - When looping, check for the pending suspension. Otherwise, another usermode thread that races with the looping one could try to prevent the process from stopping or exiting. - Add missed checks for the faults from casuword*(). The code is structured in a way which makes the loops exit if the specified address is invalid, since both fuword() and casuword() return -1 on the fault. But if the address is mapped readonly, the typical value read by fuword() is different from -1, while casuword() returns -1. Absent the checks for casuword() faults, this is interpreted as the race with other thread and causes non-interruptible spinning in the kernel. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-06-13 09:33:22 +00:00
Marcel Moolenaar	4612275fdb	Revert r251590. It unexpectedly broke the build and there were some questions on locking. As part of commit-bit grooming, I'd like Steve to handle this, but can't leave things broken in the mean time.	2013-06-10 15:22:27 +00:00
Marcel Moolenaar	8c7ca16f63	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. Submitted by: stevek@juniper.net Obtained from: Juniper Networks, Inc	2013-06-09 23:51:26 +00:00
Gleb Smirnoff	8d1aa3c6b4	aio_mlock() added: - Regen for r251526. - Bump __FreeBSD_version.	2013-06-08 13:30:13 +00:00
Gleb Smirnoff	6160e12c10	Add new system call - aio_mlock(). The name speaks for itself. It allows the mlock(2) operation, which can consume a lot of time, to be performed under the control of aio(4). Reviewed by: kib, jilles Sponsored by: Nginx, Inc.	2013-06-08 13:27:57 +00:00
Gleb Smirnoff	f95c13db04	Separate LIO_SYNC processing into a separate function aio_process_sync(), and rename aio_process() into aio_process_rw(). Reviewed by: kib Sponsored by: Nginx, Inc.	2013-06-08 13:02:43 +00:00
John Baldwin	c9813d0a37	Do not compare the existing mask of a cpuset with a new mask when changing the mask of a cpuset. Also, change the cpuset's mask before updating the masks of all children. Previously changing a cpuset's mask first required setting the mask to a super-set of both the old and new masks and then changing it a second time to the new mask.	2013-06-06 14:43:19 +00:00
Alan Cox	27a18d6a23	Don't busy the page unless we are likely to release the object lock. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2013-06-06 06:17:20 +00:00
Jeff Roberson	ba39d89bc9	- Consolidate duplicate code into support functions. - Split the bqlock into bqclean and bqdirty locks. - Only acquire the wakeup synchronization locks when we cross a threshold requiring them. - Restructure the way flushbufqueues() targets work so they are more smp friendly and sane. Reviewed by: kib Discussed with: mckusick, attilio Sponsored by: EMC / Isilon Storage Division M vfs_bio.c	2013-06-05 23:53:00 +00:00
Gleb Smirnoff	82e825c4c9	Improve r250890, so that we stop processing of a message with zero descriptors as early as possible, and assert that the number of descriptors is positive in unp_freerights(). Reviewed by: mjg, pjd, jilles	2013-06-04 11:19:08 +00:00
John Baldwin	24150d37d3	- Fix a couple of inverted panic messages for shared/exclusive mismatches of a lock within a single thread. - Fix handling of interlocks in WITNESS by properly requiring the interlock to be held exactly once if it is specified.	2013-06-03 17:41:11 +00:00
John Baldwin	95d28652af	- Handle the recursed/not recursed flags with RA_RLOCKED in rw_assert(). - Tweak a panic message.	2013-06-03 17:38:57 +00:00
Konstantin Belousov	d39116f5d5	Be more generous when donating the current thread time to the owner of the vnode lock while iterating over the free vnode list. Instead of yielding, pause for 1 tick. The change is reported to help in some virtualized environments. Submitted by: Roger Pau Monné <roger.pau@citrix.com> Discussed with: jilles Tested by: pho MFC after: 2 weeks	2013-06-03 17:36:43 +00:00
Konstantin Belousov	1e65d73c74	Do not map the shared page COW. If the process wired its address space, fork(2) would cause shadowing of the physical object and copying of the shared page into private copy, effectively preventing updates for the exported timehands structure and stopping the clock. Specify the maximum allowed permissions for the page to be read and execute, preventing write from the user mode. Reported and tested by: <huanghwh@yahoo.com> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-06-03 04:32:53 +00:00
Konstantin Belousov	92fab43f7f	When auto-sizing the buffer cache, limit the amount of physical memory used as the estimation of size, to 32GB. This provides around 100K of buffer headers and corresponding KVA for buffer map at the peak. Sizing the cache larger is not useful, also resulting in the wasting and exhausting of KVA for large machines. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation	2013-06-03 04:16:48 +00:00
Alan Cox	39a4cd0cec	Reduce the scope of the VM object locking in brelse(). In my tests, this change reduced the total number of VM object lock acquisitions by brelse() by 74%. Sponsored by: EMC / Isilon Storage Division	2013-06-02 16:18:03 +00:00
Marius Strobl	0ad17e4b32	Move an assertion to the right spot; only bus_dmamap_load_mbuf(9) requires a pkthdr being present but that's not the case for either _bus_dmamap_load_mbuf_sg() or bus_dmamap_load_mbuf_sg(9). Reported by: sbruno MFC after: 1 week	2013-06-01 11:42:47 +00:00
John Baldwin	3d4c503cf0	Style fixes to vn_ioctl(). Suggested by: bde	2013-05-31 16:15:22 +00:00

13382 Commits