freebsd-skq

Author	SHA1	Message	Date
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Marius Strobl	105421ff81	Merge r247814 from x86 modulo whitespace bug: Turn on the CTL disable tunable by default. This will allow GENERIC configurations to boot on small memory boxes, but not require end users who want to use CTL to recompile their kernel. They can simply set kern.cam.ctl.disable=0 in loader.conf.	2013-03-08 13:11:45 +00:00
Gavin Atkinson	9ec80eff4c	Correct two spelling mistakes in a comment.	2013-03-07 13:24:49 +00:00
Marius Strobl	e8aabc79db	- Revert the part of r247601 which turned the overtemperature and power fail interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep lock, which filters shouldn't do. It also seems that kern_reboot(9) still may require Giant to be hold. - Correct an incorrect argument to shutdown_nice(9). Submitted by: bde	2013-03-02 13:08:13 +00:00
Marius Strobl	562799bb30	Revert the part of r247600 which turned the overtemperature and power fail interrupt shutdown handlers into filters. Shutdown_nice(9) acquires a sleep lock, which filters shouldn't do. It also seems that kern_reboot(9) still may require Giant to be hold. Submitted by: bde	2013-03-02 13:04:58 +00:00
Marius Strobl	11be09b056	- Apparently, it's no longer a problem to call shutdown_nice(9) from within an interrupt filter (some other drivers in the tree do the same). So change the overtemperature and power fail interrupts from handlers in order to code and get rid of a !INTR_MPSAFE handlers. - Mark unused parameters as such. - Use NULL instead of 0 for pointers. MFC after: 1 week	2013-03-02 00:41:51 +00:00
Marius Strobl	7e026d15d5	- While Netra X1 generally show no ill effects when registering a power fail interrupt handler, there seems to be either a broken batch of them or a tendency to develop a defect which causes this interrupt to fire inadvertedly. Given that apart from this problem these machines work just fine, add a tunable allowing the setup of the power fail interrupt to be disabled. While at it, remove the DEBUGGER_ON_POWERFAIL compile time option and make that behavior also selectable via the newly added tunable. - Apparently, it's no longer a problem to call shutdown_nice(9) from within an interrupt filter (some other drivers in the tree do the same). So change the power fail interrupt from an handler in order to simplify the code and get rid of a !INTR_MPSAFE handler. - Use NULL instead of 0 for pointers. MFC after: 1 week	2013-03-02 00:37:31 +00:00
Marius Strobl	22f19117ed	- In sbbc_pci_attach() just pass the already obtained bus tag and handle instead of acquiring these anew. - Use NULL instead of 0 for pointers. MFC after: 1 week	2013-03-01 20:36:59 +00:00
Marius Strobl	1a5a52f1f9	- Remove an unused header. - Use NULL instead of 0 for pointers. - Let ofw_pcib_probe() return BUS_PROBE_DEFAULT instead of 0 so specialized PCI-PCI-bridge drivers may attach instead. - Add WARs for PLX Technology PEX 8114 bridges and PEX 8532 switches. Ideally, these should live in MI code but at least for the latter we're missing the necessary infrastructure there. MFC after: 1 week	2013-03-01 20:34:02 +00:00
Alexander Motin	fdc5dd2d2f	MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.	2013-02-28 13:46:03 +00:00
Attilio Rao	dc1558d1cd	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-27 18:12:13 +00:00
Attilio Rao	590f9303e5	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc	2013-02-26 01:00:11 +00:00
Konstantin Belousov	dd0b4fb6d5	Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)	2013-02-12 16:57:20 +00:00
Konstantin Belousov	00efd852cc	The 'end' word was missed in the comment. MFC after: 3 days	2013-02-08 15:52:20 +00:00
Eitan Adler	4752ed3d7f	Remove support for plip from the GENERIC kernel as no systems in the last 10 years require this support. Discussed with: db Discussed with: kib Reviewed by: imp Reviewed by: jhb Reviewed by: -hackers Approved by: cperciva (mentor)	2013-02-01 20:17:11 +00:00
Marius Strobl	718c2b5b11	Revert the part of r239864 which removed obtaining the SMP mutex around reading registers from other CPUs. As it turns out, the hardware doesn't really like concurrent IPI'ing causing adverse effects. Also the thought deadlock when using this spin lock here and the targeted CPU(s) are also holding or in case of nested locks can't actually happen. This is due to the fact that on sparc64, spinlock_enter() only raises the PIL but doesn't disable interrupts completely. Thus direct cross calls as used for the register reading (and all other MD IPI needs) still will be executed by the targeted CPU(s) in that case. MFC after: 3 days	2013-01-23 22:52:20 +00:00
Marius Strobl	369109e254	Revert bogus part of r241740. Reported by: Michael Moll MFC after: 3 days	2013-01-03 23:12:08 +00:00
Konstantin Belousov	0dcbedfa61	Enable the UFS quotas for big-iron GENERIC kernels. Discussed with: mckusick MFC after: 2 weeks	2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav	36fca20f10	As discussed on -current last October, remove the firewire drivers from GENERIC.	2013-01-03 14:30:24 +00:00
Marius Strobl	d464a3641d	Revert r237842 and switch back to SCHED_ULE. All problems I encountered with the latter have been fixed with r241780. MFC after: 3 days	2012-12-16 20:54:07 +00:00
Konstantin Belousov	43f48b65c0	Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs. Requested and reviewed by: alc MFC after: 2 weeks	2012-11-16 05:55:56 +00:00
Jeff Roberson	28d91af30f	- Implement run-time expansion of the KTR buffer via sysctl. - Implement a function to ensure that all preempted threads have switched back out at least once. Use this to make sure there are no stale references to the old ktr_buf or the lock profiling buffers before updating them. Reviewed by: marius (sparc64 parts), attilio (earlier patch) Sponsored by: EMC / Isilon Storage Division	2012-11-15 00:51:57 +00:00
Konstantin Belousov	b32ecf44bc	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks	2012-11-14 20:01:40 +00:00
Dimitry Andric	29658c96ce	Remove duplicate const specifiers in many drivers (I hope I got all of them, please let me know if not). Most of these are of the form: static const struct bzzt_type { [...list of members...] } const bzzt_devs[] = { [...list of initializers...] }; The second const is unnecessary, as arrays cannot be modified anyway, and if the elements are const, the whole thing is const automatically (e.g. it is placed in .rodata). I have verified this does not change the binary output of a full kernel build (except for build timestamps embedded in the object files). Reviewed by: yongari, marius MFC after: 1 week	2012-11-05 19:16:27 +00:00
Attilio Rao	cfedf924d3	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris	2012-11-03 23:03:14 +00:00
Marius Strobl	4cf9e21ed8	- Give PIL_PREEMPT the lowest priority just above low/stray interrupts. The reason for this is that the SPARC v9 architecture allows nested interrupts of higher priority/level than that of the current interrupt to occur (and we can't just entirely bypass this model, also, at least for tick interrupts, this also wouldn't be wise). However, when a preemption interrupt interrupts another interrupt of lower priority, f.e. PIL_ITHREAD, and that one in turn is nested by a third interrupt, f.e. PIL_TICK, with SCHED_ULE the execution of interrupts higher than PIL_PREEMPT may be migrated to another CPU. In particular, tl1_ret(), which is responsible for restoring the state of the CPU prior to entry to the interrupt based on the (also migrated) trap frame, then is run on a CPU which actually didn't receive the interrupt in question, causing an inappropriate processor interrupt level to be "restored". In turn, this causes interrupts of the first level, i.e. PIL_ITHREAD in the above scenario, to be blocked on the target of the migration until the correct PIL happens to be restored again on that CPU again. Making PIL_PREEMPT the lowest real priority, this effectively prevents this scenario from happening, as preemption interrupts no longer can interrupt any other interrupt besides stray ones (which is no issue). Thanks to attilio@ and especially mav@ for helping me to understand this problem at the 201208DevSummit. - Give PIL_STOP (which is also used for IPI_STOP_HARD, given that there's no real equivalent to NMIs on SPARC v9) the highest possible priority just below the hardwired PIL_TICK, so it has a chance to interrupt more things. MFC after: 1 week	2012-10-20 12:07:48 +00:00
Marius Strobl	f79440de00	- Remove an unused header. - Don't waste a delay slot. MFC after: 3 days	2012-10-19 17:12:55 +00:00
Marius Strobl	6a91a98054	Let SCHED_ULE give affinity to the CPU the tick interrupt triggered on when running tick_process(), similarly to what the x86 equivalents of this function do, however employing the less racy sequence also used in intr_event_handle(). MFC after: 3 days	2012-10-19 13:32:37 +00:00
Attilio Rao	3a4730256a	Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise. Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks	2012-10-09 14:32:30 +00:00
Attilio Rao	af2bdacafb	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of r234247. Use, instead, the static intializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version. Reviewed by: marius	2012-10-09 12:22:43 +00:00
Alan Cox	e4b8a2fc5a	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.	2012-09-28 05:30:59 +00:00
Eitan Adler	96240c89f0	Correct double "the the" Approved by: cperciva MFC after: 3 days	2012-09-14 21:28:56 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Gavin Atkinson	915ae29a17	Prevent indent(1) from reformatting this comment, as it contains a formatting-sensitive table.	2012-09-07 08:18:06 +00:00
Marius Strobl	9e8100e77c	Add a global MD macro for the VIS block size instead of duplicating it and using magic values all over the place. MFC after: 1 week	2012-08-31 11:15:01 +00:00
Marius Strobl	bf38cf8ab3	- Unlike cache invalidation and TLB demapping IPIs, reading registers from other CPUs doesn't require locking so get rid of it. As the latter is used for the timecounter on certain machine models, using a spin lock in this case can lead to a deadlock with the upcoming callout(9) rework. - Merge r134227/r167250 from x86: Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB demapping IPIs. - Mark some unused function arguments as such. MFC after: 1 week	2012-08-29 16:56:50 +00:00
Glen Barber	67944c4572	Grammar fix: s/NIC's/NICs/ MFC after: 3 days	2012-08-26 01:21:02 +00:00
Marius Strobl	55afc4edf1	Merge r236494 from x86: Isolate the global TTE list lock from data and other locks to prevent false sharing within the cache. MFC after: 3 days	2012-08-05 22:03:13 +00:00
Marius Strobl	5f42fa1793	Switch back to the 4BSD scheduler for now. There is some more or less recent regression with ULE, causing processes to get stuck in getblk as well as interrupt handler execution delays to rise above the command timeout of mpt(4). MFC after: 3 days	2012-06-30 14:55:36 +00:00
Kenneth D. Merry	e36042794f	Now that the mps(4) driver is endian-safe, add it to the powerpc and sparc64 GENERIC config files. MFC after: 3 days	2012-06-28 20:48:24 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Andrew Turner	74dc547e24	Make the wchar_t type machine dependent. This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred. Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively. Discussed with: bde	2012-06-24 04:15:58 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Konstantin Belousov	232aa31fb9	Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to timekeeping information. MFC after: 1 week	2012-06-22 06:38:31 +00:00
Alan Cox	6031c68de4	The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks	2012-06-16 18:56:19 +00:00
Alan Cox	b10ed4a911	Replace all uses of the vm page queues lock by a r/w lock that is private to this pmap.c. This new r/w lock is used primarily to synchronize access to the TTE lists. However, it will be used in a somewhat unconventional way. As finer-grained TTE list locking is added to each of the pmap functions that acquire this r/w lock, its acquisition will be changed from write to read, enabling concurrent execution of the pmap functions with finer-grained locking. Reviewed by: attilio Tested by: flo MFC after: 10 days	2012-05-29 01:52:38 +00:00
Marius Strobl	57e42723d8	Merge from x86: r232521 Exclude USB drivers (except umass and ukbd) from main kernel image.	2012-05-25 14:52:05 +00:00
Bjoern A. Zeeb	920b965865	MFp4 bz_ipv6_fast: in_cksum.h required ip.h to be included for struct ip. To be able to use some general checksum functions like in_addword() in a non-IPv4 context, limit the (also exported to user space) IPv4 specific functions to the times, when the ip.h header is present and IPVERSION is defined (to 4). We should consider more general checksum (updating) functions to also allow easier incremental checksum updates in the L3/4 stack and firewalls, as well as ponder further requirements by certain NIC drivers needing slightly different pseudo values in offloading cases. Thinking in terms of a better "library". Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days	2012-05-24 22:00:48 +00:00

1 2 3 4 5 ...

1985 Commits