freebsd-skq

Author	SHA1	Message	Date
John Baldwin	64414cc00f	Update the I/O MMU in bhyve when PCI devices are added and removed. When the I/O MMU is active in bhyve, all PCI devices need valid entries in the DMAR context tables. The I/O MMU code does a single enumeration of the available PCI devices during initialization to add all existing devices to a domain representing the host. The ppt(4) driver then moves pass through devices in and out of domains for virtual machines as needed. However, when new PCI devices were added at runtime either via SR-IOV or HotPlug, the I/O MMU tables were not updated. This change adds a new set of EVENTHANDLERS that are invoked when PCI devices are added and deleted. The I/O MMU driver in bhyve installs handlers for these events which it uses to add and remove devices to the "host" domain. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7667	2016-09-06 20:17:54 +00:00
John Baldwin	db4b3cdad8	Remove remnants of PERFMON and I586_PMC_GUPROF from amd64. These options were never fully ported over from i386.	2016-09-06 19:25:32 +00:00
John Baldwin	5fb03c3780	Leave ppt devices in the host domain when they are not attached to a VM. This allows a pass through device to be reset to a normal device driver on the host and reused on the host. ppt devices are now always active in some I/O MMU domain when the I/O MMU is active, either the host domain or the domain of a VM they are attached to. Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7666	2016-09-06 18:53:17 +00:00
Mark Johnston	dbbaf04f1e	Remove support for idle page zeroing. Idle page zeroing has been disabled by default on all architectures since r170816 and has some bugs that make it seemingly unusable. Specifically, the idle-priority pagezero thread exacerbates contention for the free page lock, and yields the CPU without releasing it in non-preemptive kernels. The pagezero thread also does not behave correctly when superpage reservations are enabled: its target is a function of v_free_count, which includes reserved-but-free pages, but it is only able to zero pages belonging to the physical memory allocator. Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D7714	2016-09-03 20:38:13 +00:00
Alan Cox	53aadae680	As an optimization to the machine-independent layer, change the machine- dependent pmap_ts_referenced() so that it updates the page's dirty field if a modified bit is found while counting reference bits. This opportunistic update can be performed at low cost and can eliminate the need for some future calls to pmap_is_modified() by the machine- independent layer. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7722	2016-09-01 15:57:44 +00:00
Bruce Evans	ef209971e9	Shorten banal comments about zeroing and copying pages. Don't give implementation details that last echoed the code 15-20 years ago. But add a detail about pagezero() on i386. Switch from Mach style to BSD style.	2016-08-29 14:38:31 +00:00
Bruce Evans	1a5735873e	On amd64, declare sse2_pagezero() and start using it again, but only for zeroing pages in idle where nontemporal writes are clearly best. This is almost a no-op since zeroing in idle works does nothing good and is off by default. Fix END() statement forgotten in previous commit. Align the loop in sse2_pagezero(). Since it writes to main memory, the loop doesn't have to be very carefully written to keep up. Unrolling it was considered useless or harmful and was not done on i386, but that was too careless. Timing for i386: the loop was not unrolled at all, and moved only 4 bytes/iteration. So on a 2GHz CPU, it needed to run at 2 cycles/ iteration to keep up with a memory speed of just 4GB/sec. But when it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/ iteration so it gave a maximum speed of 2.67GB/sec and couldn't even keep up with PC3200 memory. Fix the alignment so that it keep up with 4GB/sec memory, and unroll once to get nearer to 8GB/sec. Further unrolling might be useless or harmful since it would prevent the loop fitting in 16-bytes. My test system with an old CPU and old DDR1 only needed 5+ GB/sec. My test system with a new CPU and DDR3 doesn't need any changes to keep up ~16GB/sec. Timing for amd64: with 8-byte accesses and newer faster CPUs it is easy to reach 16GB/sec but not so easy to go much faster. The alignment doesn't matter much if the CPU is not very old. The loop was already unrolled 4 times, but needs 32 bytes and uses a fancy method that doesn't work for 2-way unrolling in 16 bytes. Just align it to 32-bytes.	2016-08-29 13:07:21 +00:00
Bruce Evans	537a47a1ba	Restore the nontemporal pagezero() under the name sse2_pagezero() (the same name as for i386). It is not reconnected yet. Which method is better is too machine-dependent and system-dependent to replace the old method unconditionally.	2016-08-29 06:07:43 +00:00
John Baldwin	ffe1b10d95	Enable I/O MMU when PCI pass through is first used. Rather than enabling the I/O MMU when the vmm module is loaded, defer initialization until the first attempt to pass a PCI device through to a guest. If the I/O MMU fails to initialize or is not present, than fail the attempt to pass a PCI device through to a guest. The hw.vmm.force_iommu tunable has been removed since the I/O MMU is no longer enabled during boot. However, the I/O MMU support can be disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent use of the I/O MMU on any systems where it is buggy. Reviewed by: grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D7448	2016-08-26 20:15:22 +00:00
Ed Schouten	22f2f875ad	Make execution of 32-bit CloudABI executables work on amd64. A nice thing about requiring a vDSO is that it makes it incredibly easy to provide full support for running 32-bit processes on 64-bit systems. Instead of letting the kernel be responsible for composing/decomposing 64-bit arguments across multiple registers/stack slots, all of this can now be done in the vDSO. This means that there is no need to provide duplicate copies of certain system calls, like the sys_lseek() and freebsd32_lseek() we have for COMPAT_FREEBSD32. This change imports a new vDSO from the CloudABI repository that has automatically generated code in it that copies system call arguments into a buffer, padding them to eight bytes and zero-extending any pointers/size_t arguments. After returning from the kernel, it does the inverse: extracting return values, in the process truncating pointers/size_t values to 32 bits. Obtained from: https://github.com/NuxiNL/cloudabi	2016-08-24 10:51:33 +00:00
Ed Schouten	48734c99d3	Convert pointers obtained from the threadattr_t structure with TO_PTR(). In all of these source files, the userspace pointer size corresponds with the kernelspace pointer size, meaning that casting directly works. As I'm planning on making 32-bit execution on 64-bit systems work as well, use TO_PTR() here as well, so that the changes between source files remain minimal.	2016-08-24 10:13:18 +00:00
John Baldwin	a47632d45b	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
John Baldwin	c1c9764296	Remove the si(4) driver and sicontrol(8) for Specialix serial cards. The si(4) driver supported multiport serial adapters for ISA, EISA, and PCI buses. This driver does not use bus_space, instead it depends on direct use of the pointer returned by rman_get_virtual(). It is also still locked by Giant and calls for patch testing to convert it to use bus_space were unanswered. Relnotes: yes	2016-08-19 21:14:27 +00:00
Konstantin Belousov	b44d5b4a49	The pmap_delayed_invl_wait() function blocks on turnstile, it does not spin, in the committed version. Remove stray '*' in the text. Sponsored by: The FreeBSD Foundation. MFC after: 3 days	2016-08-11 12:37:11 +00:00
Ed Schouten	13b4b4df98	Provide the CloudABI vDSO to its executables. CloudABI executables already provide support for passing in vDSOs. This functionality is used by the emulator for OS X to inject system call handlers. On FreeBSD, we could use it to optimize calls to gettimeofday(), etc. Though I don't have any plans to optimize any system calls right now, let's go ahead and already pass in a vDSO. This will allow us to simplify the executables, as the traditional "syscall" shims can be removed entirely. It also means that we gain more flexibility with regards to adding and removing system calls. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7438	2016-08-10 21:02:41 +00:00
Konstantin Belousov	e42f8233fc	Unconditionally perform checks that FPU region was entered, when #NM exception is caught in kernel mode. There are third-party modules which trigger the issue, and since the problem causes usermode state corruption at least, panic in production kernels as well. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:44:03 +00:00
John Baldwin	ad1d96ca7a	Don't permit mappings of invalid physical addresses on amd64 via /dev/mem. Discussed with: kib	2016-08-04 17:55:23 +00:00
John Baldwin	2de70600fa	Correct assertion on vcpuid argument to vm_gpa_hold(). PR: 208168 Submitted by: Dave Cameron <daverabbitz@ihug.co.nz> Reviewed by: grehan MFC after: 1 month	2016-08-03 15:20:10 +00:00
Konstantin Belousov	fa03524a9f	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
Mateusz Guzik	b7ff4d59de	amd64: implement pagezero using rep stos The current implementation uses non-temporal writes. This turns out to be detrimental to performance if the page is used shortly after, which is the typical case with page faults. Switch to rep stos. Reviewed by: kib MFC after: 1 week	2016-07-31 11:34:08 +00:00
Brooks Davis	40018b91dd	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7353	2016-07-28 21:29:04 +00:00
Alexander Motin	fb112f72a8	Add more UEFI/e820 memory types from latest specifications. This is only cosmetics. MFC after: 2 weeks	2016-07-24 09:15:11 +00:00
Alexander Motin	4cefe96c6d	Increase number of I/O APIC pins from 24 to 32 to give PCI up to 16 IRQs. Move HPET to the top of the supported 0-31 range. Proposed by: jhb@, grehan@	2016-07-14 14:35:25 +00:00
Andriy Gapon	7a946127ef	remove a stray change from r302834 MFC after: 3 weeks X-MFC with: r302834	2016-07-14 11:13:26 +00:00
Andriy Gapon	a2d87b79cf	fix-up for configuration of AMD Family 10h processors borrowed from Linux http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c#L643 BIOS may configure Family 10h processors to convert WC+ cache type to CD. That can hurt performance of guest VMs using nested paging. Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D6059	2016-07-14 11:03:05 +00:00
Eric Badger	fdb6320d45	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
Roger Pau Monné	302244700f	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
Dmitry Chagin	97d06da692	Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation use READ_IMPLIES_EXEC flag, introduced in r302515. While here move common part of mmap() and mprotect() code to the files in compat/linux to reduce code dupcliation between Linuxulator's. Reported by: Johannes Jost Meixner, Shawn Webb MFC after: 1 week XMFC with: r302515, r302516	2016-07-10 08:22:04 +00:00
Dmitry Chagin	ab231b83ea	Regen for r302215 (Linux personality).	2016-07-10 08:17:16 +00:00
Dmitry Chagin	23e8912c60	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
Ed Schouten	d96aeddf2f	Don't forget to set sa->narg for CloudABI system calls. It turns out that this value is not used within the system call code under normal conditions, except when using tracing tools like ktrace. If we forget to set this value, it is set to random garbage. This may cause ktrace to hang indefinitely, making it impossible to kill. Reported by: Michael Plass PR: 210800 MFC before: 11.0-RELEASE	2016-07-08 20:09:21 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Konstantin Belousov	5c2cf81845	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
Konstantin Belousov	b9c8a54dc3	Do not access pv_table array for fictitious pages, since the array does not cover the dynamically registered ficititious ranges, and fictitious pages mappings are not promoted. Offer a dummy struct md_page to fetch constant superpage pv list generation to satisfy logic. Also, by initializing the pv_dummy pv_list to empty, we can remove several explicit PG_FICTITIOUS tests. Reported and tested by: Michael Butler <imb@protected-networks.net> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6728 Approved by: re (hrs)	2016-06-13 03:45:08 +00:00
Konstantin Belousov	fc0924b9a4	Avoid spurious EINVAL in amd64 pmap_change_attr(). Do not try to change attributes for DMAP when working on a mapping which is not covered by the DMAP. This was reported on real system where a BAR of a device (NTB) was mapped outside the PCI window. Reported and tested by: mav Reviewed by: jhb, mav Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6668	2016-06-05 17:11:23 +00:00
Konstantin Belousov	4230ac2fa5	In pmap_advise(), avoid leaking DI start for EPT pmaps which needs A/D emulation. Assert that syscalls do not leak DI. Reported by: gjb Sponsored by: The FreeBSD Foundation	2016-05-27 18:45:11 +00:00
Jung-uk Kim	1eb6b86a44	Both Clang and GCC cannot generate efficient reserve_pv_entries(). http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407 Re-implement it entirely in inline assembly not to let compilers do silly spilling to memory. For non-POPCNT case, use newly added bit_count(3). Reported by: alc Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D6541	2016-05-25 23:06:52 +00:00
Jung-uk Kim	22d9c132f6	Document POPCNT erratum for 6th Generation Intel Core processors.	2016-05-23 23:00:47 +00:00
Dmitry Chagin	5437e1d103	Add macro to convert errno and use it when appropriate. MFC after: 1 week	2016-05-22 12:46:34 +00:00
Dmitry Chagin	f26a190f65	Regen after r300359 (struct l_sched_param removal). MFC after: 1 week	2016-05-21 08:03:13 +00:00
Dmitry Chagin	8cc96fb43a	Correct an argument param of linux_sched_* system calls as a struct l_sched_param does not defined due to it's nature. MFC after: 1 week	2016-05-21 08:01:14 +00:00
Konstantin Belousov	0bfad8e4a3	Check for overflow and return EINVAL if detected. Backport this and r300305 to i386. PR: 209661 Reported and reviewed by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 19:50:32 +00:00
Konstantin Belousov	ae76e30131	Use unsigned type for the loop index to make overflow checks effective. PR: 209661 Reported by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 15:32:48 +00:00
Eitan Adler	cef367e6a1	Don't repeat the the word 'the' (one manual change to fix grammar) Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)	2016-05-17 12:52:31 +00:00
Sepherosa Ziehau	dfdc9a05c6	atomic: Add testandclear on i386/amd64 Reviewed by: kib Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6381	2016-05-16 07:19:33 +00:00
Konstantin Belousov	56e61f57b0	Eliminate pvh_global_lock from the amd64 pmap. The only current purpose of the pvh lock was explained there On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote: > Let me lay out one example for you in detail. Suppose that we have > three processors and two of these processors are actively using the same > pmap. Now, one of the two processors sharing the pmap performs a > pmap_remove(). Suppose that one of the removed mappings is to a > physical page P. Moreover, suppose that the other processor sharing > that pmap has this mapping cached with write access in its TLB. Here's > where the trouble might begin. As you might expect, the processor > performing the pmap_remove() will acquire the fine-grained lock on the > PV list for page P before destroying the mapping to page P. Moreover, > this processor will ensure that the vm_page's dirty field is updated > before releasing that PV list lock. However, the TLB shootdown for this > mapping may not be initiated until after the PV list lock is released. > The processor performing the pmap_remove() is not problematic, because > the code being executed by that processor won't presume that the mapping > is destroyed until the TLB shootdown has completed and pmap_remove() has > returned. However, the other processor sharing the pmap could be > problematic. Specifically, suppose that the third processor is > executing the page daemon and concurrently trying to reclaim page P. > This processor performs a pmap_remove_all() on page P in preparation for > reclaiming the page. At this instant, the PV list for page P may > already be empty but our second processor still has a stale TLB entry > mapping page P. So, changes might still occur to the page after the > page daemon believes that all mappings have been destroyed. (If the PV > entry had still existed, then the pmap lock would have ensured that the > TLB shootdown completed before the pmap_remove_all() finished.) Note, > however, the page daemon will know that the page is dirty. It can't > possibly mistake a dirty page for a clean one. However, without the > current pvh global locking, I don't think anything is stopping the page > daemon from starting the laundering process before the TLB shootdown has > completed. > > I believe that a similar example could be constructed with a clean page > P' and a stale read-only TLB entry. In this case, the page P' could be > "cached" in the cache/free queues and recycled before the stale TLB > entry is flushed. TLBs for addresses with updated PTEs are always flushed before pmap lock is unlocked. On the other hand, amd64 pmap code does not always flushes TLBs before PV list locks are unlocked, if previously PTEs were cleared and PV entries removed. To handle the situations where a thread might notice empty PV list but third thread still having access to the page due to TLB invalidation not finished yet, introduce delayed invalidation. Comparing with the pvh_global_lock, DI does not block entered thread when pmap_remove_all() or pmap_remove_write() (callers of pmap_delayed_invl_wait()) are executed in parallel. But _invl_wait() callers are blocked until all previously noted DI blocks are leaved, thus ensuring that neccessary TLB invalidations were performed before returning from pmap_remove_all() or pmap_remove_write(). See comments for detailed description of the mechanism, and also for the explanations why several pmap methods, most important pmap_enter(), do not need DI protection. Reviewed by: alc, jhb (turnstile KPI usage) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5747	2016-05-14 23:35:11 +00:00
Alan Cox	d3ffaee8e6	Eliminate an unused #include. For a brief period of time, _unrhdr.h was used to implement PCID support on amd64. Reviewed by: kib	2016-05-13 20:14:41 +00:00
Konstantin Belousov	aa3ec63e02	Add locking annotations to amd64 struct md_page members. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-10 09:58:51 +00:00
John Baldwin	8d791e5af1	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
Dmitry Chagin	8521d01a71	Add a forgotten in r283424 .eh_frame section with CFI & FDE records to allow stack unwinding through signal handler. Reported by: Dmitry Sivachenko MFC after: 2 weeks	2016-05-09 07:38:47 +00:00

1 2 3 4 5 ...

7282 Commits