freebsd-skq

Author	SHA1	Message	Date
royger	844ce8697a	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
dchagin	c93d4a7bde	Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD supports the NX bit on X86_64 processors out of the box; for i386 emulation use the READ_IMPLIES_EXEC flag, introduced in r302515. While here, move the common part of the mmap() and mprotect() code to the files in compat/linux to reduce code duplication between Linuxulators. Reported by: Johannes Jost Meixner, Shawn Webb MFC after: 1 week X-MFC with: r302515, r302516	2016-07-10 08:22:04 +00:00
dchagin	7acd3da18d	Regen for r302215 (Linux personality).	2016-07-10 08:17:16 +00:00
dchagin	50efd461d3	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
ed	887bfdc0a4	Don't forget to set sa->narg for CloudABI system calls. It turns out that this value is not used within the system call code under normal conditions, except when using tracing tools like ktrace. If we forget to set this value, it is set to random garbage. This may cause ktrace to hang indefinitely, making it impossible to kill. Reported by: Michael Plass PR: 210800 MFC before: 11.0-RELEASE	2016-07-08 20:09:21 +00:00
nwhitehorn	89d01c24d1	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
kib	496a3b1f65	Update comments for the MD functions managing contexts for new threads, to make them less confusing and to use modern kernel terms. Rename the functions to reflect their current use, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
kib	7e7c56668a	Do not access pv_table array for fictitious pages, since the array does not cover the dynamically registered fictitious ranges, and fictitious page mappings are not promoted. Offer a dummy struct md_page to fetch constant superpage pv list generation to satisfy logic. Also, by initializing the pv_dummy pv_list to empty, we can remove several explicit PG_FICTITIOUS tests. Reported and tested by: Michael Butler <imb@protected-networks.net> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6728 Approved by: re (hrs)	2016-06-13 03:45:08 +00:00
kib	d80c39f48c	Avoid spurious EINVAL in amd64 pmap_change_attr(). Do not try to change attributes for DMAP when working on a mapping which is not covered by the DMAP. This was reported on real system where a BAR of a device (NTB) was mapped outside the PCI window. Reported and tested by: mav Reviewed by: jhb, mav Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6668	2016-06-05 17:11:23 +00:00
kib	b049cb19c0	In pmap_advise(), avoid leaking DI start for EPT pmaps which need A/D emulation. Assert that syscalls do not leak DI. Reported by: gjb Sponsored by: The FreeBSD Foundation	2016-05-27 18:45:11 +00:00
jkim	45ae491494	Both Clang and GCC cannot generate efficient reserve_pv_entries(). http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407 Re-implement it entirely in inline assembly not to let compilers do silly spilling to memory. For non-POPCNT case, use newly added bit_count(3). Reported by: alc Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D6541	2016-05-25 23:06:52 +00:00
jkim	63911f4577	Document POPCNT erratum for 6th Generation Intel Core processors.	2016-05-23 23:00:47 +00:00
dchagin	791b4b1122	Add macro to convert errno and use it when appropriate. MFC after: 1 week	2016-05-22 12:46:34 +00:00
dchagin	d8b7958da0	Regen after r300359 (struct l_sched_param removal). MFC after: 1 week	2016-05-21 08:03:13 +00:00
dchagin	4f09fcdf95	Correct the argument parameter of the linux_sched_* system calls, as struct l_sched_param is not defined due to its nature. MFC after: 1 week	2016-05-21 08:01:14 +00:00
kib	515614230c	Check for overflow and return EINVAL if detected. Backport this and r300305 to i386. PR: 209661 Reported and reviewed by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 19:50:32 +00:00
kib	b357843884	Use unsigned type for the loop index to make overflow checks effective. PR: 209661 Reported by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 15:32:48 +00:00
eadler	156fd4834a	Don't repeat the the word 'the' (one manual change to fix grammar) Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)	2016-05-17 12:52:31 +00:00
sephe	6babf96582	atomic: Add testandclear on i386/amd64 Reviewed by: kib Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6381	2016-05-16 07:19:33 +00:00
kib	b4ddfd6167	Eliminate pvh_global_lock from the amd64 pmap. The only current purpose of the pvh lock was explained there On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote: > Let me lay out one example for you in detail. Suppose that we have > three processors and two of these processors are actively using the same > pmap. Now, one of the two processors sharing the pmap performs a > pmap_remove(). Suppose that one of the removed mappings is to a > physical page P. Moreover, suppose that the other processor sharing > that pmap has this mapping cached with write access in its TLB. Here's > where the trouble might begin. As you might expect, the processor > performing the pmap_remove() will acquire the fine-grained lock on the > PV list for page P before destroying the mapping to page P. Moreover, > this processor will ensure that the vm_page's dirty field is updated > before releasing that PV list lock. However, the TLB shootdown for this > mapping may not be initiated until after the PV list lock is released. > The processor performing the pmap_remove() is not problematic, because > the code being executed by that processor won't presume that the mapping > is destroyed until the TLB shootdown has completed and pmap_remove() has > returned. However, the other processor sharing the pmap could be > problematic. Specifically, suppose that the third processor is > executing the page daemon and concurrently trying to reclaim page P. > This processor performs a pmap_remove_all() on page P in preparation for > reclaiming the page. At this instant, the PV list for page P may > already be empty but our second processor still has a stale TLB entry > mapping page P. So, changes might still occur to the page after the > page daemon believes that all mappings have been destroyed. (If the PV > entry had still existed, then the pmap lock would have ensured that the > TLB shootdown completed before the pmap_remove_all() finished.) 
Note, > however, the page daemon will know that the page is dirty. It can't > possibly mistake a dirty page for a clean one. However, without the > current pvh global locking, I don't think anything is stopping the page > daemon from starting the laundering process before the TLB shootdown has > completed. > > I believe that a similar example could be constructed with a clean page > P' and a stale read-only TLB entry. In this case, the page P' could be > "cached" in the cache/free queues and recycled before the stale TLB > entry is flushed. TLBs for addresses with updated PTEs are always flushed before the pmap lock is unlocked. On the other hand, the amd64 pmap code does not always flush TLBs before PV list locks are unlocked, if PTEs were previously cleared and PV entries removed. To handle the situations where a thread might notice an empty PV list while a third thread still has access to the page because TLB invalidation has not finished yet, introduce delayed invalidation. Compared with the pvh_global_lock, DI does not block the entered thread when pmap_remove_all() or pmap_remove_write() (callers of pmap_delayed_invl_wait()) are executed in parallel. But _invl_wait() callers are blocked until all previously noted DI blocks are left, thus ensuring that the necessary TLB invalidations were performed before returning from pmap_remove_all() or pmap_remove_write(). See comments for a detailed description of the mechanism, and also for the explanations of why several pmap methods, most importantly pmap_enter(), do not need DI protection. Reviewed by: alc, jhb (turnstile KPI usage) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5747	2016-05-14 23:35:11 +00:00
alc	dcd0f3bb88	Eliminate an unused #include. For a brief period of time, _unrhdr.h was used to implement PCID support on amd64. Reviewed by: kib	2016-05-13 20:14:41 +00:00
kib	05241d701e	Add locking annotations to amd64 struct md_page members. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-10 09:58:51 +00:00
jhb	6bae79f884	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
dchagin	c62a8ee2d0	Add a forgotten in r283424 .eh_frame section with CFI & FDE records to allow stack unwinding through signal handler. Reported by: Dmitry Sivachenko MFC after: 2 weeks	2016-05-09 07:38:47 +00:00
jhb	eb663acb54	Native PCI-express HotPlug support. PCI-express HotPlug support is implemented via bits in the slot registers of the PCI-express capability of the downstream port along with an interrupt that triggers when bits in the slot status register change. This is implemented for FreeBSD by adding HotPlug support to the PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges representing downstream ports on HotPlug slots. The PCI-PCI bridge driver registers an interrupt handler to receive HotPlug events. It also uses the slot registers to determine the current HotPlug state and drive an internal HotPlug state machine. For simplicity of implementation, the PCI-PCI bridge device detaches and deletes the child PCI device when a card is removed from a slot and creates and attaches a PCI child device when a card is inserted into the slot. The PCI-PCI bridge driver provides a bus_child_present which claims that child devices are present on HotPlug-capable slots only when a card is inserted. Rather than requiring a timeout in the RC for config accesses to not-present children, the pcib_read/write_config methods fail all requests when a card is not present (or not yet ready). These changes include support for various optional HotPlug capabilities such as a power controller, mechanical latch, electro-mechanical interlock, indicators, and an attention button. It also includes support for devices which require waiting for command completion events before initiating a subsequent HotPlug command. However, it has only been tested on ExpressCard systems which support surprise removal and have none of these optional capabilities. PCI-express HotPlug support is conditional on the PCI_HP option which is enabled by default on arm64, x86, and powerpc. Reviewed by: adrian, imp, vangyzen (older versions) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D6136	2016-05-05 22:26:23 +00:00
alc	52f3fcfa90	Explain why pmap_copy(), pmap_enter_pde(), and pmap_enter_quick_locked() call pmap_invalidate_page() even though they are not destroying a leaf- level page table entry. Eliminate some bogus white-space characters in a comment. Reviewed by: kib	2016-05-04 17:54:13 +00:00
pfg	7f85f79cce	sys/amd64: Small spelling fixes. No functional change.	2016-05-03 22:13:04 +00:00
pfg	826c10b2f3	vmm(4): Small spelling fixes. Reviewed by: grehan	2016-05-03 22:07:18 +00:00
jhb	c71e075efb	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
jhb	2da46e01a0	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
jhb	050f1049b2	Move 'device pci' for the PCI bus driver to the MI NOTES file. The PCI bus was already listed in all of the MD NOTES files and the driver should at least compile on all platforms.	2016-04-29 23:53:55 +00:00
avg	08fbecbeed	fix missing variable in r298736 Pointyhat to: avg Reported by: Ivan Klymenko <fidaj@ukr.net> MFC after: 2 weeks X-MFC with: r298736	2016-04-28 09:40:24 +00:00
avg	f68c6e4879	ensure that initial local apic id is sane on AMD 10h systems Summary: The Initial Local APIC ID is returned by CPUID function 1 (in EBX). On AMD Family 10h systems the way that ID is built is controlled by an MSR bit (InitApicIdCpuIdLo). BKDG instructs BIOS to set it in a certain way, but a BIOS can be buggy. In that case the ID can confuse tools that use it, e.g. hwloc. For example, on a system that I own real Local APIC IDs are configured as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0. See: https://github.com/open-mpi/hwloc/issues/183 Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6060	2016-04-28 08:29:57 +00:00
cem	241a3b76d8	AMD64 pmap: Use howmany() macro Use param.h howmany() instead of hand-rolled version. Sponsored by: EMC / Isilon Storage Division	2016-04-24 21:35:01 +00:00
pfg	b4106812fd	Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
pfg	be4082c832	X86: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep. Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
cem	98188ed5c2	Add 4Kn kernel dump support (And 4Kn minidump support, but only for amd64.) Make sure all I/O to the dump device is of the native sector size. To that end, we keep a native sector sized buffer associated with dump devices (di->blockbuf) and use it to pad smaller objects as needed (e.g. kerneldumpheader). Add dump_write_pad() as a convenience API to dump smaller objects with zero padding. (Rather than pull in NPM leftpad, we wrote our own.) Savecore(1) has been updated to deal with these dumps. The format for 512-byte sector dumps should remain backwards compatible. Minidumps for other architectures are left as an exercise for the reader. PR: 194279 Submitted by: ambrisko@ Reviewed by: cem (earlier version), rpokala Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5848	2016-04-15 17:45:12 +00:00
sephe	3d59317312	hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus Submitted by: Jun Su <junsu microsoft com> Reviewed by: jhb, kib, sephe Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5910	2016-04-15 02:20:18 +00:00
jhb	b5f76666d8	Expose doreti as a global symbol on amd64 and i386. doreti provides the common code path for returning from interrupt handlers on x86. Exposing doreti as a global symbol allows kernel modules to include low-level interrupt handlers instead of requiring all low-level handlers to be statically compiled into the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: kib	2016-04-13 17:37:31 +00:00
jhb	2ce9aa06e4	Enable DEVICE_NUMA with up to 8 domains by default on amd64. 8 memory domains should handle a quad-socket board with dual-domain processors. Reviewed by: kib Relnotes: maybe? Differential Revision: https://reviews.freebsd.org/D5893	2016-04-12 21:23:44 +00:00
avg	f7d20d3734	re-enable AMD Topology extension on certain models if disabled by BIOS Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same. Reported by: Johannes Dieterich <dieterich.joh@gmail.com> Reviewed by: jhb, kib Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5883	2016-04-12 13:30:39 +00:00
avg	73eedd1d08	[amd64] dtrace_invop handler is to be called only for kernel exceptions DTrace-related exceptions in userland code are handled elsewhere. One practical problem was a crash in dtrace_invop_start() when saved %rsp pointed to a virtual address that was not backed. i386 code already ignored userland exceptions. Reviewed by: markj, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5906	2016-04-12 06:46:54 +00:00
anish	3d3fd1fdc9	Allow guest writes to the AMD microcode update [0xc0010020] MSR without updating the actual hardware MSR. This allows a guest microcode update to go through, which was otherwise failing because wrmsr() was returning EINVAL. Submitted by: Yamagi Burmeister Approved by: grehan MFC after: 2 weeks	2016-04-11 05:09:43 +00:00
ed	e55c02e6f8	Make CloudABI's way of doing TLS more friendly to userspace emulators. We're currently seeing how hard it would be to run CloudABI binaries on operating systems that cannot be modified easily (Windows, Mac OS X). The idea is that we want to just run them without any sandboxing. Now that CloudABI executables are PIE, this is already a bit easier, but TLS is still problematic: - CloudABI executables want to write to the %fs, which typically requires extra system calls by the emulator every time it needs to switch between CloudABI's and its own TLS. - If CloudABI executables overwrite the %fs base unconditionally, it also becomes harder for the emulator to store a backup of the old value of %fs. To solve this, let's no longer overwrite %fs, but just %fs:0. As CloudABI's C library does not use a TCB, this space can now be used by an emulator to keep track of its internal state. The executable can now safely overwrite %fs:0, as long as it makes sure that the TCB is copied over to the new TLS area. Ensure that there is an initial TLS area set up when the process starts, only containing a bogus TCB. We don't really care about its contents on FreeBSD. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5836	2016-04-06 11:11:31 +00:00
bapt	577607dffc	Add kern.features flags for linux and linux64 modules kern.features.linux: 1 meaning linux 32-bit binaries are supported kern.features.linux64: 1 meaning linux 64-bit binaries are supported The goal here is to help 3rd party applications (including ports) to determine if the host supports linux emulation Reviewed by: dchagin MFC after: 1 week Relnotes: yes Differential Revision: D5830	2016-04-05 22:36:48 +00:00
jhb	ff4b317e50	Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386.	2016-04-03 23:03:54 +00:00
ed	910e4d679c	Make Position Independent Executables work for CloudABI. - Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to regular ET_EXECs. - Provide an AT_BASE entry in the auxiliary vector, so the executable knows at which address it got loaded and can apply relocations.	2016-03-31 18:52:00 +00:00
kib	eb986c64f5	Type of the interrupt handlers on x86 cannot be expressed in C. Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771	2016-03-29 19:56:48 +00:00
dchagin	a5f7ea1073	Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET. Pointed out by: ae@	2016-03-27 10:09:10 +00:00
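
Note on 89d01c24d1 (mp_ncpus vs. mp_maxid): the log entry above describes code that assumed CPU IDs are dense in [0, mp_ncpus). The following is a minimal userland sketch of that pitfall, not the kernel change itself. struct cpu_topology and all four function names are hypothetical; only the ideas behind mp_ncpus, mp_maxid and a CPU_FOREACH()-style walk come from the commit message. The sketch assumes fewer than 64 CPU IDs so that a single 64-bit mask can record which IDs exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's CPU bookkeeping. The fields only
 * mirror the roles of mp_ncpus and mp_maxid; they are not kernel definitions.
 */
struct cpu_topology {
	unsigned int ncpus;	/* how many CPUs exist (cf. mp_ncpus) */
	unsigned int maxid;	/* highest valid CPU ID (cf. mp_maxid) */
	uint64_t present;	/* bit i is set when CPU ID i exists */
};

/* True when a CPU with this ID exists. Assumes maxid < 64. */
bool
cpu_present(const struct cpu_topology *t, unsigned int id)
{
	assert(t->maxid < 64);
	return (id <= t->maxid && ((t->present >> id) & 1) != 0);
}

/* An array indexed by CPU ID needs maxid + 1 slots, not ncpus slots. */
size_t
percpu_slots(const struct cpu_topology *t)
{
	return ((size_t)t->maxid + 1);
}

/*
 * The conflated pattern: walk IDs 0 .. ncpus - 1 as if they were dense.
 * Returns how many of the visited IDs do not correspond to a real CPU.
 */
unsigned int
dense_walk_misses(const struct cpu_topology *t)
{
	unsigned int id, misses = 0;

	for (id = 0; id < t->ncpus; id++)
		if (!cpu_present(t, id))
			misses++;
	return (misses);
}

/*
 * The sparse-safe pattern: walk 0 .. maxid and skip absent IDs, which is
 * the shape of a CPU_FOREACH()-style loop. Returns the CPUs visited.
 */
unsigned int
sparse_walk_visits(const struct cpu_topology *t)
{
	unsigned int id, visits = 0;

	for (id = 0; id <= t->maxid; id++)
		if (cpu_present(t, id))
			visits++;
	return (visits);
}
```

With an illustrative layout of four CPUs at IDs 0, 8, 16 and 24 (chosen for the example, not taken from the commit), the dense walk touches IDs 1 to 3, which do not exist, while an ID-indexed array needs 25 slots rather than 4.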


7505 Commits