freebsd-nq

Author	SHA1	Message	Date
Dmitry Chagin	9dba79fb66	Remove obsolete comment. MFC after: 3 days	2016-01-23 08:08:06 +00:00
Dmitry Chagin	f138999141	Fix a typo. MFC after: 3 days	2016-01-23 08:04:29 +00:00
Hans Petter Selasky	c1ecb7e114	Add missing atomic wrapper macro. Reviewed by: alfred @ Sponsored by: Mellanox Technologies MFC after: 1 week	2016-01-21 18:22:50 +00:00
Konstantin Belousov	f132cd0547	Use ANSI definitions. Wrap long line. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-19 08:08:08 +00:00
Konstantin Belousov	b57e68141f	Clear whole XMM register file instead of only XMM0. Also clear x87 registers. This brings amd64 on par with i386, providing consistent initial FPU state. Note that we do not clear any extended state, at least because kernel does not understand extended state structure and consequences of zero overwrite after fninit()/fpusave(). Submitted by: joss.upton@yahoo.com PR: 206370 MFC after: 2 weeks	2016-01-19 08:04:02 +00:00
Gleb Smirnoff	de44d808ef	Regen after r293907.	2016-01-14 10:15:21 +00:00
Gleb Smirnoff	037f750877	Change linux get_robust_list system call to match actual linux one. The set_robust_list system call requests the kernel to record the head of the list of robust futexes owned by the calling thread. The head argument is the list head to record. The get_robust_list system call should return the head of the robust list of the thread whose thread id is specified in pid argument. The list head should be stored in the location pointed to by head argument. In contrast, our implementation of get_robust_list system call copies the known portion of memory pointed to by the robust list head pointer recorded in the set_robust_list system call to the location pointed to by head argument. So, it is possible for a local attacker to read portions of kernel memory, which may result in a privilege escalation. Submitted by: mjg Security: SA-16:03.linux	2016-01-14 10:13:58 +00:00
Jung-uk Kim	4ec1c9bfac	Remove dead code when the target processor has POPCNT instruction.	2016-01-13 19:19:50 +00:00
Dmitry Chagin	038c720553	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
Ed Maste	0e42ee5dd8	Move amd64 metadata.h to x86 and share with i386 MFC after: 1 week	2016-01-07 19:47:26 +00:00
Ian Lepore	69dcb7e771	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
John Baldwin	9e8d8b4b0c	Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. While here, move the common bits of <machine/cputypes.h> to <x86/cputypes.h> as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4670	2015-12-23 21:41:42 +00:00
Enji Cooper	418629d81d	Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.h This variable was added to sys/x86/include/x86_var.h recently. This unbreaks building kernel source that #includes both md_var.h and x86_var.h with gcc 4.2.1 on amd64 Differential Revision: https://reviews.freebsd.org/D4686 Reviewed by: kib X-MFC with: r291949 Sponsored by: EMC / Isilon Storage Division	2015-12-22 20:08:32 +00:00
Warner Losh	2fca0f2dd4	Save the physical address passed into the kernel of the UEFI system table.	2015-12-19 19:01:43 +00:00
Konstantin Belousov	7c958a41fe	Merge common parts of i386 and amd64 md_var.h and smp.h into new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358	2015-12-07 17:41:20 +00:00
Konstantin Belousov	49e806677c	Use ANSI C definition. MFC after: 1 week	2015-12-07 17:24:55 +00:00
Conrad Meyer	10386b56ad	pmap_invalidate_range: For very large ranges, flush the whole TLB Typical TLBs have 40-512 entries available. At some point, iterating every single page in a requested invalidation range and issuing invlpg on it is more expensive than flushing the TLB and allowing it to reload on demand. Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary number 4096 entries as a heuristic at which point we flush TLB rather than invalidating every single potential page. Reviewed by: alc Feedback from: jhb, kib MFC notes: Depends on r291688 Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4280	2015-12-06 17:39:13 +00:00
Konstantin Belousov	27691a24ab	For amd64 non-PCID machines, and for i386 machines with support for the PG_G global pte flag, pmap_invalidate_all() fails to flush global TLB entries []. This is because TLB shootdown handler for such configs reloads CR3, and on i386 pmap_invalidate_all() does the same for the initiating CPU. Note that current code does not issue total invalidation requests for the kernel_pmap. Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not specific for PCID for quite some time, and implement the same functionality for i386. Use the function instead of invltlb() in shootdown handlers and in i386 pmap_invalidate_all(), but only for the kernel pmap (which maps pages with the PG_G attribute set), which takes care of PG_G TLB entries on flush. To detect the affected pmap in i386 TLB shootdown handler, pmap should be passed to the smp_masked_invltlb() function, which makes amd64 and i386 TLB shootdown code almost identical. Merge the code under x86/. Noted by: jhb [] Reviewed by: cem, jhb, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4346	2015-12-03 11:14:14 +00:00
Konstantin Belousov	724f4b62b0	Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct sysent. sv_prepsyscall is unused. sv_sigsize and sv_sigtbl translate signal number from the FreeBSD namespace into the ABI domain. It is only utilized on i386 for iBCS2 binaries. The issue with this approach is that signals for iBCS2 were delivered with the FreeBSD signal frame layout, which does not follow iBCS2. The same note is true for any other potential user of sv_sigtbl. In other words, if ABI needs signal number translation, it really needs custom sv_sendsig method instead. Sponsored by: The FreeBSD Foundation	2015-11-28 08:49:07 +00:00
Ed Maste	2e0002c18e	Fix whitespace on addition of IPSEC option	2015-11-26 21:35:50 +00:00
Konstantin Belousov	5e27d79314	Split kernel timekeep ABI structure vdso_sv_tk out of the struct sysentvec. This allows the timekeep data to be shared between similar ABIs which cannot share sysentvec. Make the timekeep_push_vdso() tick callback to the timekeep structures instead of sysentvecs. If several sysentvec share the vdso_sv_tk structure, we would update the userspace data several times on each tick, without the change. Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when sysentvec is marked with the new SV_TIMEKEEP flag. This saves allocation and update of unneeded vdso_sv_tk for ABIs which do not provide userspace gettimeofday yet, which are PowerPC arches right now. Make vdso_sv_tk allocator public, namely split out and export alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep data now can allocate it manually and share as appropriate. Requested by: nwhitehorn Tested by: nwhitehorn, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-11-23 07:09:35 +00:00
Mark Johnston	7672ca059a	Remove unneeded includes of opt_kdtrace.h. As of r258541, KDTRACE_HOOKS is defined in opt_global.h, so opt_kdtrace.h is not needed when defining SDT(9) probes.	2015-11-22 02:01:01 +00:00
John Baldwin	645743ea99	Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773	2015-11-12 22:00:59 +00:00
Conrad Meyer	0e5d2011ae	pmap_change_attr: Only fixup DMAP for DMAPed ranges pmap_change_attr must change the memory type of both the requested KVA and the corresponding DMAP mappings (if such mappings exist), to satisfy an Intel requirement that two or more mappings to the same physical pages must have the same memory type. However, not all kernel mapped pages have corresponding DMAP mappings -- for example, 64-bit BARs. Skip fixing up the DMAP for out-of-bounds addresses. Submitted by: Steve Wahl <steve_wahl@dell.com> Reviewed by: alc, jhb Sponsored by: Dell Compellent Differential Revision: https://reviews.freebsd.org/D4030	2015-10-29 19:07:00 +00:00
John Baldwin	2219c44a1f	Update for LINUX32 rename. The assembler didn't complain about undefined symbols but just used 0 after the rename.	2015-10-29 15:20:47 +00:00
John Baldwin	6cea44a704	Fix build with DEBUG defined. Reported by: hselasky	2015-10-29 15:16:47 +00:00
Kirk McKusick	a57418a761	Bring the tags and links entries for amd64 up to date. Based on how out of date it is, I doubt that anyone other than me and my code-reading students still use it.	2015-10-27 22:59:24 +00:00
Konstantin Belousov	af95bbf5bf	Intel SDM before revision 56 described the CLFLUSH instruction as only ordered with the MFENCE instruction. Similar weak guarantees are also specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced CLFLUSH loop with MFENCE both before and after the loop. In the revision 56 of SDM, Intel stated that all existing implementations of CLFLUSH are strict, CLFLUSH instructions execution is ordered WRT other CLFLUSH and writes. Also, the strict behaviour is made architectural. A new instruction CLFLUSHOPT (which was documented for some time in the Instruction Set Extensions Programming Reference) provides the weak behaviour which was previously attributed to CLFLUSH. Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do not execute MFENCE before and after the flushing loop. Reviewed by: alc Sponsored by: The FreeBSD Foundation	2015-10-24 21:37:47 +00:00
Konstantin Belousov	3f8e071052	Add CLFLUSHOPT instruction wrappers. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-23 11:45:38 +00:00
John Baldwin	f2264f5048	Regen for linux32 rename and linux64 systrace.	2015-10-22 21:33:37 +00:00
John Baldwin	2f99bcce1e	Rename remaining linux32 symbols such as linux_sysent[] and linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace. - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build. - Add a special case for union l_semun arguments to the systrace generation. - The systrace_linux32 module now only builds the systrace_linux32.ko. module on amd64. - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D3954	2015-10-22 21:28:20 +00:00
John Baldwin	5047105b71	Merge r289055 to amd64/linux32: linux: fix handling of out-of-bounds syscall attempts Due to an off by one the code would read an entry past the table, as opposed to the last entry which contains the nosys handler.	2015-10-22 21:23:58 +00:00
Ed Schouten	b78ef4bd86	Refactoring: move out generic bits from cloudabi64_sysvec.c. In order to make it easier to support CloudABI on ARM64, move out all of the bits from the AMD64 cloudabi_sysvec.c into a new file cloudabi_module.c that would otherwise remain identical. This reduces the AMD64 specific code to just ~160 lines. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3974	2015-10-22 09:07:53 +00:00
Roger Pau Monné	6a306bff7f	x86/xen: Consolidate xen-os.h in a single place amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there are no more differences between amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/.h. This is to be able to include machine/xen/.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D	2015-10-21 10:04:35 +00:00
Alexander Motin	4a3760bae6	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
Mateusz Guzik	3e15a670d2	linux: fix handling of out-of-bounds syscall attempts Due to an off by one the code would read an entry past the table, as opposed to the last entry which contains the nosys handler. Reported by: Pawel Biernacki <pawel.biernacki gmail.com>	2015-10-08 21:08:35 +00:00
Roger Pau Monné	a231723cc0	xen/console: Introduce a new console driver for Xen guest The current Xen console driver is crashing very quickly when using it on an ARM guest. This is because the console lock is recursive and it may lead to recursion on the tty lock and/or corrupt the ring pointer. Furthermore, the console lock is not always taken where it should be and has to be released too early because of the way the console has been designed. Over the years, code has been modified to support various new features but the driver has not been reworked. This new driver has been rewritten with the idea of only having a small set of specific function to write either via the shared ring or the hypercall interface. Note that HVM support has been left aside for now because it requires additional features which are not yet supported. A follow-up patch will be sent with HVM guest support. List of items that may be good to have but not mandatory: - Avoid to flush for each character written when using the tty - Support multiple consoles Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3698 Sponsored by: Citrix Systems R&D	2015-10-08 16:39:43 +00:00
Roger Pau Monné	1a52c10530	Update Xen headers from 4.2 to 4.6 Pull the latest headers for Xen which allow us to add support for ARM and use new features in FreeBSD. This is a verbatim copy of the xen/include/public so every header which doesn't exist anymore in the Xen repositories has been dropped. Note the interface version hasn't been bumped, it will be done in a follow-up. Although, it requires fixes in the code to get it compiled: - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so drop it. - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on xen/interface/event_channel.h, so include it. - {amd64,i386}/{amd64,i386}/support.S: It's not necessary to include machine/intr_machdep.h. This is also fixing build compilation with the new headers. - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segment has been dropped. So directly use struct blkif_request_segment Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if __XEN_INTERFACE_VERSION__ is not set. This allows us to catch any file where xen/xen-os.h is not correctly included. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3805 Sponsored by: Citrix Systems R&D	2015-10-06 11:29:44 +00:00
Alan Cox	9f86aba61c	Exploit r288122 to address a cosmetic issue. Since PV chunk pages don't belong to a vm object, they can't be paged out. Since they can't be paged out, they are never enqueued in a paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages are being enqueued in the inactive queue. As of r288122, we can avoid this false impression by passing PQ_NONE. Submitted by: kmacy (an earlier version) Differential Revision: https://reviews.freebsd.org/D1674	2015-09-26 07:18:05 +00:00
Mateusz Guzik	c025b81442	amd64: plug redundant bootAP declaration Reported by: gcc5	2015-09-22 21:07:47 +00:00
Konstantin Belousov	cff8c6f2d1	Add support for weak symbols to the kernel linkers. It means that linkers no longer raise an error when undefined weak symbols are found, but relocate as if the symbol value was 0. Note that we do not repeat the mistake of userspace dynamic linker of making the symbol lookup prefer non-weak symbol definition over the weak one, if both are available. In fact, kernel linker uses the first definition found, and ignores duplicates. Signature of the elf_lookup() and elf_obj_lookup() functions changed to split result/error code and the symbol address returned. Otherwise, it is impossible to return zero address as the symbol value, to MD relocation code. This explains the mechanical changes in elf_machdep.c sources. The powerpc64 R_PPC_JMP_SLOT handler did not check the error from the lookup() call, the patch leaves the code as is (untested). Reported by: glebius Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-20 01:27:59 +00:00
Mark Johnston	610141cebb	Add stack_save_td_running(), a function to trace the kernel stack of a running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256	2015-09-11 03:54:37 +00:00
Mark Johnston	4db79feb8f	Merge stack(9) implementations for i386 and amd64 under x86/. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:24:07 +00:00
Konstantin Belousov	1fa6712471	Do not hold the process around the vm_fault() call from the trap()s. The only operation which is prevented by the hold is the kernel stack swapout for the faulted thread, which should be fine to allow. Remove useless checks for NULL curproc or curproc->p_vmspace from the trap_pfault() wrappers on x86 and powerpc. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-09-10 17:46:48 +00:00
Mark Johnston	21be12e0ca	Remove an unneeded instruction. MFC after: 1 week	2015-08-28 00:17:21 +00:00
Conrad Meyer	e974f91c38	Import ioat(4) driver I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage Extension (PSE) on some Intel server platforms. This driver currently supports DMA descriptors only and is part of a larger effort to upstream an interconnect between multiple systems using the Non-Transparent Bridge (NTB) PSE. For now, this driver is only built on AMD64 platforms. It may be ported to work on i386 later, if that is desired. The hardware is exclusive to x86. Further documentation on ioat(4), including API documentation and usage, can be found in the new manual page. Bring in a test tool, ioatcontrol(8), in tools/tools/ioat. The test tool is not hooked up to the build and is not intended for end users. Submitted by: jimharris, Carl Delsey <carl.r.delsey@intel.com> Reviewed by: jimharris (reviewed my changes) Approved by: markj (mentor) Relnotes: yes Sponsored by: Intel Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3456	2015-08-24 19:32:03 +00:00
Roger Pau Monné	e8234cfef6	preload_search_info: make sure mod is set Add a check to preload_search_info to make sure mod is set. Most of the callers of preload_search_info don't check that the mod parameter is set, which can cause page faults. While at it, remove some now unnecessary checks before calling preload_search_info. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3440	2015-08-21 15:57:57 +00:00
Baptiste Daroussin	d83272a486	Add a kern.features.cloudabi64 entry when the module is loaded to help userland test whether cloudabi64 is supported or not Reviewed by: ed Differential Revision: https://reviews.freebsd.org/D3430	2015-08-19 15:18:32 +00:00
Marcel Moolenaar	4a99d3f571	Add 24 more page table pages we allocate on boot-up. 16MB slop is a little tight in and by itself, but severely insufficient when one needs to map a large frame buffer as part of console initialization. 64MB slop should be enough for a while. As an example: a 15" MacBook Pro with retina display needs ~28MB of KVA for the frame buffer. PR: 193745	2015-08-18 01:53:41 +00:00
Konstantin Belousov	7a39d38dbd	XEN/amd64 may initiate i/o over the pages not mapped by the direct map. Handle busdma bouncing and ata PIO accesses by using global frame used by the current CPU locally for the duration of pmap_quick_enter/remove_page(). A spin mutex protects the concurrent frame use and prevents thread migration. Noted by: royger Reviewed by: alc, jah, royger (previous version) Sponsored by: The FreeBSD Foundation	2015-08-17 18:42:45 +00:00
Marcel Moolenaar	7ef5e8bc80	Better support memory mapped console devices, such as VGA and EFI frame buffers and memory mapped UARTs. 1. Delay calling cninit() until after pmap_bootstrap(). This makes sure we have PMAP initialized enough to add translations. Keep kdb_init() after cninit() so that we have console when we need to break into the debugger on boot. 2. Unfortunately, the ATPIC code had to be moved as well so as to avoid a spurious trap #30. The reason for which is not known at this time. 3. In pmap_mapdev_attr(), when we need to map a device prior to the VM system being initialized, use virtual_avail as the KVA to map the device at. In particular, avoid using the direct map on amd64 because we can't demote by virtue of not being able to allocate yet. Keep track of the translation. Re-use the translation after the VM has been initialized to not waste KVA and to satisfy the assumption in uart(4) that the handle returned for the low-level console is the same as later returned when the device is probed and attached. 4. In pmap_unmapdev() remove the mapping from the table when called pre-init. Otherwise keep the mapping. During bus probe and attach device resources are mapped and unmapped multiple times, which would have us destroy the mapping used by the low-level console. 5. In pmap_init(), set pmap_initialized to signal that we're not pre-init anymore. On amd64, bring the direct map in sync with the translations created at that time. 6. Implement bus_space_map() and bus_space_unmap() for real: when the tag corresponds to memory space, call the corresponding pmap_mapdev() and pmap_unmapdev() functions to construct an actual handle. 7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply call pmap_mapdev_attr() or bus_space_map() as desired. Notes: 1. uart(4) already used bus_space_map() during low-level console setup but since serial ports have traditionally been I/O port based, the lack of a proper implementation for said function was not a problem. It has always supported memory mapped UARTs for low-level consoles by setting hw.uart.console accordingly. 2. The use of the direct map on amd64 without setting caching attributes has been a bigger problem than previously thought. This change has the fortunate (and unexpected) side-effect of fixing various EFI frame buffer problems (though not all). PR: 191564, 194952 Special thanks to: 1. XipLink, Inc -- generously donated an Intel Bay Trail E3800 based eval board (ADLE3800PC). 2. The FreeBSD Foundation, in particular emaste@ -- for UEFI support in general and testing. 3. Everyone who tested the proposed for PR 191564. 4. jhb@ and kib@ for being a soundboard and applying a clue bat if so needed.	2015-08-12 15:26:32 +00:00
Konstantin Belousov	0e190a486f	Initialization of smp_tlb_wait does not require release semantic, no data is synchronized by store/load to the variable. The lapic_write_icr() function ensures that store buffers are flushed before IPI command is issued. Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:46:39 +00:00
Konstantin Belousov	c77d57c8b4	AP should load aps_ready with acquire semantic to see BSP updates to the SMP structures, synchronized with the load by release store in release_aps(). The change is formal, x86 strong memory model implicitly provided the guarantees. Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:43:12 +00:00
Konstantin Belousov	edc8222303	Make kstack_pages a tunable on arm, x86, and powerpc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-10 17:18:21 +00:00
John Baldwin	3c790178c5	Remove some more vestiges of the Xen PV domu support. Specifically, use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266	2015-08-06 17:07:21 +00:00
Ed Maste	fc8c856029	Rationalize BSD license on sys/*/include/in_cksum.h Remove the advertising clause from the Regents of the University of California's license, per the letter dated July 22, 1999. Update clause numbering.	2015-08-05 19:05:12 +00:00
Jason A. Harmening	713841afb2	Add two new pmap functions: vm_offset_t pmap_quick_enter_page(vm_page_t m) void pmap_quick_remove_page(vm_offset_t kva) These will create and destroy a temporary, CPU-local KVA mapping of a specified page. Guarantees: --Will not sleep and will not fail. --Safe to call under a non-sleepable lock or from an ithread Restrictions: --Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms --Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page. --MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm. NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing. Reviewed by: kib Approved by: kib (mentor) Differential Revision: http://reviews.freebsd.org/D3013	2015-08-04 19:46:13 +00:00
Warner Losh	75333e6435	Add pmspcv device back to GENERIC. The issues with the device playing grabby hands with other driver's devices has been solved. MFC After: 3 weeks	2015-08-03 13:49:46 +00:00
Ed Schouten	ee95773383	Let CloudABI use the SV_CAPSICUM flag. CloudABI processes will now start up in capabilities mode. Reviewed by: kib	2015-08-03 13:42:52 +00:00
Konstantin Belousov	f94cc23475	Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID reported, on APs. We already did this on BSP. Otherwise, the userspace software which depends on the features reported by the high CPUID levels is misbehaving. In particular, AVX detection is non-functional, depending on which CPU thread happens to execute when doing CPUID. Another victim is the libthr signal handlers interposer, which needs to save full FPU extended state. Reported and tested by: Andre Meiser <ortadur@web.de> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-03 12:14:42 +00:00
Ed Schouten	75c9f22394	Set p_osrel to __FreeBSD_version on process startup. Certain system calls have quirks applied to make them work as if called on an older version of FreeBSD. As CloudABI executables don't have the FreeBSD OS release number in the ELF header, this value is set to zero, making the system calls fall back to typically historic, non-standard behaviour. Reviewed by: kib	2015-08-03 07:29:57 +00:00
Glen Barber	45e1c1a38d	Pull pmspcv (pms(4)) from GENERIC. It has PCI ID conflicts with ahd(4), mvs(4), and likely other drivers. MFC after: immediately With hat: re Sponsored by: The FreeBSD Foundation	2015-07-31 15:23:48 +00:00
Konstantin Belousov	0b6476ec5b	Improve comments. Submitted by: bde MFC after: 2 weeks	2015-07-30 15:47:53 +00:00
Konstantin Belousov	1d1ec02c44	Remove full barrier from the amd64 atomic_load_acq_*(). Strong ordering semantic of x86 CPUs makes only the compiler barrier necessary to give the acquire behaviour. Existing implementation ensured sequentially consistent semantic for load_acq, making much stronger guarantee than required by standard's definition of the load acquire. Consumers which depend on the barrier are believed to be identified and already fixed to use proper operations. Noted by: alc (long time ago) Reviewed by: alc, bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-07-28 07:04:51 +00:00
Alan Cox	d8b56c8eab	Add a comment discussing the appropriate use of the atomic_() functions with acquire and release semantics versus the mb() functions on amd64 processors. Reviewed by: bde (an earlier version), kib Sponsored by: EMC / Isilon Storage Division	2015-07-24 19:43:18 +00:00
John Baldwin	9a2d6ab990	Various changes to the registers displayed in DDB for x86. - Fix segment registers to only display the low 16 bits. - Remove unused handlers and entries for the debug registers. - Display xcr0 (if valid) in 'show sysregs'. - Add '0x' prefix to MSR values to match other values in 'show sysregs'. - MFamd64: Display various MSRs in 'show sysregs'. - Add a 'show dbregs' to display the value of debug registers. - Dynamically size the column width for register values to properly align columns on 64-bit platforms. - Display %gs for i386 in 'show registers'. Differential Revision: https://reviews.freebsd.org/D2784 Reviewed by: kib, markj MFC after: 2 weeks	2015-07-22 01:09:02 +00:00
Mark Johnston	a5cbf8b9c0	Let the unwinder handle faults during function prologues or epilogues. The i386 and amd64 DDB stack unwinders contain code to detect and handle the case where the first frame is not completely set up or torn down. This code was accidentally unused however, since db_backtrace() was never called with a non-NULL trap frame. This change fixes that. Also remove get_rsp() from the amd64 code. It appears to have come from i386, which needs to take into account whether the exception triggered a CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64, SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for kernel-mode exceptions. As a result, we can also remove custom print functions for these registers. Reviewed by: jhb Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D2881	2015-07-21 23:22:23 +00:00
Mark Johnston	f8a757d016	Improve stack unwinding on i386 and amd64 after an IP fault. If we can't find a symbol corresponding to the faulting instruction, assume that the previously-executed instruction is a call and attempt to find the calling function using the return address on the stack. Otherwise we end up associating the last stack frame with the current call, which is incorrect and causes the unwinder to skip printing of the calling function, resulting in a confusing backtrace. Reviewed by: jhb Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D2859	2015-07-21 23:13:11 +00:00
Mark Johnston	1a5bee0849	Remove some dead code from DDB's amd64 stack unwinder. The amd64 port copied some code from i386 to fetch function arguments and display them in backtraces. However, it was commented out and can't easily be implemented since the function arguments are passed in registers rather than on the stack in amd64. Remove it in preparation for some bug fixes in this area. Reviewed by: jhb Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D2857	2015-07-21 23:03:21 +00:00
Ed Schouten	d0da90b198	Describe COMPAT_CLOUDABI64 in the amd64 configuration NOTES file.	2015-07-21 12:53:47 +00:00
Ed Schouten	21d30b29d5	Make thread creation work for CloudABI processes. Summary: Remove the stub system call that was put in place during the system call import and replace it by a target-dependent version stored in sys/amd64. Initialize the thread in a way similar to cpu_set_upcall_kse(). We provide the entry point with two arguments: the thread ID and the argument pointer. Test Plan: Thread creation still seems to work, both for FreeBSD and CloudABI binaries. Reviewers: dchagin, mjg, kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3110	2015-07-21 12:47:15 +00:00
Ed Schouten	62c31cffae	Make forking of CloudABI processes work. Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return the file descriptor number to the parent process. To the child process we both return a special value for the file descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID of the new thread in the copied process, so the threading library can reinitialize itself. Obtained from: https://github.com/NuxiNL/freebsd	2015-07-20 13:46:22 +00:00
Mark Johnston	32cd0147fa	Implement the lockstat provider using SDT(9) instead of the custom provider in lockstat.ko. This means that lockstat probes now have typed arguments and will utilize SDT probe hot-patching support when it arrives. Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D2993	2015-07-19 22:14:09 +00:00
Benno Rice	eacbeb2b95	Merge driver for PMC Sierra's range of SAS/SATA HBAs. Submitted by: Achim Leubner <Achim.Leubner@pmcs.com> Reviewed by: scottl	2015-07-17 23:30:43 +00:00
Konstantin Belousov	888e282ab4	When checking for the valid value of the frame pointer, verify that it belongs to the kernel stack address range for the thread. Right now, code checks that new frame is not farther than KSTACK_PAGES pages from the current frame, which allows the address to point past the top of the stack. Reviewed by: andrew, emaste, markj Differential revision: https://reviews.freebsd.org/D3108 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-07-16 19:40:18 +00:00
Ed Schouten	6e5fcd99df	Add a sysentvec for CloudABI on x86-64. Summary: For CloudABI we need to put two things on the stack of new processes: the argument data (a binary blob; not strings) and a startup data structure. The startup data structure contains interesting things such as a pointer to the ELF program header, the thread ID of the initial thread, a stack smashing protection canary, and a pointer to the argument data. Fetching system call arguments and setting the return value is similar to FreeBSD. The only differences are that system call 0 does not exist and that we call into cloudabi_convert_errno() to convert the error code. We also need this function in a couple of other places, so we'd better reuse it here. Reviewers: dchagin, kib Reviewed By: kib Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3098	2015-07-16 18:24:06 +00:00
Patrick Kelsey	2ec930efea	Revert inadvertent change to amd64/GENERIC.	2015-07-15 01:04:54 +00:00
Patrick Kelsey	8aa7fdbd78	Add netmap support for ixgbe SRIOV VFs (that is, to if_ixv). Differential Revision: https://reviews.freebsd.org/D2923 Reviewed by: erj, gnn Approved by: jmallett (mentor) Sponsored by: Norse Corp, Inc.	2015-07-15 01:02:01 +00:00
Christian Brueffer	f4c1eac7cd	Spell crypto correctly.	2015-07-14 10:47:56 +00:00
John-Mark Gurney	e808e13b8b	Now that aesni won't reuse fpu contexts (D3016), add seatbelts to the fpu code to prevent other reuse of the contexts in the future... Differential Revision: https://reviews.freebsd.org/D3015 Reviewed by: kib, gnn	2015-07-08 19:26:36 +00:00
Konstantin Belousov	8954a9a4e6	Add the atomic_thread_fence() family of functions with intent to provide a semantic defined by the C11 fences with corresponding memory_order. atomic_thread_fence_acq() gives r | r, w, where r and w are read and write accesses, and | denotes the fence itself. atomic_thread_fence_rel() is r, w | w. atomic_thread_fence_acq_rel() is the combination of the acquire and release in single operation. Note that reads after the acq+rel fence could be made visible before writes preceding the fence. atomic_thread_fence_seq_cst() orders all accesses before/after the fence, and the fence itself is globally ordered against other sequentially consistent atomic operations. Reviewed by: alc Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-07-08 18:12:24 +00:00
Achim Leubner	4e1bc9a039	Driver 'pmspcv' added. Supports PMC-Sierra PM8001/8081/8088/8089/8074/8076/8077 SAS/SATA HBA Controllers.	2015-07-07 13:17:02 +00:00
Neel Natu	5e4f29c037	Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual machines on the host. These tools break if the devmem device nodes also appear in /dev/vmm. Requested by: grehan	2015-07-06 19:41:43 +00:00
George V. Neville-Neil	3839369c03	Enable IPSEC in all GENERIC kernels. Universe and kernel build tests passed 4 July 2015 PR: 128030 Sponsored by: Rubicon Communications (Netgate)	2015-07-04 17:37:00 +00:00
Konstantin Belousov	6fdfd88220	Use single instance of the identical INKERNEL() and PMC_IN_KERNEL() macros on amd64 and i386. Move the definition to machine/param.h. kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb version to PINKERNEL(). On i386, correct the lowest kernel address. After the shared page was introduced, USRSTACK no longer points to the last user address + 1 [] Submitted by: Oliver Pinter [] Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-07-02 14:37:21 +00:00
Konstantin Belousov	3ce8c94f29	Disallow a debugger on 64bit system to set fs/gs bases of the 32bit process beyond the end of the process address space. Such setting is not dangerous to the kernel integrity, but it causes confusing application misbehaviour. Sponsored by: The FreeBSD Foundation MFC after: 12 days	2015-07-01 16:37:03 +00:00
Konstantin Belousov	3ac3c0f269	Add a comment about too strong semantic of atomic_load_acq() on x86. Submitted by: bde MFC after: 2 weeks	2015-06-29 09:58:40 +00:00
Konstantin Belousov	d9008978c8	pcb_gs32sd is unused for long time, remove it. Keep the padding in pcb. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-29 07:53:44 +00:00
Konstantin Belousov	1817023775	Add x86 PT_GETFSBASE, PT_GETGSBASE machine-dependent ptrace requests to obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and PT_SETGSBASE requests to set the bases from debuggers. The set requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override the corresponding segment registers. The main purpose of the operations is to retrieve and modify the tcb address for debuggee. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-29 07:07:24 +00:00
Konstantin Belousov	7626d062c3	Remove unneeded data dependency, currently imposed by atomic_load_acq(9), on its source, for x86. Right now, atomic_load_acq() on x86 is sequentially consistent with other atomics, code ensures this by doing store/load barrier by performing locked nop on the source. Provide separate primitive __storeload_barrier(), which is implemented as the locked nop done on a cpu-private variable, and put __storeload_barrier() before load, to keep seq_cst semantic but avoid introducing false dependency on the no-modification of the source for its later use. Note that seq_cst property of x86 atomic_load_acq() is not documented and not carried by atomics implementations on other architectures, although some kernel code relies on the behaviour. This commit does not intend to change this. Reviewed by: alc Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-06-28 05:04:08 +00:00
Tycho Nightingale	ea587cd825	verify_gla() needs to account for non-zero segment base addresses. Reviewed by: neel	2015-06-26 18:00:29 +00:00
Roger Pau Monné	7e748038cd	amd64: set the correct LMA values The current linker script generates program headers with VMA == LMA: Entry point 0xffffffff802e7000 There are 6 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align PHDR 0x0000000000000040 0xffffffff80200040 0xffffffff80200040 0x0000000000000150 0x0000000000000150 R E 8 INTERP 0x0000000000000190 0xffffffff80200190 0xffffffff80200190 0x000000000000000d 0x000000000000000d R 1 [Requesting program interpreter: /red/herring] LOAD 0x0000000000000000 0xffffffff80200000 0xffffffff80200000 0x00000000010559b0 0x00000000010559b0 R E 200000 LOAD 0x0000000001056000 0xffffffff81456000 0xffffffff81456000 0x0000000000132638 0x000000000052ecf8 RW 200000 DYNAMIC 0x0000000001056000 0xffffffff81456000 0xffffffff81456000 0x00000000000000d0 0x00000000000000d0 RW 8 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 8 This is fine for the FreeBSD loader, because it completely ignores p_paddr and instead uses p_vaddr with a hardcoded offset. Other loaders however acknowledge p_paddr (like the Xen ELF loader), in which case they will try to load the kernel at the wrong place. 
Fix this by adding an AT keyword to the first section specifying the physical address, other sections will follow suit, so it ends up looking like: Entry point 0xffffffff802e7000 There are 6 program headers, starting at offset 64 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align PHDR 0x0000000000000040 0xffffffff80200040 0x0000000000200040 0x0000000000000150 0x0000000000000150 R E 8 INTERP 0x0000000000000190 0xffffffff80200190 0x0000000000200190 0x000000000000000d 0x000000000000000d R 1 [Requesting program interpreter: /red/herring] LOAD 0x0000000000000000 0xffffffff80200000 0x0000000000200000 0x00000000010559b0 0x00000000010559b0 R E 200000 LOAD 0x0000000001056000 0xffffffff81456000 0x0000000001456000 0x0000000000132638 0x000000000052ecf8 RW 200000 DYNAMIC 0x0000000001056000 0xffffffff81456000 0x0000000001456000 0x00000000000000d0 0x00000000000000d0 RW 8 GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 8 Tested on bare metal using the native FreeBSD loader and grub2 from TRUEOS. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D2783	2015-06-26 07:12:17 +00:00
Neel Natu	90e528f838	Restore the host's GS.base before returning from 'svm_launch()'. Previously this was done by the caller of 'svm_launch()' after it returned. This works fine as long as no code is executed in the interim that depends on pcpu data. The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because it calls 'dtrace_probe()' which in turn relies on pcpu data. Reported by: avg MFC after: 1 week	2015-06-23 02:17:23 +00:00
Neel Natu	9b1aa8d622	Restructure memory allocation in bhyve to support "devmem". devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models. devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem. Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change). Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D2762 MFC after: 4 weeks	2015-06-18 06:00:17 +00:00
John Baldwin	5eb95e11ba	Report the values of x86 segment registers to remote debuggers. While here, also report %eflags from the i386 trapframe. Differential Revision: https://reviews.freebsd.org/D2743 Reviewed by: kib MFC after: 1 month	2015-06-12 15:14:08 +00:00
Ruslan Bukin	4f4d15f0d0	Allow DTrace to be compiled-in to the kernel. This will be required for AArch64 as we don't have modules yet. Sponsored by: HEIF5 Sponsored by: ARM Ltd. Differential Revision: https://reviews.freebsd.org/D1997	2015-06-10 15:53:39 +00:00
Mateusz Guzik	21de5aea6c	Fixup the build after r284215. Submitted by: Ivan Klymenko <fidaj ukr.net> [slightly modified]	2015-06-10 12:39:01 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Mateusz Guzik	4ea6a9a28f	Generalised support for copy-on-write structures shared by threads. Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed. This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary.	2015-06-10 10:43:59 +00:00
Alan Cox	65a9768f62	Account for superpage mappings that are created by pmap_copy().	2015-06-09 18:04:28 +00:00
Tycho Nightingale	277bdd9950	Support guest writes to the TSC by enabling the "use TSC offsetting" execution control and writing the difference between the host TSC and the guest TSC into the TSC offset in the VMCS upon encountering a write. Reviewed by: neel	2015-06-09 00:14:47 +00:00
Dmitry Chagin	c2bc5b15eb	Futex is an aligned 32-bit integer. Use the proper instruction and operand when dereferencing futex pointer.	2015-06-08 17:39:25 +00:00
Alan Cox	966272ca33	Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages. Differential Revision: https://reviews.freebsd.org/D2712 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division	2015-06-08 04:59:32 +00:00
Konstantin Belousov	32a1e9e4a5	Update print_INTEL_TLB() by the tag values from the Intel SDM rev. 55. The modern CPUs cache and TLB descriptions looked quite questionable without the update, e.g. Haswell i7 4770S reported: Data TLB: 4 KB pages, 4-way set associative, 64 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line After the update, the report is: Data TLB: 1 GByte pages, 4-way set associative, 4 entries Data TLB: 4 KB pages, 4-way set associative, 64 entries Instruction TLB: 2M/4M pages, fully associative, 8 entries Instruction TLB: 4KByte pages, 8-way set associative, 64 entries 64-Byte prefetching Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line Some tags were apparently removed from the table 3-21, Vol. 2A. Keep them around, but add a comment stating the removal. Update the format line for cpu_stdext_feature according to the bits from the SDM rev.55. It appears that Haswells do not store %cs and %ds values in the FPU save area. Store content of the %ecx register from the CPUID leaf 0x7 subleaf 0 as cpu_stdext_feature2 and print defined bits from it, again according to SDM rev. 55. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-06 22:03:24 +00:00
Neel Natu	647c87825c	The 'verify_gla()' function is used to ensure that the effective address after decoding the instruction matches the one provided by hardware. Prior to r283293 'vie->num_valid' used to contain the actual length of the instruction whereas now it contains the maximum instruction length possible. This introduced a bug when calculating a RIP-relative base address. Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the length of the emulated instruction. Reported and tested by: tychon MFC after: 1 week	2015-06-05 21:22:26 +00:00
Neel Natu	b14bd6ac9d	Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor. MFC after: 1 week	2015-06-04 02:12:23 +00:00
Dimitry Andric	38954a1d1c	Remove unneeded NULL checks in amd64's trap_fatal(). Since td_name is an array member of struct thread, it can never be NULL, so the check can be removed. In addition, curproc can never be NULL, so remove the if statement, and splice the two printfs() together. While here, remove the u_long cast, and use the correct printf format specifier curproc->p_pid. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D2695	2015-06-01 06:50:39 +00:00
Konstantin Belousov	69baeadc31	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-05-29 13:24:17 +00:00
Neel Natu	248e6799e9	Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state. This is done by forcing the vcpu to transition to "idle" by returning to userspace with an exit code of VM_EXITCODE_REQIDLE. MFC after: 2 weeks	2015-05-28 17:37:01 +00:00
Konstantin Belousov	1c5f423183	Enabled rewritten PCID support by default. Sponsored by: The FreeBSD Foundation MFC after: 1 month	2015-05-27 09:50:18 +00:00
Dmitry Chagin	d707582f83	When I merged the lemul branch I missed kib@'s r282708 commit. This is not the final fix as I need to properly clean up thread resources before other threads suicide. Tested by: Ruslan Makhmatkhanov	2015-05-25 20:44:46 +00:00
Dmitry Chagin	5c5aac2d45	Regen for r283492.	2015-05-24 18:09:01 +00:00
Dmitry Chagin	9802eb9ebc	Implement Linux specific syncfs() system call.	2015-05-24 18:08:01 +00:00
Dmitry Chagin	c532a88cfc	Regen for r283488.	2015-05-24 18:05:21 +00:00
Dmitry Chagin	e1ff74c0f7	Implement recvmmsg() and sendmmsg() system calls.	2015-05-24 18:04:04 +00:00
Dmitry Chagin	b7aaa9fdb0	Reduce duplication between MD Linux code by moving msg related struct definitions out into the compat/linux/linux_socket.h	2015-05-24 18:03:14 +00:00
Dmitry Chagin	3ce05165b1	Regen for r283484.	2015-05-24 18:02:17 +00:00
Dmitry Chagin	6e4c8004dc	Implement epoll_pwait() system call.	2015-05-24 18:00:14 +00:00
Dmitry Chagin	ca045164a7	Regen for r283480.	2015-05-24 17:58:24 +00:00
Dmitry Chagin	19d8b461f4	Add utimensat() system call. The patch developed by Jilles Tjoelker and Andrew Wilcox and adopted for lemul branch by me.	2015-05-24 17:57:07 +00:00
Dmitry Chagin	0c38abc250	The kernel sends signals to the processes via ABI specific sv_sendsig method. Native ABI does not need signal conversion, only emulators may want this. Usually emulators implement their own sv_sendsig method. For now only the ibcs2 emulator does not have its own sv_sendsig implementation and depends on native sendsig() method. So, remove any extra attempts to convert signal numbers from native sendsig() methods except from i386 where ibcs2 is living.	2015-05-24 17:56:02 +00:00
Dmitry Chagin	4ab7403bbd	Rework signal code to allow using it by other modules, like linprocfs: 1. Linux sigset always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module define it as 64 bit int. Move Linux sigset manipulation routines to the MI path. 2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals. 3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors. 4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside of allowed on Linux signal numbers. PR: 197216	2015-05-24 17:47:20 +00:00
Dmitry Chagin	a7ac457613	According to Linux man sigaltstack(3) shall return EINVAL if the ss argument is not a null pointer, and the ss_flags member pointed to by ss contains flags other than SS_DISABLE. However, in fact, Linux also allows SS_ONSTACK flag which is simply ignored. For buggy apps (at least mono) ignore flags other than SS_DISABLE, as Linux does. While here move MI part of sigaltstack code to the appropriate place. Reported by: abi at abinet dot ru	2015-05-24 17:44:08 +00:00
Dmitry Chagin	657100de57	Regen for r283467.	2015-05-24 17:39:18 +00:00
Dmitry Chagin	fcdffc03f8	Call nosys in case when the incorrect syscall number is specified. Reported by: trinity	2015-05-24 17:38:02 +00:00
Dmitry Chagin	8d939ad405	Regen for r283465.	2015-05-24 17:35:42 +00:00
Dmitry Chagin	b6aeb7d5dd	Add preliminary fallocate system call implementation to emulate posix_fallocate() function. Differential Revision: https://reviews.freebsd.org/D1523 Reviewed by: emaste	2015-05-24 17:33:21 +00:00
Dmitry Chagin	274d2df2e5	Regen for r283451.	2015-05-24 17:00:43 +00:00
Dmitry Chagin	a6b40812ec	Implement ppoll() system call. Differential Revision: https://reviews.freebsd.org/D1105 Reviewed by: trasz	2015-05-24 16:59:25 +00:00
Dmitry Chagin	e8b026b37e	Include opt_compat.h, so that COMPAT_LINUX32 is defined, and we can access to the semop structs and functions. Submitted by: cognet@ Differential Revision: https://reviews.freebsd.org/D1095 Reviewed by: trasz	2015-05-24 16:51:04 +00:00
Dmitry Chagin	22f3dfdc12	Regen for r283444.	2015-05-24 16:50:17 +00:00
Dmitry Chagin	a31d76867d	Implement eventfd system call. Differential Revision: https://reviews.freebsd.org/D1094 In collaboration with: Jilles Tjoelker	2015-05-24 16:49:14 +00:00
Dmitry Chagin	3e89b64168	Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators. Differential Revision: https://reviews.freebsd.org/D1093 Reviewed by: trasz	2015-05-24 16:47:13 +00:00
Dmitry Chagin	28fb55359b	Regen for r283441.	2015-05-24 16:42:49 +00:00
Dmitry Chagin	e16fe1c730	Implement epoll family system calls. This is a tiny wrapper around kqueue() to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data, so we keep user data in the proc emuldata. Initial patch developed by rdivacky@ in 2007, then extended by Yuri Victorovich @ r255672 and finished by me in collaboration with mjg@ and jilles@. Differential Revision: https://reviews.freebsd.org/D1092	2015-05-24 16:41:39 +00:00
Dmitry Chagin	4d0f380d87	To avoid code duplication move open/fcntl definitions to the MI header file. Differential Revision: https://reviews.freebsd.org/D1087 Reviewed by: trasz	2015-05-24 16:31:44 +00:00
Dmitry Chagin	26c68e1fe5	Use the BSD_TO_LINUX_SIGNAL() wherever there is no need to check the ABI as it is known. Differential Revision: https://reviews.freebsd.org/D1086	2015-05-24 16:30:23 +00:00
Dmitry Chagin	437c43c1cb	Being exported through vdso the note.Linux section used by glibc to determine the kernel version (this saves one uname call). Temporarily disable the export of a note.Linux section until I figured out how to change the kernel version in the note.Linux on the fly. Differential Revision: https://reviews.freebsd.org/D1081 Reviewed by: trasz	2015-05-24 16:25:44 +00:00
Dmitry Chagin	4048f59cd0	Add AT_RANDOM and AT_EXECFN auxiliary vector entries which are used by glibc. At least since glibc version 2.16 using AT_RANDOM is mandatory. Differential Revision: https://reviews.freebsd.org/D1080	2015-05-24 16:24:24 +00:00
Dmitry Chagin	0a1884d768	Regen for r283428.	2015-05-24 16:19:57 +00:00
Dmitry Chagin	baa232bbfd	Change linux faccessat syscall definition to match actual linux one. The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat() to determine access permissions. Differential Revision: https://reviews.freebsd.org/D1078 Reviewed by: trasz	2015-05-24 16:18:03 +00:00
Dmitry Chagin	523be40fe4	Regen for r283424.	2015-05-24 16:11:21 +00:00
Dmitry Chagin	b2f587918d	Add preliminary support for x86-64 Linux binaries. Differential Revision: https://reviews.freebsd.org/D1076	2015-05-24 16:07:11 +00:00
Dmitry Chagin	bc27367760	Refund the proc emuldata struct for future use. For now move flags from thread emuldata to proc emuldata as it was originally intended. As we can have both 64 & 32 bit Linuxulator running any eventhandler can be called twice for us. To prevent this move eventhandlers code from linux_emul.c to the linux_common.ko module. Differential Revision: https://reviews.freebsd.org/D1073	2015-05-24 15:54:58 +00:00
Dmitry Chagin	67d3974849	Introduce a new module linux_common.ko which is intended for the following primary purposes: 1. Remove the dependency of linsysfs and linprocfs modules from linux.ko, which will be architecture specific on amd64. 2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit). 3. Move malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly. Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko. Temporarily remove dtrace garbage from linux_mib.c and linux_util.c Differential Revision: https://reviews.freebsd.org/D1072 In collaboration with: Vassilis Laganakos. Reviewed by: trasz	2015-05-24 15:51:18 +00:00
Dmitry Chagin	31eb438886	x86_64 Linux do not use multiplexing on ipc system calls. Move struct ipc_perm definition to the MD path as it differs for 64 and 32 bit platform. Differential Revision: https://reviews.freebsd.org/D1068 Reviewed by: trasz	2015-05-24 15:44:41 +00:00
Dmitry Chagin	26cf41d6ca	Remove stale comment about a signal trampoline which is moved to the shared page at r219609. Differential Revision: https://reviews.freebsd.org/D1063 Reviewed by: trasz	2015-05-24 15:32:52 +00:00
Dmitry Chagin	0020bdf13a	Put linux_platform into the vdso to avoid copying it onto the stack at every exec. Differential Revision: https://reviews.freebsd.org/D1062 Reviewed by: trasz	2015-05-24 15:30:52 +00:00
Dmitry Chagin	32084836c0	Eliminate a now unused global declaration of elf_linux_sysvec. Differential Revision: https://reviews.freebsd.org/D1061 Reviewed by: trasz	2015-05-24 15:29:20 +00:00
Dmitry Chagin	bdc379344a	Implement vdso - virtual dynamic shared object. Through vdso Linux exposes functions from kernel with proper DWARF CFI information so that it becomes easier to unwind through them. Using vdso is mandatory for thread cancellation && cleanup on a modern glibc. Differential Revision: https://reviews.freebsd.org/D1060	2015-05-24 15:28:17 +00:00
Dmitry Chagin	b2e0aad9e5	Regen for r283403.	2015-05-24 15:22:33 +00:00
Dmitry Chagin	ae50b4d7b5	Implement pselect6() system call. Differential Revision: https://reviews.freebsd.org/D1051 Reviewed by: trasz	2015-05-24 15:21:25 +00:00
Dmitry Chagin	e7fa9de6eb	Regen for r283401.	2015-05-24 15:19:44 +00:00
Dmitry Chagin	c3978c7bb1	Implement prlimit64() system call. Differential Revision: https://reviews.freebsd.org/D1050 Reviewed by: emaste, trasz	2015-05-24 15:18:19 +00:00
Dmitry Chagin	737325a46d	Regen for r283399.	2015-05-24 15:15:46 +00:00
Dmitry Chagin	254a937ee5	Implement dup3() system call. Differential Revision: https://reviews.freebsd.org/D1049 Reviewed by: emaste	2015-05-24 15:14:51 +00:00
Dmitry Chagin	f680d990e8	Regen for r283396.	2015-05-24 15:12:38 +00:00
Dmitry Chagin	7ac9766db4	Implement rt_sigqueueinfo() system call. Differential Revision: https://reviews.freebsd.org/D1047 Reviewed by: trasz	2015-05-24 15:11:32 +00:00
Dmitry Chagin	e4454275a5	Regen for r283394.	2015-05-24 15:08:25 +00:00
Dmitry Chagin	e5fe4ccf59	Implement waitid() system call. Differential Revision: https://reviews.freebsd.org/D1046	2015-05-24 15:06:39 +00:00
Dmitry Chagin	001398c4c5	To reduce code duplication introduce linux_copyout_rusage() method. Use it in linux_wait4() system call and move linux_wait4() to the MI path. While here add a prototype for the static bsd_to_linux_rusage(). Differential Revision: https://reviews.freebsd.org/D2138 Reviewed by: trasz	2015-05-24 15:03:09 +00:00
Dmitry Chagin	af682d487b	Some style(9) && whitespaces fixes. No functional changes. Differential Revision: https://reviews.freebsd.org/D1041 Reviewed by: emaste	2015-05-24 14:55:12 +00:00
Dmitry Chagin	81338031c4	Switch linuxulator to use the native 1:1 threads. The reasons: 1. Get rid of the stubs/quirks with process dethreading, process reparent when the process group leader exits and close to this problems on wait(), waitpid(), etc. 2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator. Implementation details: 1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process. 2. The test that the process has threads is done via P_HADTHREADS bit p_flag of struct proc. 3. Per thread emulator state data structure is now located in the struct thread and freed in the thread_dtor() hook. Mandatory holding of the p_mtx required when referencing emuldata from the other threads. 4. PID mangling has changed. Now Linux pid is the native tid and Linux tgid is the native pid, with the exception of the first thread in the process where tid and pid are one and the same. Ugliness: In case when the Linux thread is the initial thread in the thread group thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take thread id as a parameter we should use the special method to reference struct thread. Differential Revision: https://reviews.freebsd.org/D1039	2015-05-24 14:53:16 +00:00
Dmitry Chagin	91d1786f65	In preparation for switching linuxulator to the use the native 1:1 threads add a hook for cleaning thread resources before the thread die. Differential Revision: https://reviews.freebsd.org/D1038	2015-05-24 14:51:29 +00:00
Dmitry Chagin	64cfe4dc38	Regen for r283379.	2015-05-24 14:47:00 +00:00
Dmitry Chagin	2003907d45	Implement a Linux version of sched_getparam() && sched_setparam(). Temporarily use the first thread in proc. Differential Revision: https://reviews.freebsd.org/D1036 Reviewed by: trasz	2015-05-24 14:45:57 +00:00
Dmitry Chagin	111c86e3d1	Remove a now unused include. Differential Revision: https://reviews.freebsd.org/D1035 Reviewed by: trasz	2015-05-24 14:44:57 +00:00
Dmitry Chagin	1aa90eca33	In preparation for switching linuxulator to the use the native 1:1 threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specify target thread directly by callee (new Linuxulator). Linuxulator temporarily uses first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz	2015-05-24 14:39:26 +00:00
Dmitry Chagin	8c744294fe	Regen for r283370.	2015-05-24 14:34:46 +00:00
Dmitry Chagin	161acbb670	In preparation for switching linuxulator to the use the native 1:1 threads introduce linux_exit() stub instead of sys_exit() call (which terminates process). In the new linuxulator exit() system call terminates the calling thread (not a whole process). Differential Revision: https://reviews.freebsd.org/D1027 Reviewed by: trasz	2015-05-24 14:33:19 +00:00
Dmitry Chagin	1d80c8a8f0	In preparation for switching linuxulator to the use the native 1:1 threads print the thread id in addition to the pid in debug messages.	2015-05-24 14:29:35 +00:00
Neel Natu	47b9935d9b	Exceptions don't deliver an error code in real mode. MFC after: 1 week	2015-05-23 01:17:50 +00:00
Neel Natu	f149ce540e	Remove the verification of instruction length after instruction decode. The check has been bogus since r273375. MFC after: 1 week	2015-05-22 21:09:11 +00:00
Neel Natu	1c73ea3ef8	Don't rely on the 'VM-exit instruction length' field in the VMCS to always have an accurate length on an EPT violation. This is not needed by the instruction decoding code because it also has to work with AMD/SVM that does not provide a valid instruction length on a Nested Page Fault. In collaboration with: Leon Dang (ldang@nahannisys.com) Discussed with: grehan MFC after: 1 week	2015-05-22 17:34:22 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Neel Natu	b32d1908d5	Emulate the "CMP r/m, reg" instruction (opcode 39H). Reported and tested by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-21 18:23:37 +00:00
Pedro F. Giffuni	cd508278c1	ddb: finish converting boolean values. The replacement started at r283088 was necessarily incomplete without replacing boolean_t with bool. This also involved cleaning some type mismatches and ansifying old C function declarations. Pointed out by: bde Discussed with: bde, ian, jhb	2015-05-21 15:16:18 +00:00
Konstantin Belousov	100ac78be1	On amd64, make proc0 pmap initialization slightly more correct. In particular, switch to the proc0 pmap to have expected %cr3 and PCID for the thread0 during initialization, and the up to date pm_active mask. pmap_pinit0() should be done after proc0->p_vmspace is assigned so that the amd64 pmap_activate() find the correct curproc pmap. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-15 08:30:29 +00:00
Konstantin Belousov	f83e0dcb3a	Implement the support for PCID in UP kernels. Requested by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-15 07:57:47 +00:00
Jim Harris	6e3471bd0b	Add nvme and nvd drivers to GENERIC for amd64 and i386. MFC after: 3 days Sponsored by: Intel	2015-05-14 20:19:22 +00:00
Edward Tomasz Napierala	ba8f0eb8fc	Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf. Differential Revision: https://reviews.freebsd.org/D2407 Reviewed by: emaste@, wblock@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation	2015-05-14 14:03:55 +00:00
Konstantin Belousov	f116422f38	Initialize pcids array for the proc0 pmap. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-10 09:09:07 +00:00
Konstantin Belousov	78ac908e9b	Tweak assert to also print the thread address. Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-10 09:05:57 +00:00
Konstantin Belousov	7b445033ff	On exec, single-threading must be enforced before arguments space is allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-10 09:00:40 +00:00
Konstantin Belousov	b40a715c11	Correct the assertion. We should compare the pmap' curcpu pcid value against 0, not the pmap. Noted by: Oliver Pinter <oliver.pinter@hardenedbsd.org> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 21:36:44 +00:00
Konstantin Belousov	a546448b8d	Rewrite amd64 PCID implementation to follow an algorithm described in Vahalia's "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has a PCID algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the amount of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintenance of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-05-09 19:11:01 +00:00
Konstantin Belousov	6841f70168	Remove unused define. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-05-09 18:38:35 +00:00
Konstantin Belousov	b57a73f8e7	If x86 CPU implementation of the MWAIT instruction reasonably interacts with interrupts, query ACPI and use MWAIT for entrance into Cx sleep states. Support C1 "I/O then halt" mode. See Intel' document 302223-007 "Intel® Processor Vendor-Specific ACPI Interface Specification" for description. Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use it instead of inlining "sti; hlt" sequence in several places. In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported} sysctls. Both jkim and avg have some other patches implementing the mwait functionality; this work is unrelated. Linux does not rely on the ACPI to provide correct tables describing Cx modes. Instead, the driver has pre-defined knowledge of the CPU models, it was supplied by Intel. Tested by: pho (previous versions) Sponsored by: The FreeBSD Foundation	2015-05-09 12:28:48 +00:00
Neel Natu	ede0403309	Check 'td_owepreempt' and yield the vcpu thread if it is set. This is done explicitly because a vcpu thread can be in a critical section for the entire time slice allotted to it. This in turn can delay the handling of the 'td_owepreempt'. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D2430	2015-05-06 23:40:24 +00:00
Neel Natu	9c4d547896	Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). Prior to this change both functions returned 0 for success, -1 for failure and +1 to indicate that an exception was injected into the guest. The numerical value of ERESTART also happens to be -1 so when these functions returned -1 it had to be translated to a positive errno value to prevent the VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce bugs when writing emulation code. Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if an exception was delivered to the guest. The return value is 0 or EFAULT so no additional translation is needed. Reviewed by: tychon MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D2428	2015-05-06 16:25:20 +00:00
Neel Natu	ea91ca92ba	Do a proper emulation of guest writes to MSR_EFER. - Must-Be-Zero bits cannot be set. - EFER_LME and EFER_LMA should respect the long mode consistency checks. - EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities. - Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce segment limits in 64-bit mode. MFC after: 2 weeks	2015-05-06 05:40:20 +00:00
Neel Natu	6a273d5ef7	Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows Vista guest. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-04 04:27:23 +00:00
Neel Natu	317080849e	Don't advertise the Intel SMX capability to the guest. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 1 week	2015-05-02 19:07:49 +00:00
Neel Natu	1d29bfc149	Emulate machine check related MSRs to allow guest OSes like Windows to boot. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-05-02 04:19:11 +00:00
Neel Natu	44e2f0fea9	r281630 relaxed the limits on the vectors that can be asserted in the IRRs. Do the same when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-05-01 16:00:29 +00:00
Neel Natu	fe22991fb8	Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are enabled. MFC after: 2 weeks	2015-05-01 05:11:14 +00:00
Neel Natu	8325ce5c7e	Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>. Only a subset of source files that include <machine/vmm.h> need to use the APIs that require the inclusion of <sys/cpuset.h>. MFC after: 1 week	2015-04-30 22:23:22 +00:00
Neel Natu	c07a0648ec	When an instruction cannot be decoded just return to userspace so bhyve(8) can dump the instruction bytes. Requested by: grehan MFC after: 1 week	2015-04-30 21:00:47 +00:00
Neel Natu	7d786ee2a9	Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs. This is required for booting Windows guests. Reported by: Leon Dang (ldang@nahannisys.com) MFC after: 2 weeks	2015-04-30 19:23:50 +00:00
John Baldwin	ed95805e90	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes	2015-04-30 15:48:48 +00:00

7339 Commits