freebsd-skq

Author	SHA1	Message	Date
Dag-Erling Smørgrav	e2082935f0	As discussed on -current, remove the hardcoded default maxswzone. MFC after: 3 weeks	2012-08-14 17:01:21 +00:00
Konstantin Belousov	b6d3609050	Add a hackish debugging facility to provide a bit of information about the reason for a generated trap. The dump of basic signal information and 8 bytes of the faulting instruction are printed on the controlling terminal of the process, if the machdep.uprintf_signal sysctl is enabled. The print is the only practical way to debug traps from a.out processes I am aware of. Because I have to reimplement it each time I debug an issue with a.out support on amd64, commit the hack to main tree. MFC after: 1 week	2012-08-14 12:15:01 +00:00
Konstantin Belousov	95fd15898b	Real hardware, as opposed to QEMU, does not allow a call gate in long mode to transfer control to a 32bit code segment. Unbreak the lcall $7,$0 implementation on amd64 by putting the 64bit user code segment's selector into the call gate, and execute the 64bit trampoline which converts the return frame into 32bit format and switches back to 32bit mode for executing the int $0x80 trampoline. Note that all jumps over the hoops are performed in the user mode. MFC after: 1 week	2012-08-14 12:13:27 +00:00
John Baldwin	6e4ac34b07	Remove the deassert INIT IPI from the IPI startup sequence for APs. It is not listed in the boot sequence in the MP specification (1.4), and it is explicitly ignored on modern CPUs. It was only ever required when bootstrapping systems with external APICs (that is, SMP machines with 486s), which FreeBSD has never supported (and never will). While here, tidy some comments and remove some banal ones.	2012-08-13 18:52:51 +00:00
John Baldwin	a4284ef768	Add a 10 millisecond delay after sending the initial INIT IPI. This matches the algorithm in the MP specification (1.4). Previously we were sending out the deassert INIT IPI immediately after the initial INIT IPI was sent.	2012-08-13 16:33:22 +00:00
Colin Percival	347c7fd7bf	Build modules along with the XENHVM kernels. No objections from: freebsd-xen mailing list MFC after: 1 week	2012-08-13 07:36:57 +00:00
Alan Cox	663f8700d4	The assertion that I added in r238889 could legitimately fail when a debugger creates a breakpoint. Replace that assertion with a narrower one that still achieves my objective. Reported and tested by: kib	2012-08-08 05:28:30 +00:00
Konstantin Belousov	65211d02c4	Do not apply errata 721 workaround when under hypervisor, since typical hypervisor does not implement access to the required MSR, causing #GP on boot. Reported and tested by: olgeni PR: amd64/170388 MFC after: 3 days	2012-08-07 08:36:10 +00:00
Sergey Kandaurov	16ec457aeb	Remove duplicate header inclusion of <sys/sysent.h> Discussed with: bz	2012-08-07 05:46:36 +00:00
Alan Cox	59fa03faa3	Shave off a few more cycles from the average execution time of pmap_enter() by simplifying the control flow and reducing the live range of "om".	2012-08-05 16:59:02 +00:00
Konstantin Belousov	0220d04fe3	Add lfence(). MFC after: 1 week	2012-08-01 17:24:53 +00:00
Alan Cox	879eedbc7b	Revise pmap_enter()'s handling of mapping updates that change the PTE's PG_M and PG_RW bits but not the physical page frame. First, only perform vm_page_dirty() on a managed vm_page when the PG_M bit is being cleared. If the updated PTE continues to have PG_M set, then there is no requirement to perform vm_page_dirty(). Second, flush the mapping from the TLB when PG_M alone is cleared, not just when PG_M and PG_RW are cleared. Otherwise, a stale TLB entry may stop PG_M from being set again on the next store to the virtual page. However, since the vm_page's dirty field already shows the physical page as being dirty, no actual harm comes from the PG_M bit not being set. Nonetheless, it is potentially confusing to someone expecting to see the PTE change after a store to the virtual page.	2012-08-01 16:04:13 +00:00
Konstantin Belousov	a42fa0af44	Change (unused) prototype for stmxcsr() to match reality. Noted by: jhb MFC after: 1 week	2012-07-30 19:26:02 +00:00
Alan Cox	bc27d6c608	Shave off a few more cycles from pmap_enter()'s critical section. In particular, do a little less work with the PV list lock held.	2012-07-29 18:20:49 +00:00
Konstantin Belousov	59c1a8315c	Forcibly shut up clang warning about NULL pointer dereference. MFC after: 3 weeks	2012-07-23 19:16:31 +00:00
Konstantin Belousov	7c80fcfdba	Consistently use 2-space sentence breaks. Submitted by: bde MFC after: 1 week	2012-07-21 13:53:00 +00:00
Konstantin Belousov	1965c139f1	Stop caching curpcb in the local variable. Requested by: bde MFC after: 1 week	2012-07-21 13:47:37 +00:00
Konstantin Belousov	700de5109a	The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the stopped threads. Implementation assumes that the thread's FPU context is spilled into the PCB due to stop. This is mostly true, except when FPU state for the thread is not initialized. Then the requests operate on the garbage state which is currently left in the PCB, causing confusion. The situation is indeed observed after a signal delivery and before #NM fault on execution of any FPU instruction in the signal handler, since sendsig(9) drops FPU state for current thread, clearing PCB_FPUINITDONE. When inspecting context state for the signal handler, debugger sees the FPU state of the main program context instead of the clear state supposed to be provided to handler. Fix this by forcing clean FPU state in PCB user FPU save area by performing getfpuregs(9) before accessing user FPU save area in ptrace_machdep.c. Note: this change will be merged to i386 kernel as well, where it is much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to inspect FPU context on CPUs that support SSE. Amd64 version of gdb uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does not exhibit the bug. Reported by: bde MFC after: 1 week	2012-07-21 13:06:37 +00:00
Konstantin Belousov	dfa8a51288	Stop clearing x87 exceptions in the #MF handler on amd64. If user code understands FPU hardware enough to catch SIGFPE and unmask exceptions in control word, then it may as well properly handle return from SIGFPE without causing an infinite loop of #MF exceptions due to faulting instruction restart, when needed. Clearing exceptions causes information loss for handlers which do understand FPU hardware, and struct siginfo si_code member cannot be considered adequate replacement for en_sw content due to translation. The supposed reason for clearing the exceptions, IRQ13 handling oddities, was never applicable to amd64. Note: this change will be merged to i386 kernel as well, since we have not supported IRQ13 delivery of #MF notifications for some time. Requested by: bde MFC after: 1 week	2012-07-21 13:05:34 +00:00
Konstantin Belousov	83b22b05e6	Introduce curpcb magic variable, similar to curthread, which is MD amd64. It is implemented as __pure2 inline with non-volatile asm read from pcpu, which allows a compiler to cache its results. Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb. Note that __curthread() uses magic value 0 as an offsetof(struct pcpu, pc_curthread). It seems to be done this way because machine/pcpu.h needs to be processed before sys/pcpu.h, since machine/pcpu.h contributes machine-dependent fields to the struct pcpu definition. As a result, machine/pcpu.h cannot use struct pcpu yet. The __curpcb() also uses a magic constant instead of offsetof(struct pcpu, pc_curpcb) for the same reason. The constants are now defined as symbols and CTASSERTs are added to ensure that future KBI changes do not break the code. Requested and reviewed by: bde MFC after: 3 weeks	2012-07-19 19:09:12 +00:00
Alan Cox	3088e08c4b	Don't unnecessarily set PGA_REFERENCED in pmap_enter().	2012-07-19 05:34:19 +00:00
Konstantin Belousov	bc84db6267	On AMD64, provide siginfo.si_code for floating point errors when error occurs using the SSE math processor. Update comments describing the handling of the exception status bits in coprocessors control words. Remove GET_FPU_CW and GET_FPU_SW macros which were used only once. Prefer to use curpcb to access pcb_save over the longer path of referencing pcb through the thread structure. Based on the submission by: Ed Alley <wea llnl gov> PR: amd64/169927 Reviewed by: bde MFC after: 3 weeks	2012-07-18 15:43:47 +00:00
Konstantin Belousov	a81f9fed5d	Add stmxcsr. Submitted by: Ed Alley <wea llnl gov> PR: amd64/169927 MFC after: 3 weeks	2012-07-18 15:36:03 +00:00
Konstantin Belousov	333d0c6060	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts the advice from item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spill the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and a thread which stopped emulation forcibly gets context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out. The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise. For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd. MFC after: 1 month	2012-07-14 15:48:30 +00:00
Alan Cox	b9592bdab3	Wring a few cycles out of pmap_enter(). In particular, on a user-space pmap, avoid walking the page table twice.	2012-07-13 04:10:41 +00:00
John Baldwin	d706ec297a	Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h> on x86 and use that to implement stop_emulating() in the fpu/npx code. Reimplement start_emulating() in the non-XEN case by using load_cr0() and rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in the description of these instructions in Volume 2 of the SDM. Reviewed by: kib MFC after: 1 month	2012-07-09 20:55:39 +00:00
John Baldwin	5355f65974	Partially revert r217515 so that the mem_range_softc variable is always present on x86 kernels. This fixes the build of kernels that include 'device acpi' but do not include 'device mem'. MFC after: 1 month	2012-07-09 20:42:08 +00:00
Konstantin Belousov	f18d5bf44b	Use assembler mnemonic instead of manually assembling, continuation of r238142. Reviewed by: jhb MFC after: 1 month	2012-07-06 20:11:58 +00:00
John Baldwin	6632f45773	Several fixes to the amd64 disassembler: - Add generic support for opcodes that are escape bytes used for multi-byte opcodes (such as the 0x0f prefix). Use this to replace the hard-coded 0x0f special case and add support for three-byte opcodes that use the 0x0f38 prefix. - Decode all Intel VMX instructions. invept and invvpid in particular are three-byte opcodes that use the 0x0f38 escape prefix. - Rework how the special 'SDEP' size flag works such that the default instruction name (i_name) is the instruction when the data size prefix (0x66) is not specified, and the alternate name in i_extra is used when the prefix is included. - Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses between i_name and i_extra based on the address size prefix (0x67). Use this to fix the decoding for jrcxz vs jecxz which is determined by the address size prefix, not the operand size prefix. Also, jcxz is not possible in 64-bit mode, but jrcxz is the default instruction for that opcode. - Add support for handling instructions that have a mandatory 'rep' prefix (this means not outputting the 'repe ' prefix until determining if it is used as part of an opcode). Make 'pause' less of a special case this way. - Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions but with a REX.W prefix. MFC after: 1 month	2012-07-06 14:25:59 +00:00
Alan Cox	cc861283f4	Make pmap_enter()'s management of PV entries consistent with the other pmap functions that manage PV entries. Specifically, remove the PV entry from the containing PV list only after the corresponding PTE is destroyed. Update the pmap's wired mapping count in pmap_enter() before the PV list lock is acquired.	2012-07-06 06:42:25 +00:00
John Baldwin	7574a595f2	Now that our assembler supports the xsave family of instructions, use them natively rather than hand-assembled versions. For xgetbv/xsetbv, add a wrapper API to deal with xcr* registers: rxcr() and load_xcr(). Reviewed by: kib MFC after: 1 month	2012-07-05 18:19:35 +00:00
Alan Cox	8f2994ce67	Calculate the new PTE value in pmap_enter() before acquiring any locks. Move an assertion to the beginning of pmap_enter().	2012-07-05 07:20:16 +00:00
Alan Cox	1bc8531c1e	Correct an error in r237513. The call to reserve_pv_entries() must come before pmap_demote_pde() updates the PDE. Otherwise, pmap_pv_demote_pde() can crash. Crash reported by: kib Patch tested by: kib	2012-07-05 00:08:47 +00:00
John Baldwin	66f9aec075	Decode the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', 'xsetbv', and 'rdtscp' instructions. MFC after: 1 month	2012-07-04 16:47:39 +00:00
Xin LI	309dca0171	tws(4) is interfaced with CAM so move it to the same section. Reported by: joel MFC after: 3 days	2012-07-01 08:10:49 +00:00
Alan Cox	2bde6e3518	Optimize reserve_pv_entries() using the popcnt instruction.	2012-06-30 20:25:12 +00:00
Alan Cox	92e2574577	In r237592, I forgot that pmap_enter() might already hold a PV list lock at the point that it calls get_pv_entry(). Thus, pmap_enter()'s PV list lock pointer must be passed to get_pv_entry() for those rare occasions when get_pv_entry() calls reclaim_pv_chunk(). Update some related comments.	2012-06-29 18:15:56 +00:00
Alan Cox	6c67613030	Avoid some unnecessary PV list locking in pmap_enter().	2012-06-28 22:03:59 +00:00
Alan Cox	23e59dfa8d	Optimize pmap_pv_demote_pde().	2012-06-28 05:42:04 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Alan Cox	5b5b0ef34d	Introduce RELEASE_PV_LIST_LOCK().	2012-06-26 16:45:18 +00:00
Alan Cox	0d646df757	Add PV list locking to pmap_enter(). Its execution is no longer serialized by the pvh global lock. Add a needed atomic operation to pmap_object_init_pt().	2012-06-26 06:02:43 +00:00
Alan Cox	aaf3bc56fd	Add PV chunk and list locking to pmap_change_wiring(), pmap_protect(), and pmap_remove(). The execution of these functions is no longer serialized by the pvh global lock. Make some stylistic changes to the affected code for the sake of consistency with related code elsewhere in the pmap.	2012-06-25 07:13:25 +00:00
Alan Cox	f745b16359	Introduce reserve_pv_entry() and use it in pmap_pv_demote_pde(). In order to add PV list locking to pmap_pv_demote_pde(), it is necessary to change the way that pmap_pv_demote_pde() allocates PV entries. Specifically, once pmap_pv_demote_pde() begins modifying the PV lists, it can't allocate any new PV chunks, because that could require the PV list lock to be dropped. So, all necessary PV chunks must be allocated in advance. To my surprise, this new approach is a few percent faster than the old one.	2012-06-23 22:54:25 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows turning off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs necessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Konstantin Belousov	232aa31fb9	Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to timekeeping information. MFC after: 1 week	2012-06-22 06:38:31 +00:00
Alan Cox	240cc83f55	Introduce CHANGE_PV_LIST_LOCK_TO_{PHYS,VM_PAGE}() to avoid duplication of code.	2012-06-22 05:01:36 +00:00
Alan Cox	290d3e6395	Update the PV stats in free_pv_entry() using atomics. After which, it is no longer necessary for free_pv_entry() to be serialized by the pvh global lock. Retire pmap_insert_entry() and pmap_remove_entry(). Once upon a time, these functions were called from multiple places within the pmap. Now, each has only one caller.	2012-06-21 16:37:36 +00:00
Alan Cox	7ed5b3afa2	Add PV list locking to pmap_copy(), pmap_enter_object(), and pmap_enter_quick(). These functions are no longer serialized by the pvh global lock. There is no need to release the PV list lock before calling free_pv_chunk() in pmap_remove_pages().	2012-06-20 07:25:20 +00:00
Alan Cox	2f49b6b831	Condition the implementation of pv_entry_count on PV_STATS. On amd64, pv_entry_count is purely informational. It does not serve any functional purpose. Add PV chunk locking to get_pv_entry().	2012-06-19 08:12:44 +00:00
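
The 2bde6e3518 entry above says only that reserve_pv_entries() was optimized using the popcnt instruction. A minimal sketch of the general idea follows, assuming that each PV chunk tracks its free entries in a small bitmap where a set bit means "free". The struct, the NPCM_SKETCH constant, and the function names are hypothetical and are not taken from the FreeBSD sources; __builtin_popcountll is the GCC/Clang builtin, which the compiler may lower to popcnt when the target supports it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of 64-bit bitmap words per chunk. */
#define NPCM_SKETCH 3

/* Hypothetical stand-in for a PV chunk: a set bit marks a free entry. */
struct pv_chunk_sketch {
	uint64_t map[NPCM_SKETCH];
};

/* Baseline: count the free entries one bit at a time. */
static int
free_entries_loop(const struct pv_chunk_sketch *pc)
{
	int n = 0;

	for (int i = 0; i < NPCM_SKETCH; i++)
		for (uint64_t w = pc->map[i]; w != 0; w >>= 1)
			n += (int)(w & 1);
	return (n);
}

/* Population count: one builtin call per bitmap word. */
static int
free_entries_popcnt(const struct pv_chunk_sketch *pc)
{
	int n = 0;

	for (int i = 0; i < NPCM_SKETCH; i++)
		n += __builtin_popcountll(pc->map[i]);
	return (n);
}

/* Sanity check: both counting strategies must agree on any chunk. */
static void
pv_sketch_selfcheck(const struct pv_chunk_sketch *pc)
{
	assert(free_entries_loop(pc) == free_entries_popcnt(pc));
}
```

A caller that needs a given number of free entries could sum free_entries_popcnt() over its chunks and stop as soon as the total is large enough. Whether the committed code is structured this way is not stated in the log.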

6115 Commits