and also on apic in common and i386 files (except for Xen, where it is
optional only on xenhvm), but it was not ifdefed except on apic in
common and i386 files.
This is all that is left from an attempt to build a (sub-)minimal kernel
without any devices. The isa "option" is still used without ifdefs in many
standard files even on amd64. ISAPNP is not optional on at least i386.
ATPIC is not optional on i386 (it is used mainly for Xspuriousint). But
pci is now supposed to be optional on x86.
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.
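For reference, the derivation inside the callee amounts to roughly the
following (a sketch using the stock curcpu and TRAPF_USERMODE()
accessors, not the committed diff):

    int cpu = curcpu;                       /* current CPU id from pcpu */
    int inuserspace = TRAPF_USERMODE(tf);   /* derived from the trapframe */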
While facially a reasonable cleanup, this change was motivated by the
need to work around a compiler bug.
core2_intr(cpu, tf) ->
pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
pmc_add_sample(cpu, ring, pm, tf, inuserspace)
In the process of optimizing the tail call the tf pointer was getting
clobbered:
(kgdb) up
at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709 pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
resulting in a crash in pmc_save_kernel_callchain.
memset fills the target buffer from a byte-sized value passed in as the
second argument.
The full-sized (8-byte) register containing it is named %rsi. The lower
4 bytes can be referred to as %esi, and the lowest byte as %sil.
The vast majority of callers just zero the target buffer and set the
value up by doing xor %esi,%esi, which has the side effect of zeroing
the upper parts of the register as well. Some others do a word-sized
move to %esi, which has the same result.
However, there are callers which only fill %sil. This does *not* clear
the rest of the register.
The value of %rsi is multiplied by $0x0101010101010101 to create an
8-byte pattern for 8-byte stores.
Prior to the patch, the function blindly took %rsi, assuming the
unwanted bytes were zeroed out. Since this is not the case for callers
which only set %sil (the rest of the register can contain absolutely
anything), the resulting pattern can be garbage.
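In C terms, the fix amounts to zero-extending the fill byte inside
memset itself before building the pattern, instead of trusting the
caller's upper register bits (a sketch, not the committed asm):

    uint64_t fill = (uint64_t)(uint8_t)c;              /* movzbq %sil, ... */
    uint64_t pattern = fill * 0x0101010101010101ULL;   /* 8 copies of the byte */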
This has potential for funny bugs. One side effect (which was not
amusing) of enabling it instead of bzero was that the kernel hung on
boot as a Xen domU.
Reported by: Trond Endrestøl <Trond.Endrestol fagskolen.gjovik.no>
Pointy hat: me
pagetables.
physmap[] can be inconsistent with the physical memory limit due to a
buggy BIOS, or due to the hw.physmem tunable. Since bootstrap
pagetables are initialized by accesses through the DMAP, we must ensure
that the DMAP really covers the selected pages. This is only relevant
when the machine has less than 4G RAM and a buggy BIOS, which is the
combination on the Acer Chromebook 720.
The call to mp_bootaddress() is moved later so that Maxmem is already
initialized. An alternative would be to always cover the first 4G with
the DMAP, but this change seems simpler.
Reported and tested by: grembo
Reviewed by: royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15675
- increase the pmc cpuid field from 8 to 12 bits
- add the cpuid version string to the initialize entry in the log
  so that the filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust, along with the changes
  needed for fixed counters to be properly identified by the filter
Currently all the primitives are waiting for a rewrite; tidy them up in
the meantime.
The vast majority of cases pass sizes which are a multiple of 8, which
means the trailing rep stosb/movsb has nothing to do. It turns out that
testing first whether there is anything to do is a big win across the
board (CPUs with and without ERMS, both Intel and AMD), while not
pessimizing the case where there is work to do.
Sample results for zeroing 64 bytes (ops/second):
Ryzen Threadripper 1950X 91433212 -> 147265741
Intel(R) Xeon(R) CPU X5675 @ 3.07GHz 90714044 -> 121992888
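A pseudo-C rendering of the change (rep_stosb() here is a hypothetical
stand-in for the trailing rep stosb; the committed change is in asm):

    len &= 7;               /* bytes left over after the 8-byte stores */
    if (len != 0)           /* skip the rep stosb entirely in the common case */
            rep_stosb(dst, c, len);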
bzero and bcopy are on their way out and were not modified. Nothing in the
tree uses them.
file in /sys/conf, so was unavailable in configurations that don't use
modules, and was not testable or notable in NOTES. Its normal
configuration (not using a module) is still silently deprecated in
aout(4) by not mentioning it there.
Update i386 NOTES for COMPAT_AOUT. It is not i386-only, or even very MD.
Sort its entry better.
Finish gzip configuration (but not support) for amd64. gzip is really
gzipped aout. It is currently broken even for i386 (a call to vm fails).
amd64 has always attempted to configure and test it, but it depends on
COMPAT_AOUT (as noted). The bug that it depends on unconfigured files
was not detected since it is configured as a device. All other optional
image activators are configured properly using an option.
time, especially for SMP. If configured, it turns itself on at boot
time for calibration, so is fragile even if never otherwise used.
Both types of kernel profiling were supposed to use a global spinlock
in the SMP case. If hi-res profiling is configured (but not necessarily
used), this was supposed to be optimized by only using it when
necessary, and slightly more efficiently, in asm. But this was not done
at all for mcount entry, where it is necessary. This caused crashes
in the SMP case when either type of profiling was enabled. For mcount
exit, it only caused wrong times. The times were most wrong with an
i8254 timer, since using that requires exclusive access to the hardware.
The i8254 timer was too slow to use here 20 years ago and is much less
usable now, but it is the default for the SMP case since TSCs weren't
invariant when SMP was new. Do the locking in all hi-res SMP cases for
simplicity.
Calibration uses special asms, and the clobber lists in these were sort
of inverted. They contained the arg and return registers which are not
clobbered, but on amd64 they didn't contain the residue of the call-used
registers which may be clobbered (%r10 and %r11). This usually caused
hangs at boot time, and usually affected even the UP case.
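For reference, a corrected call-like asm on amd64 looks roughly like
this (callee name and operands are illustrative): the arg and return
registers appear as operands, so they do not belong in the clobber
list, while the remaining call-used registers do.

    __asm __volatile("call cputime_calib"       /* hypothetical callee */
        : "=a" (delta)                          /* return value in %rax */
        : "D" (prev)                            /* argument in %rdi */
        : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "memory");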
kernel profiling remains broken).
memmove() was broken using ALTENTRY(). ALTENTRY() is only different from
ENTRY() in the profiling case, and its use in that case was sort of
backwards. The backwardness magically turned memmove() into memcpy()
instead of completely breaking it. Only the high resolution parts of
profiling itself were broken. Use ordinary ENTRY() for memmove().
Turn bcopy() into a tail call to memmove() to reduce complications.
This gives slightly different pessimizations and profiling lossage.
The pessimizations are minimized by not using a frame pointer for
bcopy().
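The bcopy() wrapper is equivalent to the following C (the committed
version is a tail call in asm):

    void
    bcopy(const void *src, void *dst, size_t len)
    {
            /* bcopy and memmove take the same arguments in swapped order */
            memmove(dst, src, len);
    }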
Calls to profiling functions from exception trampolines were not
relocated. This caused crashes on the first exception. Fix this using
function pointers.
Addresses of exception handlers in trampolines were not relocated. This
caused unknown offsets in the profiling data. Relocate by abusing
setidt_disp, as for pmc, although this is slower than necessary and
requires namespace pollution. pmc seems to be missing some relocations.
Stack traces and lots of other things in debuggers need similar relocations.
Most user addresses were misclassified as unknown kernel addresses and
then ignored. Treat all unknown addresses as user. Now only user
addresses in the kernel text range are significantly misclassified (as
known kernel addresses).
The ibrs functions didn't preserve enough registers. This is the only
recent breakage on amd64. Although these functions are written in
asm, in the profiling case they call profiling functions which mostly
follow the C ABI, so they only have to save call-used registers.
They also have to save arg and return registers in some cases and
actually save them in all cases to reduce complications. They end up
saving all registers except %ecx on i386 and %r10 and %r11 on amd64.
Saving these is only needed for 1 caller on each of amd64 and i386.
Save them there. This is slightly simpler.
Remove saving %ecx in handle_ibrs_exit on i386. Both handle_ibrs_entry
and handle_ibrs_exit use %ecx, but only the latter needed to or did
save it. But saving it there doesn't work for the profiling case.
amd64 has more automatic saving of the most common scratch registers
%rax, %rcx and %rdx (its complications for %r10 are from unusual use
of %r10 by SYSCALL). Thus profiling of handle_ibrs_exit_rs() was not
broken, and I didn't simplify the saving by moving the saving of these
registers from it to the caller.
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.
The one gotcha is that said tables don't support the Pentium Pro and
Pentium IV. There are very few users of hwpmc on _amd64_ kernels on new
hardware, and it is unlikely that anyone is doing low-level optimization
on 15-year-old Intel hardware. Nonetheless, if someone feels strongly
enough to populate the corresponding tables for p4 and ppro, I will
reinstate the files into the build.
Code for the K8 counters and !x86 architectures remains unchanged.
This is a follow-up to r321483, which disabled -Wmacro-redefined for
some lib/msun tests.
If an application included both fenv.h and ieeefp.h, several macros
such as __fldcw() and __fldenv() were defined in both headers, with
slightly different arguments, leading to conflicts.
Fix this by putting all the common macros in the machine-specific
versions of ieeefp.h. Where needed, update the arguments in places
where the macros are invoked.
This also slightly reduces the differences between the amd64 and i386
versions of ieeefp.h.
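For reference, the macros in question are small asm wrappers of roughly
this shape (illustrative; the two headers defined them with slightly
different argument conventions, which was the source of the conflict):

    #define __fldcw(cw)     __asm __volatile("fldcw %0" : : "m" (cw))
    #define __fnstcw(cw)    __asm __volatile("fnstcw %0" : "=m" (*(cw)))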
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D15633
The copied data is partly accessed soon after, which results in
additional cache misses during a -j 1 buildkernel WITHOUT_CTF=yes
KERNFAST=1, as measured with pmc stat.
before:
256165411 cache-references # 0.003 refs/inst
15105408 cache-misses # 5.897%
20.70 real # 99.67% cpu
13.24 user # 63.94% cpu
7.40 sys # 35.73% cpu
after:
256764469 cache-references # 0.003 refs/inst
11913551 cache-misses # 4.640%
20.70 real # 99.67% cpu
13.19 user # 63.73% cpu
7.44 sys # 35.95% cpu
Note the real time did not change, but traffic to RAM was reduced (multiple
measurements performed with switching the implementation at runtime).
Since nobody else is using non-temporal stores for this and there is no
apparent benefit, at least these days, don't use them either.
As a side note, the pagecopy arguments should probably be reversed so
that they don't have to be flipped around in the primitive.
Discussed with: jeff
The TSCs are checked and synchronized only if they were good
originally, that is, invariant, synchronized, etc.
This is necessary on an AMD-based system where, after a wakeup from STR,
I see that the BSP clock differs from the AP clocks by a count that
roughly corresponds to one second. The APs are in sync with each other.
I am not sure if this is a hardware quirk or a firmware bug.
This is what I see after a resume with this change:
SMP: passed TSC synchronization test after adjustment
acpi_timer0: restoring timecounter, ACPI-fast -> TSC-low
Reviewed by: kib
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15551
Instead, construct an auxargs array and copy it out all at once.
Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to
represent the array. This is the correct type, where pairs of words
just happened to work. To reduce the size of the diff, AUXARGS_ENTRY is
altered to act
on this array rather than introducing a new macro.
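A sketch of the resulting pattern (entries abridged, not the exact
committed code):

    Elf_Auxinfo aux[AT_COUNT], *pos = aux;

    AUXARGS_ENTRY(pos, AT_PHDR, args->phdr);
    AUXARGS_ENTRY(pos, AT_ENTRY, args->entry);
    /* ... remaining entries ... */
    AUXARGS_ENTRY(pos, AT_NULL, 0);

    error = copyout(aux, (void *)base, (pos - aux) * sizeof(*pos));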
Return errors on copyout() and suword() failures and handle them in the
caller.
Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).
Reviewed by: kib
Comments from: emaste, jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15485
We certainly should clear PSL_T when calling the SIGTRAP signal
handler, which is already done by all x86 sendsig(9) ABI code. On the
other hand, there is no obvious reason why PSL_T needs to be cleared
when returning from the signal handler. For instance, Linux allows
userspace to set PSL_T and keep tracing enabled for the desired
period. There are userspace programs which would use PSL_T if we made
it possible, for instance sbcl.
Remember whether PSL_T was set by PT_STEP or PT_SETSTEP by means of the
TDB_STEP flag, and only clear it when the flag is set.
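A simplified sketch of the resulting check (not the committed diff):

    /* clear PSL_T only when the kernel set it on behalf of the debugger */
    if ((td->td_dbgflags & TDB_STEP) != 0) {
            td->td_frame->tf_rflags &= ~PSL_T;
            td->td_dbgflags &= ~TDB_STEP;
    }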
Discussed with: Ali Mashtizadeh
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D15054
- Add constants for fields in DR6 and the reserved fields in DR7. Use
these constants instead of magic numbers in most places that use DR6
and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
recommended by the SDM dating all the way back to the 386. This
allows debuggers to determine the cause of each exception. For
kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
to other parts of the handler (namely, user_dbreg_trap()). For user
traps, wait until after trapsignal to clear DR6 so that userland
debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
in trapsignal().
Reviewed by: kib, rgrimes
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15189
Speculative Store Bypass (SSB) is a speculative execution side channel
vulnerability identified by Jann Horn of Google Project Zero (GPZ) and
Ken Johnson of the Microsoft Security Response Center (MSRC)
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528.
Updated Intel microcode introduces an MSR bit to disable SSB as a
mitigation for the vulnerability.
Introduce a sysctl hw.spec_store_bypass_disable to provide global
control over the SSBD bit, akin to the existing sysctl that controls
IBRS. The sysctl can be set to one of three values:
0: off
1: on
2: auto
Future work will enable applications to control SSBD on a per-process
basis (when it is not enabled globally).
SSBD bit detection and control was verified with prerelease microcode.
Security: CVE-2018-3639
Tested by: emaste (previous version, without updated microcode)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When we issue shootdown IPIs, we first assign zero to pm_gens to
indicate the need to flush on the next context switch in case our IPI
misses the context; next we read pm_active. On context switch we set
our bit in pm_active, then we read pm_gen. It is crucial that both
threads see the memory accesses in program order; otherwise the
invalidation thread might read the pm_active bit as zero and the
context-switching thread might read pm_gen as zero.
The IA32 memory model allows the CPU to let both reads see zero. We
must use barriers between the write and the read. The pm_active bit set
is already a locked operation, so only the invalidation functions need
an explicit barrier.
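A simplified sketch of the invalidation side (field names abridged; the
context-switch side already gets the required ordering from its locked
update of pm_active):

    pmap->pm_pcids[cpuid].pm_gen = 0;       /* force flush on next switch */
    atomic_thread_fence_seq_cst();          /* order the store before the load */
    if (!CPU_ISSET(cpuid, &pmap->pm_active))
            continue;                       /* CPU not using this pmap */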
I never saw it in real life, or at least I do not have a good
reproduction case. I found this during code inspection when hunting
for the Xen TLB issue reported by cperciva.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15506
This turns on support for kernel dump encryption and compression, and
netdump. arm and mips platforms are omitted for now, since they are more
constrained and don't benefit as much from these features.
Reviewed by: cem, manu, rgrimes
Tested by: manu (arm64)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D15465
Currently, when using dd(1) to take a VM memory image, the capture
never ends, reading zeroes once it is beyond the VM system memory max
address.
Return EFAULT when trying to read beyond the VM system memory max
address.
Reviewed by: imp, grehan, anish
Approved by: grehan
Differential Revision: https://reviews.freebsd.org/D15156
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame. The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns, confusing debuggers. Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Chelsio Communications
From now on, linking the amd64 kernel requires either lld or a newer
ld.bfd.
Reviewed by: jhb (as part of the large patch)
Discussed with: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13838
Adapt the assembly generated by clang for memcmp and use it for
compares of size <= 64 (which are the vast majority).
Sample result of doing stats on Broadwell (% of samples):
before: 4.0 kernel bcmp cache_lookup
after : 0.7 kernel bcmp cache_lookup
The routine is most definitely still not optimal. Anyone interested in
spending time improving it is welcome to take over.
Reviewed by: kib
Evaluate cpu_stdext_feature early so that the moved link_elf_ireloc()
sees correct flags; the most important is SMAP.
Tested by: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D15367
Supposedly, the PG_U bits there were set to make it easier to make some
kernel page accessible to userspace in place. Since this was never used
in the whole existence of the amd64 pmap.c, and the current design of
the shared pages prefers double-mapping over in-place access, remove
PG_U from both the direct map and the KVA slots.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This PML4 page is never used for userspace processes, so there are no
security implications. But the configuration trips the SMAP check, which
should be corrected.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Since the pop %ss/mov %ss instructions defer all interrupts and
exceptions until after the next instruction, it is possible that the
userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.
In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3. Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.
For i386, we must reload %cr3. The syscall instruction is not
configured there, so there is no issue with executing on the user stack
when trapping.
Due to some CPU errata it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6. In trap(), compare
the trapping %rip with the known unsafe entry points and, if it matches,
pretend that the watchpoint did not fire at all.
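The check in trap() amounts to comparing the trapping %rip against the
known unsafe entry points, roughly as follows (symbol spellings
illustrative):

    if (frame->tf_rip == (register_t)IDTVEC(fast_syscall) ||
        frame->tf_rip == (register_t)IDTVEC(bpt))
            return;         /* pretend the watchpoint did not fire */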
Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.
Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.
Reviewed by: jhb
Discussed with: Matthew Dillon
Tested by: emaste
Sponsored by: The FreeBSD Foundation
Security: CVE-2018-8897
Security: FreeBSD-SA-18:06.debugreg
The parameter is effectively controllable by userspace. It does not
matter what it is set to, as it is being passed to copyin; in the worst
case the operation will just fail.
While here, stop computing it unless it is going to be used.
Noted by: dillon@backplane.com
There was a missing trick expanding the passed pattern to a full word
by multiplication. As a side effect, non-zero patterns would be laid
down incorrectly.
This stems from the use of rep stosq, which is word-sized, while the
passed argument is byte-sized.
I initially repurposed memcpy into memset without taking this into
account. All but the non-bzero testing was performed with a variant
utilizing ERMS, i.e. using only stosb, which happens not to run into
the problem whatsoever. So my bad twice.
Thanks to Oliver Pinter for noting the problem and providing a testcase.
memmove is repurposed bcopy (arguments swapped, return value added).
The libkern variant is a wrapper around bcopy, so this is a big
improvement.
memset is repurposed memcpy. The libkern variant is doing fishy stuff,
including branching on 0 and calling bzero.
Both functions are rather crude and subject to partial depessimization.
This is a soft prerequisite to adding variants utilizing the
'Enhanced REP MOVSB/STOSB' bit and letting the kernel patch them in at
runtime.
The code was unnecessarily conditionally copying either 5 or 6 args.
It can blindly copy 6, which also means the size is known at compilation
time and the operation can be depessimized.
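A sketch of the idea (names simplified): a fixed-size copy of six
arguments lets the compiler expand the operation inline instead of
emitting a variable-length copy.

    memcpy(sa->args, argp, 6 * sizeof(sa->args[0]));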
Note the entire syscall handling code is rather slow.
Tested on Skylake, sample result for getppid (calls/s):
without pti: 7310106 -> 10653569
with pti: 3304843 -> 4148306
Some syscalls (like read) did not show any difference; others typically
have very modest wins.
Required MD bits are only provided for x86.
Reviewed by: jhb (previous version, as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13838