freebsd-skq

Author	SHA1	Message	Date
mjg	420c428175	amd64: annotate the syscall return address check with __predict_false before: 0xffffffff80b03ebb <+2059>: mov 0x460(%r14),%rax 0xffffffff80b03ec2 <+2066>: mov 0x98(%rax),%rax 0xffffffff80b03ec9 <+2073>: shr $0x2f,%rax 0xffffffff80b03ecd <+2077>: je 0xffffffff80b03edd <amd64_syscall+2093> 0xffffffff80b03ecf <+2079>: mov 0x3f8(%r14),%rax 0xffffffff80b03ed6 <+2086>: orl $0x1,0xc8(%rax) 0xffffffff80b03edd <+2093>: add $0xf8,%rsp after: 0xffffffff80b03ebb <+2059>: mov 0x460(%r14),%rax 0xffffffff80b03ec2 <+2066>: mov 0x98(%rax),%rax 0xffffffff80b03ec9 <+2073>: shr $0x2f,%rax 0xffffffff80b03ecd <+2077>: jne 0xffffffff80b03eef <amd64_syscall+2111> 0xffffffff80b03ecf <+2079>: add $0xf8,%rsp Reviewed by: kib MFC after: 1 week	2017-08-02 11:25:38 +00:00
kib	38f0d940ba	Do not call trapsignal() after handling usermode fault or interrupt, when a signal is not intended to be sent. The variable holding the signal number to send is left uninitialized, which sometimes triggers invalid signal checks. For NMI, a return to usermode without ast processing is done. On the other hand, for spurious dtrace probe interrupt it is usermode which triggered the interrupt, so handle it through userret() as any other fault. Reported by: Nils Beyer <nbe@renzel.net> PR: 221151 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-02 10:12:10 +00:00
truckman	43c5cd502a	Lower the amd64 shared page, which contains the signal trampoline, from the top of user memory to one page lower on machines with the Ryzen (AMD Family 17h) CPU. This pushes ps_strings and the stack down by one page as well. On Ryzen there is some sort of interaction between code running at the top of user memory address space and interrupts that can cause FreeBSD to either hang or silently reset. This sounds similar to the problem found with DragonFly BSD that was fixed with this commit: https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but our signal trampoline location was already lower than the address that DragonFly moved their signal trampoline to. It also does not appear to be related to SMT as described here: https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498 "Hi, Matt Dillon here. Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations. In a nutshell, for any given hyperthread pair, if one hyperthread is in a cpu-bound loop of any kind (can be in user mode), and the other hyperthread is returning from an interrupt via IRETQ, the hyperthread issuing the IRETQ can stall indefinitely until the other hyperthread with the cpu-bound loop pauses (aka HLT until next interrupt). After this situation occurs, the system appears to destabilize. The situation does not occur if the cpu-bound loop is on a different core than the core doing the IRETQ. The %rip the IRETQ returns to (e.g. userland %rip address) matters a LOT. The problem occurs more often with high %rip addresses such as near the top of the user stack, which is where DragonFly's signal trampoline traditionally resides. So a user program taking a signal on one thread while another thread is cpu-bound can cause this behavior. 
Changing the location of the signal trampoline makes it more difficult to reproduce the problem. I have not been able to completely mitigate it. When a cpu-thread stalls in this manner it appears to stall INSIDE the microcode for IRETQ. It doesn't make it to the return pc, and the cpu thread cannot take any IPIs or other hardware interrupts while in this state." since the system instability has been observed on FreeBSD with SMT disabled. Interrupts do appear to play a factor since running a signal-intensive process on the first CPU core, which handles most of the interrupts on my machine, is far more likely to trigger the problem than running such a process on any other core. Also lower sv_maxuser to prevent a malicious user from using mmap() to load and execute code in the top page of user memory that was made available when the shared page was moved down. Make the same changes to the 64-bit Linux emulator. PR: 219399 Reported by: nbe@renzel.net Reviewed by: kib Reviewed by: dchagin (previous version) Tested by: nbe@renzel.net (earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11780	2017-08-02 01:43:35 +00:00
markj	b23f6b9b03	Batch updates to v_wire_count when freeing page table pages on x86. The removed release stores are not needed since stores are totally ordered on i386 and amd64. Reviewed by: alc, kib (previous revision) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11790	2017-08-01 05:26:30 +00:00
kib	030abb1986	Remove unused symbols. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-30 21:52:22 +00:00
dchagin	93ea6317f4	Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code. This is needed for https://reviews.freebsd.org/D11780. Reported by: kib@	2017-07-30 21:24:20 +00:00
alc	621c14d3f9	Add support for pmap_enter(..., psind=1) to the amd64 pmap. In other words, add support for explicitly requesting that pmap_enter() create a 2MB page mapping. (Essentially, this feature allows the machine-independent layer to create superpage mappings preemptively, and not wait for automatic promotion to occur.) Export pmap_ps_enabled() to the machine-independent layer. Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or reclaim a PV entry when one is not available. Refactor pmap_enter_pde() into two functions, one by the same name, that is a general-purpose function for creating PDE PG_PS mappings, and another, pmap_enter_2mpage(), that is used to prefault 2MB read- and/or execute-only mappings for execve(2), mmap(2), and shmat(2). Submitted by: Yufeng Zhou <yz70@rice.edu> (an earlier version) Reviewed by: kib, markj Tested by: pho MFC after: 10 days Differential Revision: https://reviews.freebsd.org/D11556	2017-07-23 06:33:58 +00:00
rlibby	dfe1112fa8	__pcpu: gcc -Wredundant-decls Pollution from counter.h made __pcpu visible in amd64/pmap.c. Delete the existing extern decl of __pcpu in amd64/pmap.c and avoid referring to that symbol, instead accessing the pcpu region via PCPU_SET macros. Also delete an unused extern decl of __pcpu from mp_x86.c. Reviewed by: kib Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11666	2017-07-21 17:11:36 +00:00
rlibby	1eb6bbbb6e	efi: restrict visibility of EFIABI_ATTR-declared functions In-tree gcc (4.2) doesn't understand __attribute__((ms_abi)) (EFIABI_ATTR). Avoid declaring functions with that attribute when the compiler is detected to be gcc < 4.4. Reviewed by: kib, imp (previous version) Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11636	2017-07-20 06:47:06 +00:00
alc	91e58f1e61	Style-only change: Consistently use the variable name "pdpg" throughout this file. Previously, half of the pointers to a vm_page being used as a page directory page were named "pdpg" and the rest were named "mpde". Discussed with: kib MFC after: 1 week	2017-07-15 16:42:55 +00:00
alc	cebebf1a21	Extract the innermost loop of pmap_remove() out into its own function, pmap_remove_ptes(). (This new function will also be used by an upcoming change to pmap_enter() that adds support for psind == 1 mappings.) Submitted by: Yufeng Zhou <yz70@rice.edu> (an earlier version) Reviewed by: kib, markj MFC after: 1 week	2017-07-15 01:49:54 +00:00
kib	4021fe5d78	Fix size argument to vm_pager_allocate(), it is in bytes, not in pages. It is believed to be only cosmetic. Noted by: andrew Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-13 08:23:37 +00:00
kib	7fe4b6ffe3	Revert r320936 to recommit with the correct log message.	2017-07-13 08:23:12 +00:00
kib	c5a4bd67a3	It is believed to be only cosmetic. Noted by: andrew Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-13 08:19:50 +00:00
ian	242599409b	Protect access to the AT realtime clock with its own mutex. The mutex protecting access to the registered realtime clock should not be overloaded to protect access to the atrtc hardware, which might not even be the registered rtc. More importantly, the resettodr mutex needs to be eliminated to remove locking/sleeping restrictions on clock drivers, and that can't happen if MD code for amd64 depends on it. This change moves the protection into what's really being protected: access to the atrtc date and time registers. This change also adds protection when the clock is accessed from xentimer_settime(), which bypasses the resettodr locking. Differential Revision: https://reviews.freebsd.org/D11483	2017-07-12 02:42:57 +00:00
imp	a87c7a85be	An MMC/SD/SDIO stack using CAM Implement the MMC/SD/SDIO protocol within a CAM framework. CAM's flexible queueing will make it easier to write non-storage drivers than the legacy stack. SDIO drivers from both the kernel and as userland daemons are possible, though much of that functionality will come later. Some of the CAM integration isn't complete (there are sleeps in the device probe state machine, for example), but those minor issues can be improved in-tree more easily than out of tree and shouldn't gate progress on other fronts. Apologies to reviewers if specific items have been overlooked. Submitted by: Ilya Bakulin Reviewed by: emaste, imp, mav, adrian, ian Differential Review: https://reviews.freebsd.org/D4761 merge with first commit, various compile hacks.	2017-07-09 16:57:24 +00:00
rlibby	2160513c84	amd-vi: gcc build errors amdvi_cmp_wait: gcc complained about a malformed string behind an ifdef. struct amdvi_dte: widen the type of the first reserved bitfield so that the packed representation would not cross an alignment boundary for that type. Apparently that causes in-tree gcc (4.2) to insert padding (despite packed, resulting in a wrong structure definition), and causes more modern gcc to emit a warning. ivrs_hdr_iterate_tbl: delete a misleading check about header length being less than 0 (the type is unsigned) and replace it with a check that the length doesn't exceed the table size. Reviewed by: anish, grehan Approved by: markj (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11485	2017-07-07 06:37:19 +00:00
sbruno	12de4d18c8	Garbage collect kernel option TWA_FLASH_FIRMWARE Submitted by: kevin.bowling@kev009.com Differential Revision: https://reviews.freebsd.org/D11387	2017-07-03 19:33:50 +00:00
dchagin	a0f4bac287	Add support for musl consumers to the Linuxulator. PR: 213809 Submitted by: Yonas Yanfa Reported by: Yonas Yanfa MFC after: 1 week Relnotes: yes	2017-07-03 10:24:49 +00:00
alc	bf2a2bba1b	When "force" is specified to pmap_invalidate_cache_range(), the given start address is not required to be page aligned. However, the loop within pmap_invalidate_cache_range() that performs the actual cache line invalidations requires that the starting address be truncated to a multiple of the cache line size. This change corrects an error in that truncation. Submitted by: Brett Gutstein <bgutstein@rice.edu> Reviewed by: kib MFC after: 1 week	2017-07-01 16:42:09 +00:00
jah	d1caaa9300	Clean up MD pollution of bus_dma.h: --Remove special-case handling of sparc64 bus_dmamap* functions. Replace with a more generic mechanism that allows MD busdma implementations to generate inline mapping functions by defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This is currently useful for sparc64, x86, and arm64, which all implement non-load dmamap operations as simple wrappers around map objects which may be bus- or device-specific. --Remove NULL-checked bus_dmamap macros. Implement the equivalent NULL checks in the inlined x86 implementation. For non-x86 platforms, these checks are a minor pessimization as those platforms do not currently allow NULL maps. NULL maps were originally allowed on arm64, which appears to have been the motivation behind adding arm[64]-specific barriers to bus_dma.h, but that support was removed in r299463. --Simplify the internal interface used by the bus_dmamap_load* variants and move it to bus_dma_internal.h --Fix some drivers that directly include sys/bus_dma.h despite the recommendations of bus_dma(9) Reviewed by: kib (previous revision), marius Differential Revision: https://reviews.freebsd.org/D10729	2017-07-01 05:35:29 +00:00
kib	3c908eddcb	Translate between abridged and full x87 tags for compat32 ptrace(PT_GETFPREGS). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-24 11:38:31 +00:00
kib	e2a14c603f	Move struct syscall_args syscall arguments parameters container into struct thread. For all architectures, the syscall trap handlers have to allocate the structure on the stack. The structure takes 88 bytes on 64bit arches which is not negligible. Also, it cannot be easily found by other code, which e.g. caused duplication of some members of the structure to struct thread already. The change removes td_dbg_sc_code and td_dbg_sc_nargs which were directly copied from syscall_args. The structure is put into the copied on fork part of the struct thread to make the syscall arguments information correct in the child after fork. This move will also allow several more uses shortly. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 21:03:23 +00:00
kib	7b6fe97487	Make struct syscall_args visible to userspace compilation environment from machine/proc.h, consistently on all architectures. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 20:53:44 +00:00
alc	56e961bd47	Eliminate duplication of the pmap and pv list unlock operations in pmap_enter() by implementing a single return path. Otherwise, the duplication will only increase with the upcoming support for psind == 1. Reviewed by: kib (some time ago)	2017-06-03 17:24:13 +00:00
dchagin	8ab86e050d	In r246085 some bits that are MI were moved out into headers in compat/linux, but I missed that when I committed the x86_64 Linuxulator. So remove the duplicates. MFC after: 1 week	2017-05-28 08:46:57 +00:00
trasz	31b7c45db9	Bump default MAXTSIZ (kern.maxtsiz) from 128MB to 32GB. The old limit prevents one from running eg clang built with debug; the new one is arbitrary (equal to MAXDSIZ) and... well, should be quite future-proof. Same fix might be applicable to other 64 bit architectures; I'll ask their respective maintainers to make sure it won't break anything. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10758	2017-05-17 08:38:41 +00:00
emaste	1901c3e1f2	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
cem	5c7d65801e	Correct page frame mask constant used in pmap_change_attr_locked This was introduced in r290156. It's present in 11.0, but not any 10.x release unless someone decided to MFC it. It affects ordinary pages right above the DMAP limit, which is effectively system memory rounded up to a 1 GB (3rd level superpage) boundary (or up to a minimum of 4 GB, on small systems). Reported by: vangyzen Reviewed by: kib, alc Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D4030	2017-05-16 16:20:22 +00:00
kib	542fd28222	Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. In particular, NX must be enabled in EFER, otherwise load of page table element with nx bit set causes reserved bit page fault. Since malloc uses direct mapping for small allocations, in particular for the suspension pcbs, and DMAP is nx after r316767, this commit tripped fault on resume path. Restore complete state of EFER while wakeup code is still executing with custom page table, before calling resumectx, instead of trying to guess which features might be needed before resumectx restored EFER on its own. Bisected and tested by: trasz Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-05-15 20:52:43 +00:00
sephe	1e56a84cc0	pcicfg: Fix direct calls of pci_cfg{read,write} on systems w/o PCI host bridge. Reported by: dexuan@ Reviewed by: jhb@ MFC after: 1 week Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D10564	2017-05-04 05:28:46 +00:00
anish	6bb9bc6652	Add AMD IOMMU/AMD-Vi support in bhyve for passthrough/direct assignment to VMs. To enable AMD-Vi, set hw.vmm.amdvi.enable=1. Reviewed by: bcr Approved by: grehan Tested by: rgrimes Differential Revision: https://reviews.freebsd.org/D10049	2017-04-30 02:08:46 +00:00
jkim	9445e9d6e1	Use kmem_malloc() instead of malloc(9) for the native amd64 filter. r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no longer executable. Discussed with: kib, alc	2017-04-17 22:02:09 +00:00
jkim	22babe5387	Move declarations for a machine-dependent function to the header file.	2017-04-17 21:51:26 +00:00
glebius	21ead51d79	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
glebius	0266d4259e	Remove unused assembly symbols pointing to vmmeter.	2017-04-17 17:18:07 +00:00
glebius	5763443023	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.	2017-04-17 17:07:00 +00:00
kib	6eae6fe2b3	Map DMAP as nx. Demotions preserve PG_NX, so it is enough to set nx bit for initial lowest-level paging entries. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-04-13 15:49:55 +00:00
pkelsey	33064e92a2	Corrected misspelled versions of rendezvous. The MFC will include a compat definition of smp_no_rendevous_barrier() that calls smp_no_rendezvous_barrier(). Reviewed by: gnn, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10313	2017-04-09 02:00:03 +00:00
avatar	93f11e526d	Trying to be more compatible with Linux if.h definitions: - renaming l_ifreq::ifru_metric to l_ifreq::ifru_ivalue; - adding a definition for ifr_ifindex which points to l_ifreq::ifru_ivalue. A quick search indicates that Linux already got the above changes since 2.1.14. Reviewed by: kib, marcel, dchagin MFC after: 1 week	2017-04-08 14:41:39 +00:00
avg	7a52acd8b3	revert r315959 because it causes build problems The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files. Also, the change could be not as useful as I thought it was. Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others	2017-03-27 12:34:29 +00:00
bde	254458ab34	Fix printing of negative offsets (typically from frame pointers) again. I fixed this in 1997, but the fix was over-engineered and fragile and was broken in 2003 if not before. i386 parameters were copied to 8 other arches verbatim, mostly after they stopped working on i386, and mostly without the large comment saying how the values were chosen on i386. powerpc has a non-verbatim copy which just changes the uncritical parameter and seems to add a sign extension bug to it. Just treat negative offsets as offsets if they are no more negative than -db_offset_max (default -64K), and remove all the broken parameters. -64K is not very negative, but it is enough for frame and stack pointer offsets since kernel stacks are small. The over-engineering was mainly to go more negative than -64K for the negative offset format, without affecting printing for more than a single address. Addresses in the top 64K of a (full 32-bit or 64-bit) address space are now printed less well, but there aren't many interesting ones. For arches that have many interesting ones very near the top (e.g., 68k has interrupt vectors there), there would be no good limit for the negative offset format and -64K is as good as anything.	2017-03-26 18:46:35 +00:00
avg	04ec8ce247	specific end of interrupt implementation for AMD Local APIC The change is more intrusive than I would like because the feature requires that a vector number is written to a special register. Thus, now the vector number has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason. Reviewed by: kib MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9880	2017-03-25 18:45:09 +00:00
dchagin	7e4bbabbee	Implement Linux mincore() system call. This is necessary for the upcoming drm-next. Suggested by: hselasky@ MFC after: 1 month	2017-03-25 15:47:29 +00:00
bde	27ac811b7a	Remove buggy adjustment of page tables in db_write_bytes(). Long ago, perhaps only on i386, kernel text was mapped read-only and it was necessary to change the mapping to read-write to set breakpoints in kernel text. Other writes by ddb to kernel text were also allowed. This write protection is harder to implement with 4MB pages, and was lost even for 4K pages when 4MB pages were implemented. So changing the mapping became useless. It was actually worse than useless since it followed various null and otherwise garbage pointers to not change random memory instead of the mapping. (On i386s, the pointers became good in pmap_bootstrap(), and on amd64 the pointers became bad in pmap_bootstrap() if not before.) Another bug broke detection of following of null pointers on i386, except early in boot where not detecting this was a feature. When I fixed the bug, I accidentally broke the feature and soon got traps in db_write_bytes(). Setting breakpoints early in ddb was broken. kib pointed out that a clean way to do the adjustment would be to use a special [sub]map giving a small window on the bytes to be written. The trap handler didn't know how to fix up errors for pagefaults accessing the map itself. Such errors rarely need fixups, since most traps for the map are for the first access which is a read. Reviewed by: kib	2017-03-24 17:34:55 +00:00
ed	63254ceea6	Stop providing the compat_3_brand. As of r315860, the ELF image activator works fine for CloudABI without it. Reviewed by: kib MFC after: 2 weeks	2017-03-23 14:12:21 +00:00
kib	da63ef1f60	Update r315753 with the proper flag name. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:28:13 +00:00
kib	a22b5a3135	Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. Interpretation of the 'static' there is that the binary must not specify an interpreter. In particular, shared objects are matched by the brand if BI_CAN_EXEC_DYN is also set. This improves precision of the brand matching, which should eliminate surprises due to brand ordering. Revert r315701. Discussed with and tested by: ed (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:23:01 +00:00
markj	c4e0ff355e	Add support for 8- and 16-bit atomic_(f)cmpset to x86. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10068	2017-03-22 17:29:04 +00:00
ed	fc95dfd2e5	Set the interpreter path to /nonexistent. CloudABI executables are statically linked and don't have an interpreter. Setting the interpreter path to NULL used to work previously, but r314851 introduced code that checks the string unconditionally. Running CloudABI executables now causes a null pointer dereference. Looking at the rest of imgact_elf.c, it seems various other codepaths already leaned on the fact that the interpreter path is set. Let's just go ahead and pick an obviously incorrect interpreter path to appease imgact_elf.c. MFC after: 1 week	2017-03-22 07:05:27 +00:00
dchagin	69ea87350f	Implement getrandom() syscall. Note. GRND_RANDOM option is not supported for now. MFC after: 1 month	2017-03-18 18:34:29 +00:00
dchagin	82392d7947	To reduce code duplication move socket defines to the MI path. MFC after: 1 week	2017-03-18 18:23:30 +00:00
bde	02d7df1d66	Don't access the reserved registers %dr4 and %dr5 on i386. On the original i386, %dr[4-5] were unimplemented but not very clearly reserved, so debuggers read them to print them. i386 was still doing this. On the original athlon64, %dr[4-5] are documented as reserved but are aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps. On 2 of my systems, accessing %dr[4-5] trapped sometimes. On my Haswell system, the apparent randomness was because the boot CPU starts with CR4_DE set while all other CPUs start with CR4_DE clear. FreeBSD doesn't support the data breakpoints enabled by CR4_DE and it never changes this flag, so the flag remains different across CPUs and the behaviour seemed inconsistent except while booting when the CPU doesn't change. The invalid accesses broke: - read access for printing the registers in ddb "show watches" on CPUs with CR4_DE set - read accesses in fill_dbregs() on CPUs with CR4_DE set. This didn't implement panic(3) since the user case always skipped %dr[4-5]. - write accesses in set_dbregs(). This also didn't affect userland. When it didn't trap, the aliasing made it fragile. Don't print the dummy (zero) values of %dr[4-5] in "show watches" for i386 or amd64. Fix style bugs near this printing. amd64 also has space in the dbregs struct for the reserved %dr[8-15] and already didn't print the dummy values for these, and never accessed any of the 10 reserved debug registers. Remove cpufuncs for making the invalid accesses. Even amd64 had these.	2017-03-17 13:49:05 +00:00
grehan	071fd9390c	Hide the AMD MONITORX/MWAITX capability. Otherwise, recent Linux guests will use these instructions, resulting in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits. This fixes boot-time hangs in recent Linux guests on Ryzen CPUs (and probably Bulldozer aka AMD FX as well). Reviewed by: kib MFC after: 1 week	2017-03-16 03:21:42 +00:00
manu	044a23ec3c	Remove i915drm and radeondrm from NOTES and conf. This unbreaks the LINT kernel. Reported by: lwhsu	2017-03-12 00:52:16 +00:00
dchagin	d7b4f21065	Reduce code duplication between MD Linux code by moving SYSV IPC 64-bit related struct definitions out into the MI path. Invert the native ipc structs to the Linux ipc structs conversion logic. Since the 64-bit variant of the ipc structs has more precision, convert native ipc structs to the 64-bit Linux ipc structs and then truncate 64-bit values into the non-64-bit ones if needed. Unlike Linux, return EOVERFLOW if the values do not fit. Fix SYSV IPC for the 64-bit Linuxulator, which never sets the IPC_64 bit. MFC after: 1 month	2017-03-07 17:07:16 +00:00
mmokhi	d2f0c04c1f	Regenerated Linuxulator syscall tables for r314782 Approved by: dchagin MFC after: 1 month	2017-03-06 18:20:37 +00:00
mmokhi	4c583e7b8e	Add UNIMPLEMENTED() placeholder macro for the syscalls that are not implemented in Linux kernel itself. Cleanup DUMMY() macros. Reviewed by: dchagin, trasz Approved by: dchagin MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9804	2017-03-06 18:11:38 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
kib	06450fe6c4	Initialize pcb_save for thread0. Otherwise kernel traps on NULL dereference if fpu_kern(9) is used from the thread0 context. Reported by: cem Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-28 22:54:52 +00:00
glebius	745bcd6fba	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is an operating system released in 1988. It ceased to exist in the early 2000s.	2017-02-28 05:14:42 +00:00
dchagin	948e5bcce0	Regen for r314312 (Linux epoll_pwait). MFC after: 1 month	2017-02-26 19:59:28 +00:00
dchagin	8a318fc47b	Change Linux epoll_pwait syscall definition to match Linux actual one. MFC after: 1 month	2017-02-26 19:57:18 +00:00
alc	c062fef7f7	Refine the fix from r312954. Specifically, add a new PDE-only flag, PG_PROMOTED, that indicates whether lingering 4KB page mappings might need to be flushed on a PDE change that restricts or destroys a 2MB page mapping. This flag allows the pmap to avoid range invalidations that are both unnecessary and costly. Reviewed by: kib, markj MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9665	2017-02-26 19:54:02 +00:00
dchagin	396e17522f	Implement timerfd family syscalls. MFC after: 1 month	2017-02-26 09:48:18 +00:00
dchagin	d9e839a5db	Regen after r314291 (timerfd definition). MFC after: 1 month	2017-02-26 09:37:25 +00:00
dchagin	d7d0ab04b1	Change Linuxulator timerfd syscalls definition to match actual Linux one. MFC after: 1 month	2017-02-26 09:35:44 +00:00
trasz	aaee258648	Fix linux_fstatfs() to return proper value for f_frsize. Without it, linux df(1) binary from Xenial shows garbage. Reviewed by: dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9692	2017-02-25 20:32:37 +00:00
mmokhi	bf114a4c4a	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 20:04:02 +00:00
dchagin	8f19e32110	Revert r314217. The commit does not match what I approved.	2017-02-24 19:47:27 +00:00
mmokhi	0de5ce3724	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 19:22:17 +00:00
pfg	077418d939	sys: Replace zero with NULL for pointers. Found with: devel/coccinelle MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9694	2017-02-22 02:35:59 +00:00
markj	4f689e04e8	ddb show pte: use pmap of kdb_thread show pte from the pmap of the process of the current DDB thread, instead of necessarily the PCPU pmap. Submitted by: Ryan Libby <rlibby@gmail.com> Reviewed by: kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D9645	2017-02-21 21:06:12 +00:00
trasz	1bc1c21a5b	Reimplement linux_arch_prctl() as a wrapper around sysarch(2). This also adds support for LINUX_ARCH_SET_GS. Reviewed by: dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9372	2017-02-20 16:13:40 +00:00
alc	881c4340ef	In pmap_enter(), set the PG_MANAGED flag on the new PTE in one place, rather than two places, and do so before the pmap lock is acquired. Submitted by: Yufeng Zhou <yz70@rice.edu> Reviewed by: kib MFC after: 1 week	2017-02-19 18:00:57 +00:00
dchagin	ca1a1a9d60	Implement rt_tgsigqueueinfo system call used by glibc for pthread_sigqueue(3). MFC after: 2 weeks	2017-02-19 07:38:11 +00:00
kib	b22278d483	Microoptimize amd64/pmap.c pmap_protect_pde(). For the loop that dirties vm_pages in case superpage was written to, check the complete condition before the loop. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-19 03:33:20 +00:00
jah	85257d4d56	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587	2017-02-19 02:03:09 +00:00
kib	5c280e36bd	Merge i386 and amd64 mtrr drivers. Reviewed by: royger, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9648	2017-02-17 21:08:32 +00:00
royger	bf0e949003	x86: fix MTRR initialization if EARLY_AP_STARTUP is used MTRR handlers are set in {amd64/i686}_mem_drvinit, which is called at SI_SUB_DRIVERS, and that's too late when EARLY_AP_STARTUP is set because APs have already started at this point. {amd64/i686}_mrinit is also called too late for the BSP, since that happens when the memory device is attached, also after APs have already started. Move the position to SI_SUB_CPU, and also initialize the state for the BSP, so that the APs can correctly get to the same state as the BSP. Sponsored by: Citrix Systems R&D MFC after: 1 week Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D9630	2017-02-17 12:47:51 +00:00
trasz	db2e4d79bb	Implement linux version of ptrace(2). It's nowhere near complete, but it allows the use of 64 bit linux strace(1) on 64 bit linux binaries. Reviewed by: dchagin (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9406	2017-02-16 13:32:15 +00:00
trasz	93717f4b2a	Regen after r313769. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-15 14:25:50 +00:00
trasz	f11d19c42a	Fix definition of linux64 ptrace syscall. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-02-15 14:12:39 +00:00
jhb	e80fc50712	Regenerate all the system call tables to drop "created from" lines. One of the ibcs2 files contains some actual changes (new headers) as it hasn't been regenerated after older changes to makesyscalls.sh.	2017-02-10 19:45:02 +00:00
erj	1edbd9d1a6	ixl(4): Update to 1.7.12-k Refresh upstream driver before impending conversion to iflib. Major new features: - Support for Fortville-based 25G adapters - Support for I2C reads/writes (To prevent getting or sending corrupt data, you should set dev.ixl.0.debug.disable_fw_link_management=1 when using I2C [this will disable link!], then set it to 0 when done. The driver implements the SIOCGI2C ioctl, so ifconfig -v works for reading I2C data, but there are read_i2c and write_i2c sysctls under the .debug sysctl tree [the latter being useful for upper page support in QSFP+]). - Addition of an iWARP client interface (so the future iWARP driver for X722 devices can communicate with the base driver). - Compiling this option in is enabled by default, with "options IXL_IW" in GENERIC. Differential Revision: https://reviews.freebsd.org/D9227 Reviewed by: sbruno MFC after: 2 weeks Sponsored by: Intel Corporation	2017-02-10 01:04:11 +00:00
dchagin	ef67426db9	Regen after r313284. MFC after: 2 weeks	2017-02-05 14:19:19 +00:00
dchagin	1ba976c2fa	Update syscall.master to 4.10-rc6. Also fix comments, a typo, and wrong numbering for a few unimplemented syscalls. For 32-bit Linuxulator, socketcall() syscall was historically the entry point for the sockets API. Starting in Linux 4.3, direct syscalls are provided for the sockets API. Enable it. The initial version of patch was provided by trasz@ and extended by me. Submitted by: trasz MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9381	2017-02-05 14:17:09 +00:00
jah	f5659d40d3	Revert r313037 The switch to get_pcpu() in MI code seems to cause hangs on MIPS. Back out until we can get a better idea of what's happening there. Reported by: kan, lidl	2017-02-04 06:24:49 +00:00
jah	fc31303beb	Implement get_pcpu() for the remaining architectures and use it to replace pcpu_find(curcpu) in MI code.	2017-02-01 03:32:49 +00:00
trasz	69861f6d1f	Replace sys_ftruncate() with kern_ftruncate() in various compats. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9368	2017-01-30 11:50:54 +00:00
kib	82ecdbf179	Do not leave stale 4K TLB entries on pde (superpage) removal or protection change. On superpage promotion, x86 pmaps do not invalidate existing 4K entries for the superpage range, because they are compatible with the promoted 2/4M entry. But the invalidation on superpage removal or protection change only did single INVLPG with the base address of the superpage. This reliably flushed superpage TLB entry, and 4K entry for the first page of the superpage, potentially leaving other 4K TLB entries lingering. Do the invalidation of the whole superpage range to correct the problem. Note that the precise invalidation is done by x86 code for kernel_pmap only, for user pmaps whole (per-AS) TLB is flushed. This made the bug well hidden, because promotions of the kernel mappings require specific load. Reported and tested by: Jonathan Looney <jtl@netflix.com> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-29 19:14:48 +00:00
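The difference between flushing one address and flushing the whole superpage range, as described in commit 82ecdbf179 above, can be sketched in plain C. This is only an illustration under stated assumptions: it assumes 4 KB base pages and 2 MB superpages (the amd64 case), and `sketch_inval_superpage`, `sketch_inval_fn` and `sketch_record_last` are hypothetical names that stand in for the real pmap code, which the commit message does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	((uint64_t)4096)		/* 4 KB base page */
#define	SKETCH_SUPERPAGE_SIZE	((uint64_t)2 * 1024 * 1024)	/* 2 MB superpage */

/* Hypothetical per-page hook; stands in for a single-address TLB flush. */
typedef void (*sketch_inval_fn)(uint64_t va, void *arg);

/*
 * Visit every 4 KB page inside the superpage that contains 'va' and
 * return the number of pages visited.  Flushing only the base address
 * would cover one of these pages and leave the others untouched.
 */
static unsigned
sketch_inval_superpage(uint64_t va, sketch_inval_fn fn, void *arg)
{
	uint64_t base = va & ~(SKETCH_SUPERPAGE_SIZE - 1);
	unsigned n = 0;

	assert(fn != NULL);
	for (uint64_t off = 0; off < SKETCH_SUPERPAGE_SIZE;
	    off += SKETCH_PAGE_SIZE) {
		fn(base + off, arg);
		n++;
	}
	return (n);
}

/* Example hook that records the last address it was given. */
static void
sketch_record_last(uint64_t va, void *arg)
{
	*(uint64_t *)arg = va;
}
```

With these sizes the loop visits 512 pages per superpage, which is also why a later commit in this log (c062fef7f7) tries to avoid range invalidations that are not needed.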
bapt	bd0b52fc1f	Revert crap accidentally committed	2017-01-28 16:31:23 +00:00
bapt	02ac05d572	Revert r312923; a better approach will be taken later.	2017-01-28 16:30:14 +00:00
tijl	dab9980fd3	Apply r210555 to 64 bit linux support: The interpreter name should no longer be treated as a buffer that can be overwritten. PR: 216346 MFC after: 3 days	2017-01-24 16:13:59 +00:00
kib	d52cc80daf	Use SFENCE for ordering CLFLUSHOPT. SDM states that CLFLUSHOPT instructions can be ordered with other writes by SFENCE, heavier MFENCE is not required. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-20 19:08:44 +00:00
avg	fa73e5b5c7	vmm_dev: work around a bogus error with gcc 6.3.0 The error is: vmm_dev.c: In function 'alloc_memseg': vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull] Apparently, gcc is unable to figure out that if a ternary operator produced a non-NULL value once, then the operator with exactly the same operands would produce the same value again. MFC after: 1 week	2017-01-20 13:21:27 +00:00
ed	be72efbdd4	Catch up with changes to structure member names. Pointer/length pairs are now always named ${name} and ${name}_len.	2017-01-17 22:05:52 +00:00
cem	6d240aee91	Fix a variety of cosmetic typos and misspellings No functional change. PR: 216096, 216097, 216098, 216101, 216102, 216106, 216109, 216110 Reported by: Bulat <bltsrc at mail.ru> Sponsored by: Dell EMC Isilon	2017-01-15 18:00:45 +00:00
markj	d8a11d2a36	Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86. We would previously invalidate such entries individually, resulting in more IPIs than necessary. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9094	2017-01-10 21:52:48 +00:00
sbruno	efab05d612	Migrate e1000 to the IFLIB framework: - em(4) igb(4) and lem(4) - deprecate the igb device from kernel configurations - create a symbolic link in /boot/kernel from if_em.ko to if_igb.ko Devices tested: - 82574L - I218-LM - 82546GB - 82579LM - I350 - I217 Please report problems to freebsd-net@freebsd.org Partial review from jhb and suggestions on how to not brick folks who originally would have lost their igbX device. Submitted by: mmacy@nextbsd.org MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks and Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8299	2017-01-10 03:23:22 +00:00
mjg	50a477c2ee	amd64: add atomic_fcmpset Reviewed by: kib, jhb	2017-01-03 21:00:24 +00:00
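To illustrate what a fetching compare-and-set such as the one added in commit 50a477c2ee offers, here is a minimal sketch in portable C11. It assumes the usual semantics of such a primitive: on failure, the value actually observed is written back through the `expected` pointer, so the caller does not need a separate reload. The `sketch_*` names are hypothetical, and this is not the amd64 implementation, which the commit message does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit fetching compare-and-set.
 * Returns nonzero on success.  On failure, the value found in *dst
 * is stored into *expected.
 */
static int
sketch_fcmpset_64(_Atomic uint64_t *dst, uint64_t *expected, uint64_t desired)
{
	assert(dst != NULL && expected != NULL);
	return (atomic_compare_exchange_strong(dst, expected, desired));
}

/* Lock-free add built on the helper: retry until the swap succeeds. */
static uint64_t
sketch_fetch_add(_Atomic uint64_t *dst, uint64_t delta)
{
	uint64_t old = atomic_load(dst);

	while (!sketch_fcmpset_64(dst, &old, old + delta)) {
		/* 'old' was refreshed by the failed attempt; just retry. */
	}
	return (old);
}
```

The retry loop is the typical consumer: because the failed attempt already refreshed `old`, the loop body does not have to read the target again.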
kib	149e8d3d06	Fix typo. Remove spurious blank line. MFC after: 3 days	2016-12-18 09:32:23 +00:00
jhb	203d8b88f7	Enable EARLY_AP_STARTUP on amd64 and i386 kernels by default. PR: 199321, 203682 MFC after: 2 months Sponsored by: Netflix	2016-12-16 21:10:37 +00:00
kib	64b6cb0328	Provide a non-final but valid PCB pointer for thread0 for the duration of hammer_time(). This keeps the assembler exception handlers from faulting themselves when setting PCB flags, and allows the normal kernel trap handler to get control. The pointer is reset after FPU parameters are obtained. Set thread0.td_critnest to 1 for the duration of hammer_time() as well. In particular, page faults at that early stage panic immediately instead of trying to call the not yet operational VM to resolve them. As a result, faults during the second half of hammer_time() execution have a chance to be reported instead of causing a silent machine reboot or hang. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-12-14 11:40:31 +00:00
def	f63c437216	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates a one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted.
Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
imp	3459557545	Permit loading of efirt module even when there's no EFI to call. The module loading is successful, but attempts to use it will not be successful. This is similar to what we do (did?) with ACPI on non-ACPI systems. We succeed if we can't find the necessary information to hook into EFI, but still fail if we're unable to allocate resources if we do find EFI. Not Objected to by: kib@ MFC after: 3 days	2016-12-09 23:37:11 +00:00
markj	8bb19c4929	Add a COMPAT_FREEBSD11 kernel option. Use it wherever COMPAT_FREEBSD10 is currently specified. Reviewed by: glebius, imp, jhb Differential Revision: https://reviews.freebsd.org/D8736	2016-12-09 18:54:12 +00:00
glebius	f8eae77f98	Treat R_X86_64_PLT32 relocs as R_X86_64_PC32. If we load a binary that is designed to be a library, it produces relocatable code via assembler directives in the assembly itself (rather than compiler options). This emits R_X86_64_PLT32 relocations, which are not handled by the kernel linker. Submitted by: gallatin Reviewed by: kib	2016-12-09 18:07:28 +00:00
alc	7571ef95c1	Previously, vm_radix_remove() would panic if the radix trie didn't contain a vm_page_t at the specified index. However, with this change, vm_radix_remove() no longer panics. Instead, it returns NULL if there is no vm_page_t at the specified index. Otherwise, it returns the vm_page_t. The motivation for this change is that it simplifies the use of radix tries in the amd64, arm64, and i386 pmap implementations. Instead of performing a lookup before every remove, the pmap can simply perform the remove. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D8708	2016-12-08 04:29:29 +00:00
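The simplification described in commit 7571ef95c1 (remove without a prior lookup, and get NULL back when nothing was there) can be sketched with a toy table. This is an illustration only: the fixed-size array stands in for a radix trie, and `toy_page`, `toy_trie` and `toy_remove` are hypothetical names, not the vm_radix interface.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_SLOTS	8

struct toy_page {
	int id;
};

/* Toy index-to-page table standing in for a radix trie. */
struct toy_trie {
	struct toy_page *slot[TOY_SLOTS];
};

/*
 * Remove and return the page at 'index', or return NULL if the slot
 * is empty or out of range.  An empty slot is not treated as an error.
 */
static struct toy_page *
toy_remove(struct toy_trie *t, size_t index)
{
	struct toy_page *p;

	assert(t != NULL);
	if (index >= TOY_SLOTS)
		return (NULL);
	p = t->slot[index];
	t->slot[index] = NULL;
	return (p);
}
```

A caller can then write a single `if ((p = toy_remove(&t, i)) != NULL)` test instead of a lookup followed by a remove.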
jhb	f987575fac	Report page faults due to reserved bits in PTEs as a separate fault type. Rather than reporting a page fault due to a bad PTE as a protection violation with the "rsv" flag, treat these faults as a separate type of fault altogether. MFC after: 1 month	2016-11-19 01:34:12 +00:00
bdrewery	30f99dbeef	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
cem	7ae132fee1	Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366	2016-10-31 23:09:52 +00:00
jhb	1018a82d1b	Move declarations of invpcid_works and pmap_pcid_enabled to pmap.h. Previously these were only declared under #ifdef SMP in <machine/smp.h>. However, these variables are defined in pmap.c unconditionally, and efirt.c references them unconditionally. This fixes non-SMP kernel builds. Discussed with: kib MFC after: 1 week	2016-10-31 18:37:05 +00:00
avg	9127df4438	fix a syntax error in r308039 ... that I somehow introduced between testing the change and committing it. MFC after: 1 week X-MFC with: r307903	2016-10-28 15:57:55 +00:00
avg	6310254212	vmm: another take at maximum address passed to contigmalloc Just using vm_paddr_t value with all bits set. That should work as long as the type is unsigned. While there, fix a couple of whitespace issues nearby. MFC after: 1 week X-MFC with: r307903	2016-10-28 14:38:01 +00:00
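The "all bits set" idiom mentioned in commit 6310254212, and the reason it depends on the type being unsigned, can be shown in a few lines of C. This is a sketch under an assumption: `sketch_paddr_t` is a hypothetical 64-bit unsigned stand-in for the physical address type, and `sketch_max_paddr` is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 64-bit unsigned physical address type, as on amd64. */
typedef uint64_t sketch_paddr_t;

/*
 * Largest representable address: every bit set.  For a signed type
 * the same expression would yield -1, which compares below every
 * valid address, hence the check that the type is unsigned.
 */
static sketch_paddr_t
sketch_max_paddr(void)
{
	assert((sketch_paddr_t)-1 > 0);
	return (~(sketch_paddr_t)0);
}
```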
jhb	95a3814f21	MFamd64: Add bounds checks on addresses used with /dev/mem. Reject attempts to read from or memory map offsets in /dev/mem that are beyond the maximum-supported physical address of the current CPU. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7408	2016-10-27 21:23:14 +00:00
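A bounds check of the kind described in commit 95a3814f21 might look roughly like the following. This is a minimal sketch, not the kernel code: the helper name and the idea of passing the number of supported physical address bits as a parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [offset, offset + len) lies entirely below the
 * limit implied by 'maxphyaddr_bits' physical address bits.
 */
static bool
sketch_mem_range_ok(uint64_t offset, uint64_t len, unsigned maxphyaddr_bits)
{
	uint64_t limit;

	assert(maxphyaddr_bits > 0 && maxphyaddr_bits < 64);
	limit = (uint64_t)1 << maxphyaddr_bits;
	/* Compare against limit - offset so offset + len cannot overflow. */
	return (offset < limit && len <= limit - offset);
}
```

Writing the second comparison as `len <= limit - offset` rather than `offset + len <= limit` keeps a huge offset from wrapping around and passing the check.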
glebius	856adf7415	The argument validation in r296956 was not enough to close all possible overflows in sysarch(2). Submitted by: Kun Yang <kun.yang chaitin.com> Patch by: kib Security: SA-16:15	2016-10-25 17:13:46 +00:00
avg	a2af253a41	fix up r307903, use correct max address definition MFC after: 1 week X-MFC with: r307903	2016-10-25 10:59:21 +00:00
avg	7d3b940604	vmm/svm: iopm_bitmap and msr_bitmap must be contiguous in physical memory To achieve that the whole svm_softc is allocated with contigmalloc now. It would be more efficient to de-embed those arrays and allocate only them with contigmalloc. Previously, if malloc(9) used non-contiguous pages for the arrays, then random bits in physical pages next to the first page would be used to determine permissions for I/O port and MSR accesses. That could result in a guest dangerously modifying the host hardware configuration. One example is that sometimes NMI watchdog driver in a Linux guest would be able to configure a performance counter on a host system. The counter would generate an interrupt and if hwpmc(4) driver is loaded on the host, then the interrupt would be delivered as an NMI. Discussed with: jhb Reviewed by: grehan MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8321	2016-10-25 10:34:14 +00:00
kib	45100446da	Follow-up to r307866: - Make !KDB config buildable. - Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi in one place, namely nmi_call_kdb(). This allows to remove do_panic argument from the functions, and to remove i386/amd64 duplication of the variable and sysctl definitions. Note that now NMI causes panic(9) instead of trap_fatal() reporting and then panic(9), consistently for NMIs delivered while CPU operated in ring 0 and 3. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-24 20:47:46 +00:00
kib	a04db702cd	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcast to all CPUs. When the kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting the IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". Much more harmful is that, because all CPUs try to enter kdb, and if ddb is used as the debugger, all CPUs issue a prompt on the console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter the NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. Moreover, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for the administrator (really developer) to configure the debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
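The pseudo-lock idea in commit a04db702cd, where the first CPU to enter the NMI handler wins and the others behave as if they had been told to stop, can be sketched with a single atomic owner word. This is an assumption-laden illustration in C11: `sketch_nmi_owner`, `sketch_nmi_try_enter` and `sketch_nmi_exit` are hypothetical names, and the real handler logic is not shown in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define	SKETCH_NO_OWNER	(-1)

/* CPU currently handling the NMI, or SKETCH_NO_OWNER. */
static _Atomic int sketch_nmi_owner = SKETCH_NO_OWNER;

/*
 * Try to become the single CPU that handles the NMI.  Returns true
 * for the winner.  A loser would take the "stopped CPU" path instead
 * of entering the debugger itself.
 */
static bool
sketch_nmi_try_enter(int cpu)
{
	int expected = SKETCH_NO_OWNER;

	assert(cpu >= 0);
	return (atomic_compare_exchange_strong(&sketch_nmi_owner, &expected,
	    cpu));
}

/* Release ownership.  Only the current owner may call this. */
static void
sketch_nmi_exit(int cpu)
{
	int expected = cpu;
	bool released;

	released = atomic_compare_exchange_strong(&sketch_nmi_owner,
	    &expected, SKETCH_NO_OWNER);
	assert(released);
	(void)released;
}
```

It is called a pseudo-lock here because losers do not spin waiting for ownership; they simply take a different path.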
jkim	229f578eb8	Implement BPF_MOD and BPF_XOR instructions. These two ALU instructions first appeared on Linux. Then, libpcap adopted and made them available since 1.6.2. Now more platforms including NetBSD have them in kernel. So do we.	2016-10-21 06:55:07 +00:00
jkim	42d876c52e	Reduce code for conditional jumps.	2016-10-21 06:09:30 +00:00
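The two ALU operations named in commit 229f578eb8 can be illustrated with a tiny accumulator helper. This is only a sketch: the `SKETCH_MOD` and `SKETCH_XOR` constants are hypothetical and are not the BPF opcode encodings, and the handling of a zero modulus (report failure and leave the accumulator alone) is an assumption of this example rather than a description of the kernel filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_alu_op {
	SKETCH_MOD,
	SKETCH_XOR
};

/*
 * Apply one ALU operation to the 32-bit accumulator.  Returns false
 * when the operation cannot be carried out (modulus by zero), in
 * which case the accumulator is left unchanged.
 */
static bool
sketch_alu(enum sketch_alu_op op, uint32_t *acc, uint32_t operand)
{
	assert(acc != NULL);
	switch (op) {
	case SKETCH_MOD:
		if (operand == 0)
			return (false);
		*acc %= operand;
		return (true);
	case SKETCH_XOR:
		*acc ^= operand;
		return (true);
	}
	return (false);
}
```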
jkim	b35ec131f8	Fix compiler warnings for user land.	2016-10-21 06:06:54 +00:00
stevek	8cc74c4a97	Add sysctl to make amd64 minidump retry count tunable at runtime. PR: 213462 Submitted by: RaviPrakash Darbha <rdarbha@juniper.net> Reviewed by: cemi, markj Approved by: sjg (mentor) Obtained from: Juniper Networks Differential Revision: https://reviews.freebsd.org/D8254	2016-10-17 22:57:41 +00:00
kib	eb0078e23d	Do not try to create /dev/efi device node before devfs is initialized. Split efirt.ko initialization into early stage where runtime services KPI environment is created, to be used e.g. for RTC, and the later devfs node creation stage, per module. Switch the efi device to use make_dev_s(9) instead of make_dev(9). At least, this gracefully handles the duplicated device name issue. Remove ARGSUSED comment from efidev_ioctl(), all unused arguments are annotated with __unused attribute. Reported by: ambrisko, O. Hartmann <ohartman@zedat.fu-berlin.de> Reviewed by: imp Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-16 06:07:43 +00:00
jhb	f689fd5a63	Drop support for using mmap() with /dev/kmem. Using the device pager with /dev/kmem is not stable since KVA mappings are transient, but the device pager caches the PA associated with a given offset forever. Interestingly, mips' implementation of memmap() already refused requests for /dev/kmem. Note that kvm_read/kvm_write do not use mmap, but use read and write on /dev/kmem, so this should not affect libkvm users. Reviewed by: kib MFC after: 2 months	2016-10-14 20:01:07 +00:00
jtl	62030781cd	In the TCP stack, the hhook(9) framework provides hooks for kernel modules to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state. In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules. Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session. This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so. Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration. Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8185	2016-10-12 02:16:42 +00:00
imp	41d4ddad0f	Create /dev/efidev to provide an ioctl interface to userland. It supports userland interfaces to UEFI Runtime Services. This is intended to be the MI portion of EFI RuntimeServices support. Differential Revision: https://reviews.freebsd.org/D8128 Reviewed by: kib@, wblock@, Ganael Laplanche	2016-10-11 22:24:30 +00:00
kib	559623d89a	Re-apply r306516 (by cem): Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-10-04 17:01:24 +00:00
cem	de42bf751c	Revert r306516 for now, it is incomplete on i386 Noted by: kib	2016-09-30 18:58:50 +00:00
cem	22e3a710d0	Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version), kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-09-30 18:12:16 +00:00
hselasky	5e41da7ccd	Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4 like other PCI network drivers. The sys/ofed directory is now mainly reserved for generic infiniband code, with exception of the mthca driver. - Add new manual page, mlx4en(4), describing how to configure and load mlx4en. - All relevant driver C-files are now prefixed mlx4, mlx4_en and mlx4_ib respectively to avoid object filename collisions when compiling the kernel. This also fixes an issue with proper dependency file generation for the C-files in question. - Device mlxen is now device mlx4en and depends on device mlx4, see mlx4en(4). Only the network device name remains unchanged. - The mlx4 and mlx4en modules are now built by default on i386 and amd64 targets. Only building the mlx4ib module depends on WITH_OFED=YES . Sponsored by: Mellanox Technologies	2016-09-30 08:23:06 +00:00
kib	7dd86df41b	Handle TLB shootdown IPI during the EFI runtime calls, on SandyBridge and IvyBridge machines, which support PCID but do not have INVPCID instruction. MFC after: 1 week	2016-09-26 17:25:25 +00:00
kib	3deaeb8d22	For machines which support PCID but do not have the INVPCID instruction, i.e. SandyBridge and IvyBridge, correct a race between pmap_activate() and invltlb_pcid_handler(). Reported by and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> MFC after: 1 week	2016-09-26 17:22:44 +00:00
bde	94a237c17c	Fix vm86 initialization, part 3 of 2 and a half. (Actually, just fix early printfs and debugging of vm86 initialization and some other early initialization in some cases.) Add an option debug.late_console (with default 1=off) to move console and kdb initialization back where it was. Do the same for amd64 although there is no vm86 there. On my test system, debug.late_console=0 works for the syscons, sio and uart console drivers on amd64 and i386, and for vt on i386 but not on amd64. The early printfs fixed by debug.late_console=0 are: - on i386, the message about lost memory above 4G - with -v in otherwise normal use, about 20 printfs for SMAP - other debugging messages for memory sizing. Mostly under -v and not printed in normal use. Document in a comment how much earlier the initialization and early printf()s can be. That is very early for the console. Not much more than curthread is needed. kdb use obviously needs to be not so early, since it needs IDT initialization and that is done relatively late for convenience and historical reasons.	2016-09-25 14:56:24 +00:00
imp	5720aec884	Change the efi_get_table interface to a void ** so we can return the pointer by dereferencing the pointer. Reviewed by: kib@ MFC After: 2 weeks Sponsored by: Netflix, Inc	2016-09-22 19:04:51 +00:00
markj	f2a1dd4de8	Regenerate syscall provider argument strings.	2016-09-22 04:50:03 +00:00
kib	df422cbea3	Add kernel interfaces to call EFI Runtime Services. Runtime services require a special execution environment for the call. Besides that, the OS must inform firmware about the runtime virtual memory map which will be active during the calls, with the SetVirtualAddressMap() runtime call, done while the 1:1 mapping is still used. There are two complications. First, SetVirtualAddressMap() effectively must be done from the loader, which needs to know the kernel address map in advance. Moreover, despite not being explicitly mentioned in the specification, both the 1:1 mapping and the map passed to SetVirtualAddressMap() must be active during the SetVirtualAddressMap() call. Second, there are buggy BIOSes which require both mappings active during runtime calls as well, most likely because they fail to identify all relocations to perform. On amd64, we can get rid of both problems by providing a 1:1 mapping for the duration of runtime calls, by temporarily remapping user addresses. As a result, we avoid the need for the loader to know about the future kernel address map, and avoid bugs in BIOSes. Typically the BIOS only maps something in the low 4G. Were it not for runtime bugs, we would take advantage of the DMAP, as previous versions of this patch did. A similar but more complicated trick can be used even for i386 and 32bit runtime, if and when EFI boot on i386 is supported. We would need a trampoline page, since potentially the whole 4G of VA would be switched on calls, instead of only the userspace portion on amd64. Context switches are disabled for the duration of the call, FPU access is granted, and interrupts are not disabled. The latter is possible because the kernel is mapped during calls. To test, the sysctl mib debug.efi_time is provided; setting it to 1 makes one call to the EFI get_time() runtime service, and on success the efitm structure is printed to the control terminal. Load efirt.ko, or add the EFIRT option to the kernel config, to enable the code.
Discussed with: emaste, imp Tested by: emaste (mac, qemu) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-21 11:31:58 +00:00
kib	eaff6cf773	Rename efi_systbl to efi_systbl_phys, the variable contains the physical address of the EFI System Table. Add _KERNEL guard around its declaration in sys/efi.h. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:55:28 +00:00
kib	0b0178b3a6	Add a way for the architecture to specify the calling ABI for methods in the EFI Runtime Services Table. On amd64, the calling conventions are MS. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:35:44 +00:00
kib	ff88ae0aef	Add amd64 functions to load/store GDT register, store IDT and TR registers. Note that lgdt() name is already used for function which, besides loading GDT, also reloads segment descriptors cache, thus new function is named bare_lgdt(). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:10:36 +00:00
kib	5aff5acf9c	Export the pmap_cache_bits() and pmap_pinit_pml4() functions from the amd64 pmap. The new pmap_pinit_pml4() function initializes the level 4 page table with entries for the kernel mappings. Both functions are needed for upcoming EFI Runtime Services support. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-21 10:05:51 +00:00
kib	e9e29e2686	MFC r305939: Remove trailing space.	2016-09-21 08:14:55 +00:00
kib	3c5296b91a	Move pmap_p*e_index() inline functions from pmap.c to pmap.h. They are already used in minidump code. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-20 09:38:07 +00:00
emaste	a8b2a6847f	Catch up to sys/capability.h rename to sys/capsicum.h in r263232 MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-09-19 18:44:43 +00:00
kib	90ee5f51f2	Consolidate four efi_next_descriptor() definitions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-09-18 17:38:02 +00:00
kib	3fd792803a	Remove trailing space. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-09-18 17:33:49 +00:00
bde	b8aaa2c367	Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the i386-only section, and fix a comment about the amd64 kernel trapframe not having stackregs. tf_rsp doesn't need decoding on amd64, but had an old clone of i386 code to do this in 1 place, and since the amd64 kernel trapframe does have stackregs, the result was an off-by-16 error for %rsp in an error message.	2016-09-16 07:09:35 +00:00
bde	d6a5db2944	(1) Ifdef the new dr6 variable for KDB. While here, avoid using the old variable 'code' and remove it in trap(). ('code' was meant for holding things like %dr6, but is too small to hold %dr6 on amd64 and was reduced to an obfuscation of tf_err, with early truncation on amd64.) Submitted by: Michael Butler (imb@...)	2016-09-16 04:58:37 +00:00
bde	634d4e4a33	Decode some REX prefixes in inst_call(). This makes the 'next' and 'until' commands work in more cases.	2016-09-15 18:30:53 +00:00
bde	bf8d177543	Abort single stepping in ddb if the trap is not for single-stepping. This is not very easy to do, since ddb didn't know when traps are for single-stepping. It more or less assumed that traps are either breakpoints or single-step, but even for x86 this became inadequate with the release of the i386 in ~1986, and FreeBSD passes it other trap types for NMIs and panics. On x86, teach ddb when a trap is for single stepping using the %dr6 register. Unknown traps are now treated almost the same as breakpoints instead of as the same as single-steps. Previously, the classification of breakpoints was almost correct and everything else was unknown so had to be treated as a single-step. Now the classification of single-steps is precise, the classification of breakpoints is almost correct (as before) and everything else is unknown and treated like a breakpoint. This fixes: - breakpoints not set by ddb, including the main one in kdb_enter(), were treated as single-steps and not stopped on when stepping (except for the usual, simple case of a step with residual count 1). As special cases, kdb_enter() didn't stop for fatal traps or panics - similarly for "hardware breakpoints". Use a new MD macro IS_SSTEP_TRAP(type, code) to classify single-steps. This is excessively complicated for bug-for-bug and backwards compatibility. Design errors apparently started in Mach in ~1990 or perhaps in the FreeBSD interface in ~1993. Common trap types like single steps should have a unique MI code (like the TRAP* codes for user SIGTRAP) so that debuggers don't need macros like IS_SSTEP_TRAP() to decode them. But 'type' is actually an ambiguous MD trap number, and code was always 0 (now it is (int)%dr6 on x86). So it was impossible to determine the trap type from the args. Global variables had to be used. There is already a classification macro db_pc_is_single_step(), but this just gets in the way. It is only used to recover from bugs in IS_BREAKPOINT_TRAP(). 
On some arches, IS_BREAKPOINT_TRAP() just duplicates the ambiguity in 'type' and misclassifies single-steps as breakpoints. It defaults to 'false', which is the opposite of what is needed for bug-for-bug compatibility. When this is cleaned up, MI classification bits should be passed in 'code'. This could be done now for positive-logic bits, since 'code' was always 0, but some negative logic is needed for compatibility so a simple MI classification is not usable yet. After reading %dr6, clear the single-step bit in it so that the type of the next debugger trap can be decoded. This is a little ddb-specific. ddb doesn't understand the need to clear this bit and doing it before calling kdb is easiest. gdb would need to reverse this to support hardware breakpoints, but it just doesn't support them now since gdbstub doesn't support %dr*. Fix a bug involving %dr6: when emulating a single-step trap for vm86, set the bit for it in %dr6. Userland debuggers need this. ddb now needs this for vm86 bios calls. The bit gets copied to 'code' then cleared again. Fix related style bugs: - when clearing bits for hardware breakpoints in %dr6, spell the mask as ~0xf on both amd64 and i386 to get the correct number of bits using sign extension and not need a comment about using the wrong mask on amd64 (amd64 traps for invalid results but clearing the reserved top bits didn't trap since they are 0). - rewrite my old wrong comments about using %dr6 for ddb watchpoints.	2016-09-15 17:24:23 +00:00
jhb	bc4a384597	Remove 'cpu' and 'cpu_class' on amd64. The 'cpu' and 'cpu_class' variables were always set to the same value on amd64 and are legacy holdovers from i386. Remove them entirely on amd64. Reviewed by: imp, kib (older version) Differential Revision: https://reviews.freebsd.org/D7888	2016-09-15 17:05:54 +00:00
bz	de4b915bfb	Try to fix LINT builds after r305807. Seems to be a simple s&r error I missed while reading through the 1st time as well.	2016-09-14 16:08:23 +00:00
bde	d58cd5baa4	Use the MI macro TRAPF_USERMODE() instead of open-coded checks for SEL_UPL and sometimes PSL_VM. This is just a style change on amd64, but on i386 it fixes 1 unimportant place where the PSL_VM check was missing and starts fixing 1 important place where the PSL_VM check had a logic error. Fix logic errors in treating vm86 bioscall mode as kernel mode. The main place checked all the necessary flags, but put the necessary parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong place. The broken case is only reached if a vm86 bioscall uses a %cs which is nonzero mod 4, but that is unusual -- most bios calls start with %cs = 0xc000 or 0xf000 and rarely change it. Another place was missing the check for PCB_VM86CALL, but was only reachable if there are bugs virtualizing PSL_I. Add a macro TF_HAS_STACKREGS() and use this instead of converting open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only care about whether the frame has stack registers. This fixes 3 places in my recent fix for register variables in vm86 mode where I messed up the PSL_VM check and cleans up other places.	2016-09-14 12:57:40 +00:00
kib	cb80ddd115	Add FPU_KERN_NOCTX flag to the fpu_kern_enter() function on amd64. The flag specifies that the block which uses FPU must be executed in critical section, i.e. take no context switches, and does not need an FPU save area during the execution. It is intended to be applied around fast and short code paths where save area allocation is impossible or undesirable, due to context or due to the relative cost of calculation vs. allocation. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-09-11 09:14:07 +00:00
alc	44f29780e8	Various changes to pmap_ts_referenced() Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and into vm/pmap.h, and describe what its purpose is. Eliminate the archaic "XXX" comment about its value. I don't believe that its exact value, e.g., 5 versus 6, matters. Update the arm64 and riscv pmap implementations of pmap_ts_referenced() to opportunistically update the page's dirty field. On amd64, use the PDE value already cached in a local variable rather than dereferencing a pointer again and again. Reviewed by: kib, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7836	2016-09-10 16:49:25 +00:00
jhb	7380cda6f5	MFC 303713: Correct assertion on vcpuid argument to vm_gpa_hold(). PR: 208168	2016-09-09 20:30:36 +00:00
jhb	a40f879158	MFC 304637: Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014	2016-09-09 19:57:32 +00:00
avg	e62f31d754	work around AMD erratum 793 for family 16h, models 00h-0Fh	2016-09-07 14:24:29 +00:00
jhb	b83d0562bd	Reset PCI pass through devices via PCI-e FLR during VM start and end. Add routines to trigger a function level reset (FLR) of a PCI-express device via the PCI-express device control register. This also includes support routines to wait for pending transactions to complete as well as calculating the maximum completion timeout permitted by a device. Change the ppt(4) driver to reset pass through devices before attaching to a VM during startup and before detaching from a VM during shutdown. Reviewed by: imp, wblock (earlier version) MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7751	2016-09-06 21:15:35 +00:00
jhb	9b7bf59c96	Update the I/O MMU in bhyve when PCI devices are added and removed. When the I/O MMU is active in bhyve, all PCI devices need valid entries in the DMAR context tables. The I/O MMU code does a single enumeration of the available PCI devices during initialization to add all existing devices to a domain representing the host. The ppt(4) driver then moves pass through devices in and out of domains for virtual machines as needed. However, when new PCI devices were added at runtime either via SR-IOV or HotPlug, the I/O MMU tables were not updated. This change adds a new set of EVENTHANDLERS that are invoked when PCI devices are added and deleted. The I/O MMU driver in bhyve installs handlers for these events which it uses to add and remove devices to the "host" domain. Reviewed by: imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7667	2016-09-06 20:17:54 +00:00
jhb	31bd6f147b	Remove remnants of PERFMON and I586_PMC_GUPROF from amd64. These options were never fully ported over from i386.	2016-09-06 19:25:32 +00:00
jhb	5f56f30076	Leave ppt devices in the host domain when they are not attached to a VM. This allows a pass through device to be reset to a normal device driver on the host and reused on the host. ppt devices are now always active in some I/O MMU domain when the I/O MMU is active, either the host domain or the domain of a VM they are attached to. Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D7666	2016-09-06 18:53:17 +00:00
markj	fb5804c98d	Remove support for idle page zeroing. Idle page zeroing has been disabled by default on all architectures since r170816 and has some bugs that make it seemingly unusable. Specifically, the idle-priority pagezero thread exacerbates contention for the free page lock, and yields the CPU without releasing it in non-preemptive kernels. The pagezero thread also does not behave correctly when superpage reservations are enabled: its target is a function of v_free_count, which includes reserved-but-free pages, but it is only able to zero pages belonging to the physical memory allocator. Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D7714	2016-09-03 20:38:13 +00:00
alc	24a2d27767	As an optimization to the machine-independent layer, change the machine-dependent pmap_ts_referenced() so that it updates the page's dirty field if a modified bit is found while counting reference bits. This opportunistic update can be performed at low cost and can eliminate the need for some future calls to pmap_is_modified() by the machine-independent layer. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7722	2016-09-01 15:57:44 +00:00
bde	aa589d0481	Shorten banal comments about zeroing and copying pages. Don't give implementation details that last echoed the code 15-20 years ago. But add a detail about pagezero() on i386. Switch from Mach style to BSD style.	2016-08-29 14:38:31 +00:00
bde	eb85aca715	On amd64, declare sse2_pagezero() and start using it again, but only for zeroing pages in idle where nontemporal writes are clearly best. This is almost a no-op since zeroing in idle does nothing good and is off by default. Fix END() statement forgotten in previous commit. Align the loop in sse2_pagezero(). Since it writes to main memory, the loop doesn't have to be very carefully written to keep up. Unrolling it was considered useless or harmful and was not done on i386, but that was too careless. Timing for i386: the loop was not unrolled at all, and moved only 4 bytes/iteration. So on a 2GHz CPU, it needed to run at 2 cycles/iteration to keep up with a memory speed of just 4GB/sec. But when it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/iteration so it gave a maximum speed of 2.67GB/sec and couldn't even keep up with PC3200 memory. Fix the alignment so that it keeps up with 4GB/sec memory, and unroll once to get nearer to 8GB/sec. Further unrolling might be useless or harmful since it would prevent the loop fitting in 16-bytes. My test system with an old CPU and old DDR1 only needed 5+ GB/sec. My test system with a new CPU and DDR3 doesn't need any changes to keep up ~16GB/sec. Timing for amd64: with 8-byte accesses and newer faster CPUs it is easy to reach 16GB/sec but not so easy to go much faster. The alignment doesn't matter much if the CPU is not very old. The loop was already unrolled 4 times, but needs 32 bytes and uses a fancy method that doesn't work for 2-way unrolling in 16 bytes. Just align it to 32-bytes.	2016-08-29 13:07:21 +00:00
bde	61cf0d838a	Restore the nontemporal pagezero() under the name sse2_pagezero() (the same name as for i386). It is not reconnected yet. Which method is better is too machine-dependent and system-dependent to replace the old method unconditionally.	2016-08-29 06:07:43 +00:00
jhb	731da97ac4	Enable I/O MMU when PCI pass through is first used. Rather than enabling the I/O MMU when the vmm module is loaded, defer initialization until the first attempt to pass a PCI device through to a guest. If the I/O MMU fails to initialize or is not present, then fail the attempt to pass a PCI device through to a guest. The hw.vmm.force_iommu tunable has been removed since the I/O MMU is no longer enabled during boot. However, the I/O MMU support can be disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent use of the I/O MMU on any systems where it is buggy. Reviewed by: grehan MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D7448	2016-08-26 20:15:22 +00:00
ed	d81be03d3f	Make execution of 32-bit CloudABI executables work on amd64. A nice thing about requiring a vDSO is that it makes it incredibly easy to provide full support for running 32-bit processes on 64-bit systems. Instead of letting the kernel be responsible for composing/decomposing 64-bit arguments across multiple registers/stack slots, all of this can now be done in the vDSO. This means that there is no need to provide duplicate copies of certain system calls, like the sys_lseek() and freebsd32_lseek() we have for COMPAT_FREEBSD32. This change imports a new vDSO from the CloudABI repository that has automatically generated code in it that copies system call arguments into a buffer, padding them to eight bytes and zero-extending any pointers/size_t arguments. After returning from the kernel, it does the inverse: extracting return values, in the process truncating pointers/size_t values to 32 bits. Obtained from: https://github.com/NuxiNL/cloudabi	2016-08-24 10:51:33 +00:00
ed	c0aa6fd209	Convert pointers obtained from the threadattr_t structure with TO_PTR(). In all of these source files, the userspace pointer size corresponds with the kernelspace pointer size, meaning that casting directly works. As I'm planning on making 32-bit execution on 64-bit systems work as well, use TO_PTR() here as well, so that the changes between source files remain minimal.	2016-08-24 10:13:18 +00:00
jhb	4e659fa057	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
jhb	e24281ea43	Remove the si(4) driver and sicontrol(8) for Specialix serial cards. The si(4) driver supported multiport serial adapters for ISA, EISA, and PCI buses. This driver does not use bus_space, instead it depends on direct use of the pointer returned by rman_get_virtual(). It is also still locked by Giant and calls for patch testing to convert it to use bus_space were unanswered. Relnotes: yes	2016-08-19 21:14:27 +00:00
kib	e07c032881	MFC r303913: Unconditionally perform checks that FPU region was entered, when #NM exception is caught in kernel mode.	2016-08-17 07:09:22 +00:00
avg	5461062c89	MFC r302835: fix-up for configuration of AMD Family 10h processors borrowed from Linux	2016-08-15 09:04:31 +00:00
kib	04bce34a47	The pmap_delayed_invl_wait() function blocks on turnstile, it does not spin, in the committed version. Remove stray '*' in the text. Sponsored by: The FreeBSD Foundation. MFC after: 3 days	2016-08-11 12:37:11 +00:00
ed	cc2c089a3f	Provide the CloudABI vDSO to its executables. CloudABI executables already provide support for passing in vDSOs. This functionality is used by the emulator for OS X to inject system call handlers. On FreeBSD, we could use it to optimize calls to gettimeofday(), etc. Though I don't have any plans to optimize any system calls right now, let's go ahead and already pass in a vDSO. This will allow us to simplify the executables, as the traditional "syscall" shims can be removed entirely. It also means that we gain more flexibility with regards to adding and removing system calls. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7438	2016-08-10 21:02:41 +00:00
kib	acae466016	Unconditionally perform checks that FPU region was entered, when #NM exception is caught in kernel mode. There are third-party modules which trigger the issue, and since the problem causes usermode state corruption at least, panic in production kernels as well. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:44:03 +00:00
jhb	fbbd0c6298	MFC 302181,302635: Disable MSI-X migration on older Xen hypervisors. 302181: Add a tunable to disable migration of MSI-X interrupts. The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts. Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SRIOV. The tunable can be set in loader.conf as a workaround. 302635: xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated.	2016-08-05 17:13:25 +00:00
jhb	1c2702c041	Don't permit mappings of invalid physical addresses on amd64 via /dev/mem. Discussed with: kib	2016-08-04 17:55:23 +00:00
jhb	b1de97ff2b	Correct assertion on vcpuid argument to vm_gpa_hold(). PR: 208168 Submitted by: Dave Cameron <daverabbitz@ihug.co.nz> Reviewed by: grehan MFC after: 1 month	2016-08-03 15:20:10 +00:00
kib	9a5f028012	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
mjg	b584a8e1ae	amd64: implement pagezero using rep stos The current implementation uses non-temporal writes. This turns out to be detrimental to performance if the page is used shortly after, which is the typical case with page faults. Switch to rep stos. Reviewed by: kib MFC after: 1 week	2016-07-31 11:34:08 +00:00
brooks	017f31c108	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7353	2016-07-28 21:29:04 +00:00
mav	fcb5c9368b	Add more UEFI/e820 memory types from latest specifications. This is only cosmetics. MFC after: 2 weeks	2016-07-24 09:15:11 +00:00
dchagin	98960de4ab	MFC r302517: Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD supports NX bit on X86_64 processors out of the box, for i386 emulation use READ_IMPLIES_EXEC flag, introduced in r302515. While here move common part of mmap() and mprotect() code to the files in compat/linux to reduce code duplication between Linuxulators. MFC r302518, r302626: Add linux_mmap.c to the appropriate conf/files.	2016-07-17 15:23:32 +00:00
dchagin	1886392a37	Regen for r302962 (Linux personality), record mergeinfo for r320516.	2016-07-17 15:11:23 +00:00
dchagin	8576a4ebaa	MFC r302515: Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-17 15:07:33 +00:00
mav	480936abd1	Increase number of I/O APIC pins from 24 to 32 to give PCI up to 16 IRQs. Move HPET to the top of the supported 0-31 range. Proposed by: jhb@, grehan@	2016-07-14 14:35:25 +00:00
avg	00ea714475	remove a stray change from r302834 MFC after: 3 weeks X-MFC with: r302834	2016-07-14 11:13:26 +00:00
avg	555e531ac1	fix-up for configuration of AMD Family 10h processors borrowed from Linux http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c#L643 BIOS may configure Family 10h processors to convert WC+ cache type to CD. That can hurt performance of guest VMs using nested paging. Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D6059	2016-07-14 11:03:05 +00:00
badger	5908cb719e	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
royger	844ce8697a	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
dchagin	c93d4a7bde	Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD supports NX bit on X86_64 processors out of the box, for i386 emulation use READ_IMPLIES_EXEC flag, introduced in r302515. While here move common part of mmap() and mprotect() code to the files in compat/linux to reduce code duplication between Linuxulators. Reported by: Johannes Jost Meixner, Shawn Webb MFC after: 1 week XMFC with: r302515, r302516	2016-07-10 08:22:04 +00:00
dchagin	7acd3da18d	Regen for r302215 (Linux personality).	2016-07-10 08:17:16 +00:00
dchagin	50efd461d3	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
ed	887bfdc0a4	Don't forget to set sa->narg for CloudABI system calls. It turns out that this value is not used within the system call code under normal conditions, except when using tracing tools like ktrace. If we forget to set this value, it is set to random garbage. This may cause ktrace to hang indefinitely, making it impossible to kill. Reported by: Michael Plass PR: 210800 MFC before: 11.0-RELEASE	2016-07-08 20:09:21 +00:00
nwhitehorn	89d01c24d1	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
sephe	ad3696213e	MFC 301015 hyperv/vmbus: Rename ISR functions MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6601	2016-06-24 01:20:33 +00:00

8130 Commits