freebsd-skq

Author	SHA1	Message	Date
kib	e114084144	There is no need to disable interrupts around npxsave call. i386 was changed to only require critical section around the thread FPU state manipulations, and vm86_bioscall callers already enter critical section for other reasons. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-03-23 15:46:53 +00:00
kib	97713e367c	Update comment to match current field names. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-03-23 15:44:31 +00:00
emaste	289be8516c	Share Linux errno table with libsysdecode Requested by: jhb Reviewed by: jhb Sponsored by: Turing Robotic Industries Inc.	2018-03-22 12:58:49 +00:00
emaste	444301c25e	Fix kernel memory disclosure in ibcs2_getdents ibcs2_getdents() copies a dirent structure to userland. The ibcs2 dirent structure contains a 2 byte pad element. This element is never initialized, but copied to userland none-the-less. Note that ibcs2 has not built on HEAD since r302095. Submitted by: Domagoj Stolfa <ds815@cam.ac.uk> Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com> MFC after: 3 days Security: Kernel memory disclosure (803)	2018-03-21 23:26:42 +00:00
emaste	af3be5b4fb	Add ) missing from r330297 Sponsored by: The FreeBSD Foundation	2018-03-21 23:17:26 +00:00
emaste	6d7d087d6c	Restore close quote lost in r331254	2018-03-20 21:04:47 +00:00
emaste	6fe54a5343	Rename assym.s to assym.inc assym is only to be included by other .s files, and should never actually be assembled by itself. Reviewed by: imp, bdrewery (earlier) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D14180	2018-03-20 17:58:51 +00:00
emaste	3df5d233f9	Rationalize license text on Linuxolator files i386 linux.h missed in r330239. Approved by: sos MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-03-20 02:50:11 +00:00
emaste	58e1b5421f	Rename linuxulator functions with linux_ prefix It's preferable to have a consistent prefix. This also reduces differences between the three linux*_sysvec.c files. Sponsored by: Turing Robotic Industries Inc.	2018-03-19 21:26:32 +00:00
emaste	47fc89046a	linux*_sysvec.c: rationalize whitespace and comments There's a fair amount of duplication between MD linuxulator files. Make indentation and comments consistent between the three versions of linux_sysvec.c to reduce diffs when comparing them. Sponsored by: Turing Robotic Industries Inc.	2018-03-19 15:11:10 +00:00
emaste	566c3d41cc	Share a single bsd-linux errno table across MD consumers Three copies of the linuxulator linux_sysvec.c contained identical BSD to Linux errno translation tables, and future work to support other architectures will also use the same table. Move the table to a common file to be used by all. Make it 'const int' to place it in .rodata. (Some existing Linux architectures use MD errno values, but x86 and Arm share the generic set.) This change should introduce no functional change; a followup will add missing errno values. MFC after: 3 weeks Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14665	2018-03-16 14:46:38 +00:00
emaste	d90e216823	ANSIfy i386/vm86.c	2018-03-16 12:12:41 +00:00
emaste	9975d0e7b5	Remove stray ; at end of linux_vdso_deinstall()	2018-03-14 13:20:36 +00:00
emaste	6851f84d1f	Use C99 boolean type for translate_osrel Migrate to modern types before creating MD Linuxolator bits for new architectures. Reviewed by: cem Sponsored by: Turing Robotic Industries Inc. Differential Revision: https://reviews.freebsd.org/D14676	2018-03-13 16:40:29 +00:00
emaste	bc4d21ce60	Apply some style(9) to Linuxulator linux_sysvec.c comments	2018-03-13 00:40:05 +00:00
emaste	c303c68d6a	imgact_linux.c: use standard indentation Sponsored by: Turing Robotic Industries Inc.	2018-03-12 23:28:25 +00:00
emaste	ac81b47ca6	Linuxulator: apply style(9) to return Sponsored by: Turing Robotic Industries Inc.	2018-03-12 15:35:24 +00:00
brooks	1db1eebcab	Remove obsolete pcaudioio.h. Nothing uses the #define's values or the types. (Some NTP code does use an audio_info_t, but it is in #ifdef'd support for Solaris and is not this audio_info_t). Sponsored by: DARPA, AFRL	2018-03-11 16:17:53 +00:00
eadler	9b1ea58f58	sys: Fix a few potential infoleaks in cloudabi While there is no immediate leak, if the structure changes underneath us, there might be in the future. Submitted by: Domagoj Stolfa <domagoj.stolfa@gmail.com> MFC After: 1 month Sponsored by: DARPA/AFRL	2018-03-07 14:44:32 +00:00
jtl	8e9b6569cb	amd64: Protect the kernel text, data, and BSS by setting the RW/NX bits correctly for the data contained on each memory page. There are several components to this change: * Add a variable to indicate the start of the R/W portion of the initial memory. * Stop detecting NX bit support for each AP. Instead, use the value from the BSP and, if supported, activate the feature on the other APs just before loading the correct page table. (Functionally, we already assume that the BSP and all APs had the same support or lack of support for the NX bit.) * Set the RW and NX bits correctly for the kernel text, data, and BSS (subject to some caveats below). * Ensure DDB can write to memory when necessary (such as to set a breakpoint). * Ensure GDB can write to memory when necessary (such as to set a breakpoint). For this purpose, add new MD functions gdb_begin_write() and gdb_end_write() which the GDB support code can call before and after writing to memory. This change is not comprehensive: * It doesn't do anything to protect modules. * It doesn't do anything for kernel memory allocated after the kernel starts running. * In order to avoid excessive memory inefficiency, it may let multiple types of data share a 2M page, and assigns the most permissions needed for data on that page. Reviewed by: jhb, kib Discussed with: emaste MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14282	2018-03-06 14:28:37 +00:00
kib	d27cb27779	Unify bulk free operations in several pmaps. Submitted by: Yoshihiro Ota Reviewed by: markj MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D13485	2018-03-04 20:53:20 +00:00
rpokala	03d8fd318a	imcsmb(4): Intel integrated Memory Controller (iMC) SMBus controller driver imcsmb(4) provides smbus(4) support for the SMBus controller functionality in the integrated Memory Controllers (iMCs) embedded in Intel Sandybridge- Xeon, Ivybridge-Xeon, Haswell-Xeon, and Broadwell-Xeon CPUs. Each CPU implements one or more iMCs, depending on the number of cores; each iMC implements two SMBus controllers (iMC-SMBs). * IMPORTANT NOTE * Because motherboard firmware or the BMC might try to use the iMC-SMBs for monitoring DIMM temperatures and/or managing an NVDIMM, the driver might need to temporarily disable those functions, or take a hardware interlock, before using the iMC-SMBs. Details on how to do this may vary from board to board, and the procedure may be proprietary. It is strongly suggested that anyone wishing to use this driver contact their motherboard vendor, and modify the driver as described in the manual page and in the driver itself. (For what it's worth, the driver as-is has been tested on various SuperMicro motherboards.) Reviewed by: avg, jhb MFC after: 1 week Relnotes: yes Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D14447 Discussed with: avg, ian, jhb Tested by: allanjude (previous version), Panasas	2018-03-03 01:53:51 +00:00
brooks	a0e10aee51	Rename kernel-only members of semid_ds and msgid_ds. This deliberately breaks the API in preperation for future syscall revisions which will remove these nonstandard members. In an exp-run a single port (devel/qemu-user-static) was found to use them which it did becuase it emulates system calls. This has been fixed in the ports tree. PR: 224443 (exp-run) Reviewed by: kib, jhb (previous version) Exp-run by: antoine Sponsored by: DARPA, AFRP Differential Revision: https://reviews.freebsd.org/D14490	2018-03-02 22:10:48 +00:00
cem	807035047f	Remove unused error return from API that cannot fail No implementation of fpu_kern_enter() can fail, and it was causing needless error checking boilerplate and confusion. Change the return code to void to match reality. (This trivial change took nine days to land because of the commit hook on sys/dev/random. Please consider removing the hook or otherwise lowering the bar -- secteam never seems to have free time to review patches.) Reported by: Lachlan McIlroy <Lachlan.McIlroy AT isilon.com> Reviewed by: delphij Approved by: secteam (delphij) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14380	2018-02-23 20:15:19 +00:00
emaste	5122d84e64	Correct pseudo misspelling in sys/ comments contrib code and #define in intel_ata.h unchanged.	2018-02-23 18:15:50 +00:00
emaste	f4add23ff5	Correct proper nouns in the Linuxulator - Capitalize Linux - Spell FreeBSD out in full - Address some style(9) on changed lines Sponsored by: Turing Robotic Industries Inc.	2018-02-22 02:24:17 +00:00
kib	ee3d0fb8ef	vm_wait() rework. Make vm_wait() take the vm_object argument which specifies the domain set to wait for the min condition pass. If there is no object associated with the wait, use curthread' policy domainset. The mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by the new helper vm_wait_doms(), which directly takes the bitmask of the domains to wait for passing min condition. Eliminate pagedaemon_wait(). vm_domain_clear() handles the same operations. Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls are enough. Eliminate several control state variables from vm_domain, unneeded after the vm_wait() conversion. Scetched and reviewed by: jeff Tested by: pho Sponsored by: The FreeBSD Foundation, Mellanox Technologies Differential revision: https://reviews.freebsd.org/D14384	2018-02-20 10:13:13 +00:00
emaste	624a2a708b	Rationalize license text on Linuxolator files Many licenses on Linuxolator files contained small variations from the standard FreeBSD license text. To avoid license proliferation switch to the standard 2-clause FreeBSD license for those files where I have permission from each of the listed copyright holders. Additional files waiting on permission from others are listed in review D14210. Approved by: kan, marcel, sos, rdivacky MFC after: 1 week Sponsored by: The FreeBSD Foundation	2018-02-16 15:00:14 +00:00
cem	47a3cad5ce	x86 pmap: Make memory mapped via pmap_qenter() non-executable The idea is, the pmap_qenter() API is now defined to not produce executable mappings. If you need executable mappings, use another API. Add pg_nx flag in pmap_qenter on x86 to make kernel pages non-executable. Other architectures that support execute-specific permissons on page table entries should subsequently be updated to match. Submitted by: Darrick Lew <darrick.freebsd AT gmail.com> Reviewed by: markj Discussed with: alc, jhb, kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14062	2018-02-14 23:35:47 +00:00
hselasky	4287266c4f	Import the mthca kernel side infiniband driver from Linux 4.9 and fix compilation under FreeBSD. The mthca driver was temporarily removed as part of the Linux 4.9 RoCE/infinband upgrade. Top commit in Linux source tree: 69973b830859bc6529a7a0468ba0d80ee5117826 Sponsored by: Mellanox Technologies	2018-02-13 17:04:34 +00:00
jeff	ba27b5187b	Make v_wire_count a per-cpu counter(9) counter. This eliminates a significant source of cache line contention from vm_page_alloc(). Use accessors and vm_page_unwire_noq() so that the mechanism can be easily changed in the future. Reviewed by: markj Discussed with: kib, glebius Tested by: pho (earlier version) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14273	2018-02-12 22:53:00 +00:00
imp	c78af2dc5f	We don't support gcc < 4.2.1, so varargs.h now is just #error always. Unifdef for versions prior to 4.2.1 and remove now-unused header files. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D14323	2018-02-12 14:48:14 +00:00
markj	7bc81b6db1	Use vm_page_unwire_noq() instead of directly modifying page wire counts. No functional change intended. Reviewed by: alc, kib (previous revision) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D14266	2018-02-08 19:28:51 +00:00
jeff	e67ec0d694	Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000	2018-02-06 22:10:07 +00:00
kib	fba8d52923	Move signal trampolines out of locore.s into separate source file. Similar to other arches, the move makes the subject of locore.s only the kernel startup. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2018-02-06 00:02:30 +00:00
emaste	0eb6366bc8	Additional linuxolator whitespace cleanup, missed in r328890	2018-02-05 18:39:06 +00:00
emaste	5c9ea56c98	Linuxolator whitespace cleanup A version of each of the MD files by necessity exists for each CPU architecture supported by the Linuxolator. Clean these up so that new architectures do not inherit whitespace issues. Clean up shared Linuxolator files while here. Sponsored by: Turing Robotic Industries Inc.	2018-02-05 17:29:12 +00:00
kib	01b52fdebb	IBRS support, AKA Spectre hardware mitigation. It is coded according to the Intel document 336996-001, reading of the patches posted on lkml, and some additional consultations with Intel. For existing processors, you need a microcode update which adds IBRS CPU features, and to manually enable it by setting the tunable/sysctl hw.ibrs_disable to 0. Current status can be checked in sysctl hw.ibrs_active. The mitigation might be inactive if the CPU feature is not patched in, or if CPU reports that IBRS use is not required, by IA32_ARCH_CAP_IBRS_ALL bit. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14029	2018-01-31 14:36:27 +00:00
bdrewery	59f27534cc	Don't use an .OBJDIR for 'make sysent'. Reported by: emaste, jhb Sponsored by: Dell EMC	2018-01-29 19:14:15 +00:00
imp	1912ffb2e5	Add ISA PNP tables to ISA drivers. Fix a few incidental comments. ACPI ISA PBP tables not tagged, there's bigger issues with them.	2018-01-29 00:22:30 +00:00
kib	545e25ea75	Use PCID to optimize PTI. Use PCID to avoid complete TLB shootdown when switching between user and kernel mode with PTI enabled. I use the model close to what I read about KAISER, user-mode PCID has 1:1 correspondence to the kernel-mode PCID, by setting bit 11 in PCID. Full kernel-mode TLB shootdown is performed on context switches, since KVA TLB invalidation only works in the current pmap. User-mode part of TLB is flushed on the pmap activations as well. Similarly, IPI TLB shootdowns must handle both kernel and user address spaces for each address. Note that machines which implement PCID but do not have INVPCID instructions, cause the usual complications in the IPI handlers, due to the need to switch to the target PCID temporary. This is racy, but because for PCID/no-INVPCID we disable the interrupts in pmap_activate_sw(), IPI handler cannot see inconsistent state of CPU PCID vs PCPU pmap/kcr3/ucr3 pointers. On the other hand, on kernel/user switches, CR3_PCID_SAVE bit is set and we do not clear TLB. I can imagine alternative use of PCID, where there is only one PCID allocated for the kernel pmap. Then, there is no need to shootdown kernel TLB entries on context switch. But copyout(3) would need to either use method similar to proc_rwmem() to access the userspace data, or (in reverse) provide a temporal mapping for the kernel buffer into user mode PCID and use trampoline for copy. Reviewed by: markj (previous version) Tested by: pho Discussed with: alc (some aspects) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D13985	2018-01-27 11:49:37 +00:00
emaste	ae11f64597	Use BSD-2-Clause-FreeBSD license on linux_support.s These files previously had a 3-clause license and 'THE REGENTS' text. Switch to standard 2-clause text with kib's approval, and add the SPDX tag. Approved by: kib	2018-01-23 20:35:43 +00:00
pfg	ced875130d	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
royger	774ffaf6a8	xen: fix IDT setup after PTI On amd64 the IDT handler was not set correctly when using PTI. While there also fix the selectors to SEL_KPL. Obtained from: kib MFC with: r328083	2018-01-20 14:59:37 +00:00
nwhitehorn	49a1f46412	Define PHYS_TO_DMAP() and DMAP_TO_PHYS() as panics on the architectures (i386 and arm) that never implement them. This allows the removal of #ifdef PHYS_TO_DMAP on code otherwise protected by a runtime check on PMAP_HAS_DMAP. It also fixes the build on ARM and i386 after I forgot an #ifdef in r328168. Reported by: Milan Obuch Pointy hat to: me	2018-01-19 22:17:13 +00:00
nwhitehorn	e79f2b9178	Remove SFBUF_OPTIONAL_DIRECT_MAP and such hacks, replacing them across the kernel by PHYS_TO_DMAP() as previously present on amd64, arm64, riscv, and powerpc64. This introduces a new MI macro (PMAP_HAS_DMAP) that can be evaluated at runtime to determine if the architecture has a direct map; if it does not (or does) unconditionally and PMAP_HAS_DMAP is either 0 or 1, the compiler can remove the conditional logic. As part of this, implement PHYS_TO_DMAP() on sparc64 and mips64, which had similar things but spelled differently. 32-bit MIPS has a partial direct-map that maps poorly to this concept and is unchanged. Reviewed by: kib Suggestions from: marius, alc, kib Runtime tested on: amd64, powerpc64, powerpc, mips64	2018-01-19 17:46:31 +00:00
jhb	0aaf564f94	Use long for the last argument to VOP_PATHCONF rather than a register_t. pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf() functions now accept a pointer to a long value rather than modifying td_retval directly. Instead, the system calls explicitly store the returned long value in td_retval[0]. Requested by: bde Reviewed by: kib Sponsored by: Chelsio Communications	2018-01-17 22:36:58 +00:00
kib	c35d24e497	PTI for amd64. The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1. The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped: kernel text (but not modules text) PCPU GDT/IDT/user LDT/task structures IST stacks for NMI and doublefault handlers. Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables. then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed. On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occured on the kstack, see the comment in doret_iret detection code in trap(). Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity. PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over the all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386. The need to flush hidden kernel translations on the switch to user mode make global tables (PG_G) meaningless and even harming, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit. MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline did not switched from PTI stack is fatal. The fix is pending. Reviewed by: markj (partially) Tested by: pho (previous version) Discussed with: jeff, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2018-01-17 11:44:21 +00:00
pfg	a7c6776f59	x86: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these ire likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the sorrounding code could handle NULL values. X-Differential revision: https://reviews.freebsd.org/D13837	2018-01-15 21:08:22 +00:00
emaste	48fbddaae8	Enable VIMAGE in i386 GENERIC (revert r327840) We've switched back to ld.bfd on i386 for now. PR: 225077 Sponsored by: The FreeBSD Foundation	2018-01-14 16:04:51 +00:00
jeff	f375b4dd66	Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag. The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty. The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense. Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon	2018-01-12 23:25:05 +00:00
emaste	3ba4847ec6	Temporarily disable VIMAGE on i386 An lld-linked i386 kernel hangs on boot, after em(4) starts. This seems similar to the issue that prompted us to disable VIMAGE on arm64 in r326179. Disable VIMAGE on i386 for now while we investigate. PR: 225077 Sponsored by: The FreeBSD Foundation	2018-01-11 19:08:43 +00:00
kib	9a35b0521e	Fix grammar. Submitted by: alc MFC after: 3 days	2018-01-11 16:50:03 +00:00
kib	3a13615ee3	Remove redundand CLD instructions. We already clear %RFLAGS.DF on the kernel entry due to the compiler's ABI requirements. Suggested by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2018-01-11 13:22:13 +00:00
kib	7429338dc0	Update comment explaining the check, to reality. Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 days	2018-01-11 12:07:24 +00:00
cem	8899c9cd84	x86: Document purpose of _safe variants of {rd,wr}msr() Sponsored by: Dell EMC Isilon	2018-01-10 22:41:00 +00:00
imp	6d4bb5fab7	inittodr(0) actually sets the time, so there's no need to call tc_setclock(). It's redundant. Tweak UPDATING based on code review of past releases. Relnotes: yes (for the removal of pmtimer)	2018-01-10 17:25:08 +00:00
imp	10d1a3f2a4	Move prof_machdep.c to it's more traditional place under i386/i386.	2018-01-10 16:51:55 +00:00
imp	07f8cf61c1	Retire pmtimer driver. Move time fixing into apm driver. Move Iwasaki-san's copyright over. Remove FIXME code that couldn't possibly work. Call tc_settime() with our estimate of the delta we've been alseep (the one we print) to adjust the time. Not sure what to do about callouts, so keep the small #ifdef in place there. Differential Revision: https://reviews.freebsd.org/D13823	2018-01-10 14:59:19 +00:00
imp	568d742088	Remove ccbque.h from i386/isa. inline ccbque.h into scsi_low.h. The file isn't MD, so shouldn't live in i386/isa. It's only used by scsi_low, so move it there so no new clients accidentally grow. scsi_low may not even still work, and the locking here is still SPL based. CAM should do the right thing, but I've received no reports of these cards still working. At least it compiles still and there's one fewer files in sys/i386/isa. While I'm here, ansify and de-splize. CCB_MWANTED appears to be a clear-only flag, but I've not changed that. Differential Review: https://reviews.freebsd.org/D13672	2018-01-09 16:11:33 +00:00
kib	e478c0d2cf	Avoid re-check of usermode condition. It does not change anything in the behavior of trap_pfault(), while eliminating obfuscation of jumping to the code which checks for the condition reversed of the goto cause. Also avoid force initialize the rv variable, since it is now only accessed after storing vm_fault() return value. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13725	2018-01-01 20:47:03 +00:00
kib	e75b16052d	Move i386/isa/elink.[hc] to dev/ep. The ep(4) driver is the only consumer of the two functions from elink.c. I removed the standalone module as well, and most likely, the module metadata is not needed anywhere, but this is for later cleanup. Discussed with: imp, jhb Sponsored by: The FreeBSD Foundation	2017-12-30 11:42:49 +00:00
kib	5df88ba9c0	Move i386/isa/npx.c to i386i386/npx.c. The i386 FPU (AKA npx) code does not depend on ISA devices at all, after the support for IRQ13 FPU exceptions was removed. Put the file into the expected place in the kernel source tree. Discussed with: jhb Sponsored by: The FreeBSD Foundation	2017-12-30 11:33:04 +00:00
imp	7634b7d88a	Remove two stray references to wl driver, removed some time ago.	2017-12-30 07:59:32 +00:00
jhb	cc9a800848	Remove a few more dangling references to ie(4).	2017-12-28 21:35:53 +00:00
pfg	cab3e1fc44	sys/i386/isa/elink*: sync with (older) NetBSD. Make it easier to identify the point where we started diverging from NetBSD. Newer versions of these files have been updated to a new license so this information may become useful some day. Obtained from: NetBSD	2017-12-28 14:14:35 +00:00
eadler	421a929b1e	kernel: Fix several typos and minor errors - duplicate words - typos - references to old versions of FreeBSD Reviewed by: imp, benno	2017-12-27 03:23:21 +00:00
kib	3d18a9d66f	Add atomic_load(9) and atomic_store(9) operations. They provide relaxed-ordered atomic access semantic. Due to the FreeBSD memory model, the operations are syntaxical wrappers around the volatile accesses. The volatile qualifier is used to ensure that the access not optimized out and in turn depends on the volatile semantic as implemented by supported compilers. The motivation for adding the operation is to help people coming from other systems or knowing the C11/C++ standards where atomics have special type and require use of the special access operations. It is still the case that FreeBSD requires plain load and stores of aligned integer types to be atomic. Suggested by: jhb Reviewed by: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13534	2017-12-19 09:59:20 +00:00
bde	5c5c139b6a	Also forgotten in the previous that removed the permanent double mapping of low physical memory: Update the comment about leaving the permanent mapping in place. This also improves the wording of the comment. PTD 0 is still left alone because it is fairly important that it was unmapped earlier, and the comment now describes the unmapping of the other low PTDs that the code actually does. Reviewed by: kib	2017-12-18 14:29:48 +00:00
bde	994bacdf8f	Remove the permanent double mapping of low physical memory and replace it by a transient double mapping for the one instruction in ACPI wakeup where it is needed (and for many surrounding instructions in ACPI resume). Invalidate the TLB as soon as convenient after undoing the transient mapping. ACPI resume already has the strict ordering needed for this. This fixes the non-trapping of null pointers and other garbage pointers below NBPDR (except transiently). NBPDR is quite large (4MB, or 2MB for PAE). This fixes spurious traps at the first instruction in VM86 bioscalls. The traps are for transiently missing read permission in the first VM86 page (physical page 0) which was just written to at KERNBASE in the kernel. The mechanism is unknown (it is not simply PG_G). locore uses a similar but larger transient double mapping and needs it for 2 instructions instead of 1. Unmap the first PDE in it after the 2 instructions to detect most garbage pointers while bootstrapping. pmap_bootstrap() finishes the unmapping. Remove the avoidance of the double mapping for a recently fixed special case. ACPI resume could use this avoidance (made non-special) to avoid any problems with the transient double mapping, but no such problems are known. Update comments in locore. Many were for old versions of FreeBSD which tried to map low memory r/o except for special cases, or might have allowed access to low memory via physical offsets. Now all kernel maps are r/w, and removal of of the double map disallows use of physical offsets again.	2017-12-18 13:53:22 +00:00
bde	6031fc5935	Fix the undersupported option KERNLOAD, part 2: fix crashes in locore when KERNLOAD is smaller than NBPDR (not the default) and PG_G is enabled (the default if the CPU supports it). This case has relatively minor problems with coherency of the permanent double mapping, but the fix in r167869 to improve coherency creates page tables with 3 different errors so never worked. The permanent double mapping is fundamentally broken and will be removed soon. It fundamentally breaks trapping for null pointers and requires complications to avoid cache coherency bugs. It is currently used for only a single instruction in ACPI resume, Many fixes VM86 and/or ACPI and/or the double map were attempted near r1200000. r167869 attempted to fix cache coherency bugs in an unusual case, but the bugs were unreachable because older errors in page tables caused a crash first. This commit just makes r167869 work as intended. Part 1 of these fixes fixed the other errors, but also stopped mapping the PDE for KERNBASE as a large page, so double mapping of this PDE only causes the same problems as when KERNLOAD is the default. Except for the problem of trapping null pointers, r167869 could be used to fix these problems, but it is inactive in usual cases. The only known other problem is that incoherent permissions for page 0 cause spurious traps in VM86 BIOS calls. Reviewed by: kib	2017-12-18 11:57:05 +00:00
bde	622efbbef8	Fix the undersupported option KERNLOAD, part 1: fix crashes in locore when KERNLOAD is not a multiple of NBPDR (not the default) and PSE is enabled (the default if the CPU supports it). Addresses in PDEs must be a multiple of NBPDR in the PSE case, but were not so in the crashing case. KERNLOAD defaults to NBPDR. NBPDR is 4 MB for !PAE and 2 MB for PAE. The default can be changed by editing i386/include/vmparam.h or using makeoptions. It can be changed to less than NBPDR to save real and virtual memory at a small cost in time, or to more than NBPDR to waste real and virtual memory. It must be larger than 1 MB and a multiple of PAGE_SIZE. When it is less than NBPDR, it is necessarily not a multiple of NBPDR. This case has much larger bugs which will be fixed in part 2. The fix is to only use PSE for physical addresses above <KERNLOAD rounded _up_ to an NBPDR boundary>. When the rounding is non-null, this leaves part of the kernel not using large pages. Rounding down would avoid this pessimization, but would break setting of PAT bits on i/o pages if it goes below 1MB. Since rounding down always goes below 1MB when KERNLOAD < NBPDR and the KERNLOAD > NBPDR case is not useful, never round down. Fix related style bugs (e.g., wrong literal values for NBPDR in comments). Reviewed by: kib	2017-12-18 09:32:56 +00:00
bde	4e663070d6	Minor cleanups found while fixing a bug involving double mapping of low memory: Load the kernel eflags less magically, as in locore. The magic increased when I removed eflags from the pcb in r305899. Remove a jump to low memory that became garbage when the i386 version was mostly replaced by the amd64 version in r235622. The amd64 version is very similar. It still loads the flags magically, but is not missing comments about using the special page table. Reviewed by: kib	2017-12-15 03:05:14 +00:00
markj	b0b9b4fcf4	Pass the trap frame to fasttrap hooks. The DTrace fasttrap entry points expect a struct reg containing the register values of the calling thread. Perform the conversion in fasttrap rather than in the trap handler: this reduces the number of ifdefs and avoids wasting stack space for traps that don't involve DTrace. MFC after: 2 weeks	2017-12-11 19:21:39 +00:00
cem	c89be21d55	i386: Bump KSTACK_PAGES default to match amd64 Logically, extend r286288 to cover all threads, by default. The world has largely moved on from i386. Most FreeBSD users and developers test on amd64 hardware. For better or worse, we have written a non-trivial amount of kernel code that relies on stacks larger than 8 kB, and it "just works" on amd64, so there has been little incentive to shrink it. amd64 had its KSTACK_PAGES bumped to 4 back in Peter's initial AMD64 commit, r114349, in 2003. Since that time, i386 has limped along on a stack half the size. We've even observed the stack overflows years ago, but neglected to fix the issue; see the 20121223 and 20150728 entries in UPDATING. If anyone is concerned with this change, I suggest they configure their AMD64 kernels with KSTACK_PAGES 2 and fix the fallout there first. Eugene has identified a list of high stack usage functions in the first PR below. PR: 219476, 224218 Reported by: eugen@, Shreesh Holla <hshreesh AT yahoo.com> Relnotes: maybe Sponsored by: Dell EMC Isilon	2017-12-11 04:32:37 +00:00
bde	d81660feeb	Move instantiation of msgbufp from 9 MD files to subr_prf.c. This variable should be pure MI except possibly for reading it in MD dump routines. Its initialization was pure MD in 4.4BSD, but FreeBSD changed this in r36441 in 1998. There were many imperfections in r36441. This commit fixes only a small one, to simplify fixing the others 1 arch at a time. (r47678 added support for special/early/multiple message buffer initialization which I want in a more general form, but this was too fragile to use because hacking on the msgbufp global corrupted it, and was only used for 5 hours in -current...)	2017-12-07 07:55:38 +00:00
pfg	b0f7aa75d4	SPDX: use the Beerware identifier.	2017-11-30 20:33:45 +00:00
scottl	49fb5dd79b	It's time to retire AHC_REG_PRETTY_PRINT and AHD_REG_PRETTY_PRINT from the standard kernels. They are still available as custom compile options.	2017-11-29 23:41:49 +00:00
brooks	c6fbed1a3a	Disable vim syntax highlighting. Vim's default pick doesn't understand that ';' is a comment character and the result looks horrible. Reviewed by: emaste	2017-11-28 18:23:17 +00:00
fsu	24e4690114	Remap ENOATTR to ENODATA in the linuxulator. In the linux ENOADATA is frequently #defined as ENOATTR. The change is required for an xattrs support implementation. MFC after: 1 week Discussed with: netchild Approved by: pfg Differential Revision: https://reviews.freebsd.org/D13221	2017-11-27 17:03:11 +00:00
pfg	712696a24c	sys/i386: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.	2017-11-27 15:08:52 +00:00
ed	d13a2ec254	Use TO_PTR() to convert integers to pointers. For FreeBSD/arm64's cloudabi32 support, I'm going to need a TO_PTR() in this place. Also use it for all of the other source files, so that the difference remains as minimal as possible. MFC after: 2 weeks	2017-11-26 14:45:56 +00:00
hselasky	091ce9badd	Merge ^/head r326132 through r326161.	2017-11-24 12:13:27 +00:00
hselasky	7b5126003a	Merge ^/head r325999 through r326131.	2017-11-23 14:28:14 +00:00
kib	873f304292	Remove lint support from system headers and MD x86 headers. Reviewed by: dim, jhb Discussed with: imp Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13156	2017-11-23 11:40:16 +00:00
pfg	4736ccfd9c	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
hselasky	c6f05b2594	Merge ^/head r325842 through r325998.	2017-11-19 12:36:03 +00:00
pfg	9da7bdde06	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133	2017-11-18 14:26:50 +00:00
kib	b53cf0d5b7	Remove i386 XBOX support. It is for console presented at 2001 and featuring Pentium III processor. Even if any of them are still alive and run FreeBSD, we do not have any sign of life from their users. While removing another dozens of #ifdefs from the i386 sources reduces the aversion from looking at the code and improves the platform vitality. Reviewed by: cem, pfg, rink (XBOX support author) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D13016	2017-11-16 14:27:02 +00:00
hselasky	b909dfecb7	Remove no longer supported mthca driver. Sponsored by: Mellanox Technologies	2017-11-13 10:59:38 +00:00
kib	a43e8bfb3a	Remove useless DEBUG printfs in i386 sendsig() implementations. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-08 13:05:14 +00:00
kib	8113f89b71	x86: Do not emit unused TD_TID symbols. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-04 10:51:52 +00:00
kib	a46f239e34	Eliminate unused load. Based on github pull request: #117 Submitted by: Wuyang-Chung@github MFC after: 1 week	2017-11-04 10:50:47 +00:00
kib	a6dcbd1557	Consistently ensure that we do not load MXCSR with reserved bits set. Some callers of fpusetregs()/npxsetregs(), most importantly set_fpcontext(), clear reserved bits. But some did not. Do the clearing in fpusetregs() and remove now redundand operation from set_fpcontext(). Reported by: Maxime Villard <max@m00nbsd.net> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-11-01 10:32:44 +00:00
tijl	173e2ebded	Set the return address for stack entry points to zero. Stack unwinders treat zero as a stop condition. The value on the stack can be non-zero because thread stacks may be arbitrary memory provided via pthread_attr_setstack(3) or may be recycled from previous threads. Reference: https://lists.freebsd.org/pipermail/freebsd-current/2017-August/066855.html https://lists.freebsd.org/pipermail/freebsd-current/2017-October/067254.html Discussed with: kib MFC after: 1 week	2017-10-31 11:51:34 +00:00
eadler	45275e3a26	Update several more URLs - Primarily http -> https - Primarily FreeBSD project URLs	2017-10-29 08:17:03 +00:00
markj	1588800df4	Fix the VM_NRESERVLEVEL == 0 build. Add VM_NRESERVLEVEL guards in the pmaps that implement transparent superpage promotion using reservations. Reviewed by: alc, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D12764	2017-10-23 15:34:05 +00:00
bz	48b1992757	With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into HEAD. Enable VIMAGE in GENERIC kernels and some others (where GENERIC does not exist) on HEAD. Disable building LINT-VIMAGE with VIMAGE being default. This should give it a lot more exposure in the run-up to 12 to help us evaluate whether to keep it on by default or not. We are also hoping to get better performance testing. The feature can be disabled using nooptions. Requested by: many Reviewed by: kristof, emaste, hiren X-MFC after: never Relnotes: yes Differential Revision: https://reviews.freebsd.org/D12639	2017-10-20 21:40:59 +00:00
markj	c87fb69add	Move kernel dump offset tracking into MI code. All of the kernel dump implementations keep track of the current offset ("dumplo") within the dump device. However, except for textdumps, they all write the dump sequentially, so we can reduce code duplication by having the MI code keep track of the current offset. The new dump_append() API can be used to write at the current offset. This is needed to implement support for kernel dump compression in the MI kernel dump code. Also simplify dump_encrypted_write() somewhat: use dump_write() instead of duplicating its bounds checks, and get rid of the redundant offset tracking. Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11722	2017-10-18 15:38:05 +00:00
kib	8844e99855	Change i386_get_ldt() to return 'EOF' when the requested range of descriptors does not fit into currently allocated LDT, or trim the return if the range fits partially. Before, the function returned EINVAL. Fix two bugs in r324366: use capped num counter for malloc size, and do not leak allocated buffer on EINVAL (by handling EINVAL case as normal, see above). Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-09 16:19:26 +00:00
kib	3851626613	Improvements to set_user_ldt(). Remove mtx_owned() checks from set_user_ldt(). Split the function into _locked() version which requires the dt_lock spinlock owned, and make set_user_ldt() a wrapper. Add a comment in swtch.s noting that the call to the new set_user_ldt() cannot recurse on dt_lock. Remove #ifdef SMP block, the addend is always zero on UP. Fix type of set_user_ldt_rv(), making it match the type used for smb_rendezvous() callback, and remove the cast. Use curproc. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-09 16:07:27 +00:00
kib	b941255a34	Reset the fs and gs bases on exec(2). The values from the old address space do not make sense for the new program. In particular, gsbase might be the TLS base for the old program but the new program has no TLS now. amd64 already handles this correctly. Reported and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-09 15:39:43 +00:00
kib	b9de6b3f87	More style. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-09 15:24:18 +00:00
kib	52a8a4f490	Improve i386_get_ldt(). Provide consistent snapshot of the requested descriptors by preventing other threads from modifying LDT while we fetch the data, lock dt_lock around the read. Copy the data into intermediate buffer, which is copied out after the lock is dropped. Comparing with the amd64 version, the read is done byte by byte, since there is no atomic 64bit read (cmpxchg8b method is too heavy comparing with the avoided issues). Improve overflow checking for the descriptors range calculations and remove unneeded casts. Use unsigned types for sizes. Allow zero num argument to i386_get_ldt() and i386_set_ldt(). This case is handled naturally by the code flow. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-06 14:29:53 +00:00
kib	e300c500de	Remove unneeded cast. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-06 10:17:50 +00:00
kib	e0c7ec41ee	Style. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-06 10:16:57 +00:00
kib	c2dbc3a78e	Use ANSI C declarations. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 19:11:25 +00:00
kib	a81dddf1d3	Correct format specifiers in the debug code. Style. Requested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 18:58:28 +00:00
kib	b85dee047c	Style. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-05 18:42:13 +00:00
kib	50cb59e230	A different fix for the issue from r323722. Split the handlers for pop of invalid selectors from the trap frame into usermode and kernel variants. Usermode handler is kept as is, it restores the already loaded parts of the trap frame and jumps to set up a signal delivery to the user process. New kernel part of the handler emulates IRET treatment of the segments which would violate access right. It loads NUL selector in the segment register which load causes the fault, and then continues the return to interrupted kernel code. Since invalid selectors in the segment registers in the kernel mode can only exist while kernel still enters or exits from userspace, we only zero invalid userspace selectors. If userspace tries to use the segment register, it gets a signal, as if the processor segment descriptor cache was reloaded. Reported by: Maxime Villard <max@m00nbsd.net> Suggested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-28 09:01:28 +00:00
kib	071c00f495	Restore a part of r323722. Do not return from interrupt using the POP_FRAME;iret instruction sequence, always jump to doreti. The user segments selectors saved on the stack might become invalid because userspace manipulated LDT in a parallel thread. trap() is aware of such issue, but it is only prepared to handle it at iret and segment registers load operations in doreti path. Also remove POP_FRAME macro because it is no longer used. Reviewed by: bde, jhb (as part of r323722) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-28 08:46:15 +00:00
kib	81faa225ff	Revert r323722. A better fix will be committed shortly, as well as some still useful bits of the reverted revision. The problem with the committed fix is that there are still issues with returning from NMI, when NMI interrupted kernel in a moment where the kernel segments selectors were still not loaded into registers. If this happens, the NMI return would loose the userspace selectors because r323722 does not reload segment registers on return to kernel mode. Fixing the problem is complicated. Since an alternative approach to handle the original bug exists, it makes sence to stop adding more complexity. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-09-28 08:38:24 +00:00
jpaetzel	b35131985b	Fix indentation for r323068 PR: 220170 Reported by: lidl MFC after: 3 days Pointyhat to: jpaetzel	2017-09-19 20:40:05 +00:00
kib	466bfe2553	Do not do torn writes to active LDTs. Care must be taken when updating the active LDT, since parallel threads might try to load a segment descriptor which is currently updated. Since the results are undefined, this cannot be ignored by claiming to be an application race. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12413	2017-09-19 17:57:04 +00:00
kib	981650a056	Fix handling of the segment registers on i386. Suppose that userspace is executing with the non-standard segment descriptors. Then, until exception or interrupt handler executed SET_KERNEL_SEGS, kernel is still executing with user %ds, %es and %fs. If an interrupt occurs in this window, the interrupt handler is executed unsafely, relying on usability of the usermode registers. If the interrupt results in the context switch on return, the contamination of the kernel state spreads to the thread we switched to. As result, kernel data accesses might fault or, if only the base is changed, completely messed up. More, if the user segment was allocated in LDT, another thread might mark the descriptor as invalid before doreti code tried to reload them. In this case kernel panics. The issue exists for all exception entry points which use trap gate, and thus do not automatically disable interrupts on entry, and for lcall_handler. Fix is two-fold: first, we need to disable interrupts for all kernel entries, changing the IDT descriptor types from trap gate to interrupt gate. Interrupts are re-enabled not earlier than the kernel segments are loaded into the segment registers. Second, we only load the segment registers from the trap frame when returning to usermode. For the later, all interrupt return paths must happen through the doreti common code. There is no way to disable interrupts on call gate, which is the supposed mode of servicing for lcall $7,$0 syscalls. Change the LDT descriptor 0 into a code segment type and point it to the userspace trampoline which redirects the syscall to int $0x80. All the measures make the segment register handling similar to that of amd64. We do not apply amd64 optimizations of not reloading segment registers on return from the syscall. Reported by: Maxime Villard <max@m00nbsd.net> Tested by: pho (the non-lcall part) Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12402	2017-09-18 20:22:42 +00:00
jpaetzel	bfd734f77c	Revert r323087 This needs more thinking out and consensus, and the commit message was wrong AND there was a typo in the commit. pointyhat: jpaetzel	2017-09-01 17:03:48 +00:00
jpaetzel	612bb8539d	Take options IPSEC out of GENERIC PR: 220170 Submitted by: delphij Reviewed by: ae, glebius MFC after: 2 weeks Differential Revision: D11806	2017-09-01 15:54:53 +00:00
jpaetzel	f7739d7e09	Allow kldload tcpmd5 PR: 220170 MFC after: 2 weeks	2017-08-31 20:16:28 +00:00
mav	5849e8f575	Add NTB driver for PLX/Avago/Broadcom PCIe switches. This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though the second with predictable complications on hot-plug and reboot events). I tested it with PEX 8717 and PEX 8733 chips, but expect it should work with many other compatible ones too. It supports up to two NT bridges per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows, 6 or 12 scratchpad registers and 16 doorbells. There are also 4 DMA engines in those chips, but they are not yet supported. While there, rename Intel NTB driver from generic ntb_hw(4) to more specific ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and alike to Linux naming. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2017-08-30 21:16:32 +00:00
cem	6659d8ab68	Drop CACHE_LINE_SIZE to 64 bytes on x86 The actual cache line size has always been 64 bytes. The 128 number arose as an optimization for Core 2 era Intel processors. By default (configurable in BIOS), these CPUs would prefetch adjacent cache lines unintelligently. Newer CPUs prefetch more intelligently. The latest Core 2 era CPU was introduced in September 2008 (Xeon 7400 series, "Dunnington"). If you are still using one of these CPUs, especially in a multi-socket configuration, consider locating the "adjacent cache line prefetch" option in BIOS and disabling it. Reported by: mjg Reviewed by: np Discussed with: jhb Sponsored by: Dell EMC Isilon	2017-08-28 22:28:41 +00:00
kib	36c08d0250	MFamd64 r322720, r322723: Simplify i386 trap(). - Use more relevant name 'signo' instead of 'i' for the local variable which contains a signal number to send for the current exception. - Eliminate two labels 'userout' and 'out' which point to the very end of the trap() function. Instead use return directly. - Re-indent the prot_fault_translation block by reducing if() nesting. - Some more monor style changes. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-26 18:12:25 +00:00
kib	27dbd32f45	Remove unused code. The machdep.uprintf_signal sysctl replaced it in more convenient way, not requiring recompilation to use and providing more information on fault. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-26 18:09:27 +00:00
kib	f6d447048d	MFamd64 r322718: Use ANSI C declaration for trap_pfault(). Style. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-26 18:06:29 +00:00
kib	0a239887e3	MFamd64 r322719: Trim excessive 'extern'. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-26 18:04:29 +00:00
kib	03221989ce	Use the known valid segment when accessing memory in #UD handler. Make sure that %eflags.D flag is cleared for hook. Improve comments. When #UD dtrace code checks for a registered hook before checking that the exception was raised from kernel mode, we might run with the user %ds, trapping on access. Exception entry from userspace automatically load valid %ss, which we can use there instead. Noted and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-08-19 21:00:02 +00:00
kib	6dd0b9e76c	When checking that #UD comes from kernel mode, check that the exception did not happen in vm86 mode. A vm86 userland process could have a %cs that matches GSEL_KPL, while dtrace cannot hook it. Submitted by: Maxime Villard <max@m00nbsd.net> MFC after: 3 days	2017-08-18 17:11:15 +00:00
markj	ce8e2801bf	Rename mkdumpheader() and group EKCD functions in kern_shutdown.c. This helps simplify the code in kern_shutdown.c and reduces the number of globally visible functions. No functional change intended. Reviewed by: cem, def Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11603	2017-08-18 04:04:09 +00:00
markj	f6dd3eb223	Factor out duplicated kernel dump code into dump_{start,finish}(). dump_start() and dump_finish() are responsible for writing kernel dump headers, optionally writing the key when encryption is enabled, and initializing the initial offset into the dump device. Also remove the unused dump_pad(), and make some functions static now that they're only called from kern_shutdown.c. No functional change intended. Reviewed by: cem, def Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11584	2017-08-18 03:52:35 +00:00
cem	5a79b729aa	x86: Add dynamic interrupt rebalancing Add an option to dynamically rebalance interrupts across cores (hw.intrbalance); off by default. The goal is to minimize preemption. By placing interrupt sources on distinct CPUs, ithreads get preferentially scheduled on distinct CPUs. Overall preemption is reduced and latency is reduced. In our workflow it reduced "fighting" between two high-frequency interrupt sources. Reduced latency was proven by, e.g., SPEC2008. Submitted by: jeff@ (earlier version) Reviewed by: kib@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10435	2017-08-16 18:48:53 +00:00
jkim	d871b34dbc	Split identify_cpu() into two functions for amd64 as we do for i386. This reduces diff between amd64 and i386. Also, it fixes a regression introduced in r322076, i.e., identify_hypervisor() failed to identify some hypervisors. This function assumes cpu_feature2 is already initialized. Reported by: dexuan Tested by: dexuan	2017-08-09 18:09:09 +00:00
jkim	0ac013123c	Detect hypervisors early. We used to set lower hz on hypervisors by default but it was broken since r273800 (and r278522, its MFC to stable/10) because identify_cpu() is called too late, i.e., after init_param1(). MFC after: 3 days	2017-08-05 06:56:46 +00:00
cem	d4a7946bbd	x86: Tag some intrinsics with __pure2 Some C wrappers for x86 instructions do not touch global memory and only act on their arguments; they can be marked __pure2, aka __const__. Without this annotation, Clang 3.9.1 is not intelligent enough on its own to grok that these functions are __const__. Submitted by: Anton Rang <anton.rang AT isilon.com> Sponsored by: Dell EMC Isilon	2017-08-03 22:28:30 +00:00
kib	38f0d940ba	Do not call trapsignal() after handling usermode fault or interrupt, when a signal is not intended to be sent. The variable holding the signal number to send is left uninitialized, which sometimes triggers invalid signal checks. For NMI, a return to usermode without ast processing is done. On the other hand, for spurious dtrace probe interrupt it is usermode which triggered the interrupt, so handle it through userret() as any other fault. Reported by: Nils Beyer <nbe@renzel.net> PR: 221151 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-02 10:12:10 +00:00
markj	b23f6b9b03	Batch updates to v_wire_count when freeing page table pages on x86. The removed release stores are not needed since stores are totally ordered on i386 and amd64. Reviewed by: alc, kib (previous revision) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11790	2017-08-01 05:26:30 +00:00
kib	030abb1986	Remove unused symbols. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-30 21:52:22 +00:00
dchagin	93ea6317f4	Avoid using [LINUX_]SHAREDPAGE constant directly in the vdso code. This is needed for https://reviews.freebsd.org/D11780. Reported by: kib@	2017-07-30 21:24:20 +00:00
kib	e83fbfb104	Fix handling of one more possible exception on return to usermode. If %ss is loaded with a segment pointing to a non-present descriptor by the IRETD instruction, a kernel-mode #SS exception is generated. Resulting T_STKFLT trap must be checked against doreti_iret_fault location and handled, otherwise userspace may panic the kernel. Note that this is i386 variant of FreeBSD-SA-15:21.amd64, but unlike amd64, there is no swapgs on i386 and the issue is arguably not exploitable. Reported by: Maxime Villard <max@m00nbsd.net> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-07-08 11:07:39 +00:00
sbruno	12de4d18c8	Garbage collect kernel option TWA_FLASH_FIRMWARE Submitted by: kevin.bowling0kev009.com Differential Revision: https://reviews.freebsd.org/D11387	2017-07-03 19:33:50 +00:00
dchagin	a0f4bac287	Add support for musl consumers to the Linuxulator. PR: 213809 Submitted by: Yonas Yanfa Reported by: Yonas Yanfa MFC after: 1 week Relnotes: yes	2017-07-03 10:24:49 +00:00
alc	bf2a2bba1b	When "force" is specified to pmap_invalidate_cache_range(), the given start address is not required to be page aligned. However, the loop within pmap_invalidate_cache_range() that performs the actual cache line invalidations requires that the starting address be truncated to a multiple of the cache line size. This change corrects an error in that truncation. Submitted by: Brett Gutstein <bgutstein@rice.edu> Reviewed by: kib MFC after: 1 week	2017-07-01 16:42:09 +00:00
jah	d1caaa9300	Clean up MD pollution of bus_dma.h: --Remove special-case handling of sparc64 bus_dmamap* functions. Replace with a more generic mechanism that allows MD busdma implementations to generate inline mapping functions by defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This is currently useful for sparc64, x86, and arm64, which all implement non-load dmamap operations as simple wrappers around map objects which may be bus- or device-specific. --Remove NULL-checked bus_dmamap macros. Implement the equivalent NULL checks in the inlined x86 implementation. For non-x86 platforms, these checks are a minor pessimization as those platforms do not currently allow NULL maps. NULL maps were originally allowed on arm64, which appears to have been the motivation behind adding arm[64]-specific barriers to bus_dma.h, but that support was removed in r299463. --Simplify the internal interface used by the bus_dmamap_load* variants and move it to bus_dma_internal.h --Fix some drivers that directly include sys/bus_dma.h despite the recommendations of bus_dma(9) Reviewed by: kib (previous revision), marius Differential Revision: https://reviews.freebsd.org/D10729	2017-07-01 05:35:29 +00:00
kib	c25bc401a9	Fix indent. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2017-06-24 10:19:06 +00:00
kib	01fe33c656	Correct translations between abridged and full x87 tags. Reported and tested by: karnajit wangkhem <karnajitw@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-06-17 11:25:31 +00:00
kib	e2a14c603f	Move struct syscall_args syscall arguments parameters container into struct thread. For all architectures, the syscall trap handlers have to allocate the structure on the stack. The structure takes 88 bytes on 64bit arches which is not negligible. Also, it cannot be easily found by other code, which e.g. caused duplication of some members of the structure to struct thread already. The change removes td_dbg_sc_code and td_dbg_sc_nargs which were directly copied from syscall_args. The structure is put into the copied on fork part of the struct thread to make the syscall arguments information correct in the child after fork. This move will also allow several more uses shortly. Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 21:03:23 +00:00
kib	7b6fe97487	Make struct syscall_args visible to userspace compilation environment from machine/proc.h, consistently on all architectures. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks X-Differential revision: https://reviews.freebsd.org/D11080	2017-06-12 20:53:44 +00:00
jhb	5387dbf595	Remove the BSD/OS 2.1 system call gate LDT entry. An extra copy of the system call gate was added to the default LDT back in 1996 (r18513 / r18514). However, the ability to run BSD/OS 2.1 i386 binaries under FreeBSD's native ABI is most likely no longer needed. Discussed with: kib	2017-05-23 22:34:18 +00:00
emaste	1901c3e1f2	Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193	2017-05-17 00:34:34 +00:00
jkim	9445e9d6e1	Use kmem_malloc() instead of malloc(9) for the native amd64 filter. r316767 broke the BPF JIT compiler for amd64 because malloc()'d space is no longer executable. Discussed with: kib, alc	2017-04-17 22:02:09 +00:00
jkim	22babe5387	Move declarations for a machine-dependent function to the header file.	2017-04-17 21:51:26 +00:00
jkim	98569e5163	Reduce diff with amd64 version.	2017-04-17 21:46:54 +00:00
glebius	21ead51d79	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156	2017-04-17 17:34:47 +00:00
glebius	0266d4259e	Remove unused assembly symbols pointing to vmmeter.	2017-04-17 17:18:07 +00:00
pkelsey	33064e92a2	Corrected misspelled versions of rendezvous. The MFC will include a compat definition of smp_no_rendevous_barrier() that calls smp_no_rendezvous_barrier(). Reviewed by: gnn, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D10313	2017-04-09 02:00:03 +00:00
avatar	93f11e526d	Trying to be more compatible with Linux if.h definitions: - renaming l_ifreq::ifru_metric to l_ifreq::ifru_ivalue; - adding a definition for ifr_ifindex which points to l_ifreq::ifru_ivalue. A quick search indicates that Linux already got the above changes since 2.1.14. Reviewed by: kib, marcel, dchagin MFC after: 1 week	2017-04-08 14:41:39 +00:00
markj	d294fb49c0	Adjust the constraint for "src" in atomic_(f)cmpset_8. "r" is not sufficient to prevent the use of invalid byte-width registers with at least gcc. Reported and reviewed by: bde X-MFC-With: r315718	2017-03-27 16:18:19 +00:00
avg	7a52acd8b3	revert r315959 because it causes build problems The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files. Also, the change could be not as useful as I thought it was. Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others	2017-03-27 12:34:29 +00:00
bde	254458ab34	Fix printing of negative offsets (typically from frame pointers) again. I fixed this in 1997, but the fix was over-engineered and fragile and was broken in 2003 if not before. i386 parameters were copied to 8 other arches verbatim, mostly after they stopped working on i386, and mostly without the large comment saying how the values were chosen on i386. powerpc has a non-verbatim copy which just changes the uncritical parameter and seems to add a sign extension bug to it. Just treat negative offsets as offsets if they are no more negative than -db_offset_max (default -64K), and remove all the broken parameters. -64K is not very negative, but it is enough for frame and stack pointer offsets since kernel stacks are small. The over-engineering was mainly to go more negative than -64K for the negative offset format, without affecting printing for more than a single address. Addresses in the top 64K of a (full 32-bit or 64-bit) address space are now printed less well, but there aren't many interesting ones. For arches that have many interesting ones very near the top (e.g., 68k has interrupt vectors there), there would be no good limit for the negative offset format and -64K is a good as anything.	2017-03-26 18:46:35 +00:00
avg	04ec8ce247	specific end of interrupt implementation for AMD Local APIC The change is more intrusive than I would like because the feature requires that a vector number is written to a special register. Thus, now the vector number has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason. Reviewed by: kib MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9880	2017-03-25 18:45:09 +00:00
dchagin	7e4bbabbee	Implement Linux mincore() system call. This is necessary for the upcoming drm-next. Suggested by: hselasky@ MFC after: 1 month	2017-03-25 15:47:29 +00:00
bde	27ac811b7a	Remove buggy adjustment of page tables in db_write_bytes(). Long ago, perhaps only on i386, kernel text was mapped read-only and it was necessary to change the mapping to read-write to set breakpoints in kernel text. Other writes by ddb to kernel text were also allowed. This write protection is harder to implement with 4MB pages, and was lost even for 4K pages when 4MB pages were implemented. So changing the mapping became useless. It was actually worse than useless since it followed followed various null and otherwise garbage pointers to not change random memory instead of the mapping. (On i386s, the pointers became good in pmap_bootstrap(), and on amd64 the pointers became bad in pmap_bootstrap() if not before.) Another bug broke detection of following of null pointers on i386, except early in boot where not detecting this was a feature. When I fixed the bug, I accidentally broke the feature and soon got traps in db_write_bytes(). Setting breakpoints early in ddb was broken. kib pointed out that a clean way to do the adjustment would be to use a special [sub]map giving a small window on the bytes to be written. The trap handler didn't know how to fix up errors for pagefaults accessing the map itself. Such errors rarely need fixups, since most traps for the map are for the first access which is a read. Reviewed by: kib	2017-03-24 17:34:55 +00:00
glebius	d44335ea0c	Remove Solaris 2.6 syscalls selector. Discussed with: kib	2017-03-23 19:54:41 +00:00
ed	63254ceea6	Stop providing the compat_3_brand. As of r315860, the ELF image activator works fine for CloudABI without it. Reviewed by: kib MFC after: 2 weeks	2017-03-23 14:12:21 +00:00
kib	da63ef1f60	Update r315753 with the proper flag name. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:28:13 +00:00
kib	a22b5a3135	Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. Interpretation of the 'static' there is that the binary must not specify an interpreter. In particular, shared objects are matched by the brand if BI_CAN_EXEC_DYN is also set. This improves precision of the brand matching, which should eliminate surprises due to brand ordering. Revert r315701. Discussed with and tested by: ed (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-03-22 22:23:01 +00:00
markj	c4e0ff355e	Add support for 8- and 16-bit atomic_(f)cmpset to x86. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D10068	2017-03-22 17:29:04 +00:00
ed	fc95dfd2e5	Set the interpreter path to /nonexistent. CloudABI executables are statically linked and don't have an interpreter. Setting the interpreter path to NULL used to work previously, but r314851 introduced code that checks the string unconditionally. Running CloudABI executables now causes a null pointer dereference. Looking at the rest of imgact_elf.c, it seems various other codepaths already leaned on the fact that the interpreter path is set. Let's just go ahead and pick an obviously incorrect interpreter path to appease imgact_elf.c. MFC after: 1 week	2017-03-22 07:05:27 +00:00
dchagin	69ea87350f	Implement getrandom() syscall. Note. GRND_RANDOM option is not supported for now. MFC after: 1 month	2017-03-18 18:34:29 +00:00
dchagin	82392d7947	To reduce code duplication move socket defines to the MI path. MFC after: 1 week	2017-03-18 18:23:30 +00:00
bde	02d7df1d66	Don't access the reserved registers %dr4 and %dr5 on i386. On the original i386, %dr[4-5] were unimplemented but not very clearly reserved, so debuggers read them to print them. i386 was still doing this. On the original athlon64, %dr[4-5] are documented as reserved but are aliased to %dr[6-7] unless CR4_DE is set, when accessing them traps. On 2 of my systems, accessing %dr[4-5] trapped sometimes. On my Haswell system, the apparent randomness was because the boot CPU starts with CR4_DE set while all other CPUs start with CR4_DE clear. FreeBSD doesn't support the data breakpoints enabled by CR4_DE and it never changes this flag, so the flag remains different across CPUs and the behaviour seemed inconsistent except while booting when the CPU doesn't change. The invalid accesses broke: - read access for printing the registers in ddb "show watches" on CPUs with CR4_DE set - read accesses in fill_dbregs() on CPUs with CR4_DE set. This didn't implement panic(3) since the user case always skipped %dr[4-5]. - write accesses in set_dbregs(). This also didn't affect userland. When it didn't trap, the aliasing made it fragile. Don't print the dummy (zero) values of %dr[4-5] in "show watches" for i386 or amd64. Fix style bugs near this printing. amd64 also has space in the dbregs struct for the reserved %dr[8-15] and already didn't print the dummy values for these, and never accessed any of the 10 reserved debug registers. Remove cpufuncs for making the invalid accesses. Even amd64 had these.	2017-03-17 13:49:05 +00:00
manu	044a23ec3c	Remove i915drm and radeondrm from NOTES and conf. This unbreak LINT kernel. Reported by: lwhsu	2017-03-12 00:52:16 +00:00
dchagin	d7b4f21065	Reduce code duplication between MD Linux code by moving SYSV IPC 64-bit related struct definitions out into the MI path. Invert the native ipc structs to the Linux ipc structs convesion logic. Since 64-bit variant of ipc structs has more precision convert native ipc structs to the 64-bit Linux ipc structs and then truncate 64-bit values into the non 64-bit if needed. Unlike Linux, return EOVERFLOW if the values do not fit. Fix SYSV IPC for 64-bit Linuxulator which never sets IPC_64 bit. MFC after: 1 month	2017-03-07 17:07:16 +00:00
mmokhi	d2f0c04c1f	Regenerated Linuxulator syscall tables for r314782 Approved by: dchagin MFC after: 1 month	2017-03-06 18:20:37 +00:00
mmokhi	4c583e7b8e	Add UNIMPLEMENTED() placeholder macro for the syscalls that are not implemented in Linux kernel itself. Cleanup DUMMY() macros. Reviewed by: dchagin, trasz Approved by: dchagin MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9804	2017-03-06 18:11:38 +00:00
pfg	82bdb13eb8	Revert r314669, r314670: Bring back the i486 option in GENERIC by default. The code related to i386 CPU variants configuration has received many changes in the last years: most of the features are detected automatically, so there are no performance penalties from keeping the 486 support enabled. Re-instate the 486 support: while the general configuration could still be cleaned a bit, there is no advantage in removing it. Differential Revision: https://reviews.freebsd.org/D9879	2017-03-06 03:52:15 +00:00
pfg	e0c54413a9	Drop i486 from the default i386 GENERIC kernel configuration. 80486 production was stopped by Intel on September 2007. Dropping the 486 configuration option from the GENERIC kernel improves performance slightly. Removing I486_CPU is consistent at this time: we don't support any processor without a FPU and the PC-98 arch, which frequently involved i486 CPUs, is also gone so we don't test such platforms anymore. Relnotes: yes MFC after: 2 weeks https://reviews.freebsd.org/D9879	2017-03-04 15:04:17 +00:00
imp	7e6cabd06e	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
kib	06450fe6c4	Initialize pcb_save for thread0. Otherwise kernel traps on NULL dereference if fpu_kern(9) is used from the thread0 context. Reported by: cem Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-28 22:54:52 +00:00
glebius	745bcd6fba	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is operating system released in 1988. It ceased to exist in early 2000-s.	2017-02-28 05:14:42 +00:00
dchagin	948e5bcce0	Regen for r314312 (Linux epoll_pwait). MFC after: 1 month	2017-02-26 19:59:28 +00:00
dchagin	8a318fc47b	Change Linux epoll_pwait syscall definition to match Linux actual one. MFC after: 1 month	2017-02-26 19:57:18 +00:00
alc	c062fef7f7	Refine the fix from r312954. Specifically, add a new PDE-only flag, PG_PROMOTED, that indicates whether lingering 4KB page mappings might need to be flushed on a PDE change that restricts or destroys a 2MB page mapping. This flag allows the pmap to avoid range invalidations that are both unnecessary and costly. Reviewed by: kib, markj MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D9665	2017-02-26 19:54:02 +00:00
dchagin	396e17522f	Implement timerfd family syscalls. MFC after: 1 month	2017-02-26 09:48:18 +00:00
dchagin	d9e839a5db	Regen after r314291 (timerfd definition). MFC after: 1 month	2017-02-26 09:37:25 +00:00
dchagin	d7d0ab04b1	Change Linuxulator timerfd syscalls definition to match actual Linux one. MFC after: 1 month	2017-02-26 09:35:44 +00:00
trasz	aaee258648	Fix linux_fstatfs() to return proper value for f_frsize. Without it, linux df(1) binary from Xenial shows garbage. Reviewed by: dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9692	2017-02-25 20:32:37 +00:00
mmokhi	bf114a4c4a	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 20:04:02 +00:00
dchagin	8f19e32110	Revert r314217. Commit is not match that I have approved.	2017-02-24 19:47:27 +00:00
mmokhi	0de5ce3724	Add linux_preadv() and linux_pwritev() syscalls to Linuxulator. Reviewed by: dchagin Approved by: dchagin, trasz (src committers) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9722	2017-02-24 19:22:17 +00:00
dchagin	ca1a1a9d60	Implement rt_tgsigqueueinfo system call used by glibc for pthread_sigqueue(3). MFC after: 2 week	2017-02-19 07:38:11 +00:00
kib	6ea6500b7e	MFamd64 r313933: microoptimize pmap_protect_pde(). Noted by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-19 06:14:33 +00:00
kib	5c280e36bd	Merge i386 and amd64 mtrr drivers. Reviewed by: royger, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9648	2017-02-17 21:08:32 +00:00
royger	bf0e949003	x86: fix MTRR initialization if EARLY_AP_STARTUP is used MTRR handlers are set in {amd64/i686}_mem_drvinit, which is called at SI_SUB_DRIVERS, and that's too late when EARLY_AP_STARTUP is set because APs have already started at this point. {amd64/i686}_mrinit is also called too late for the BSP, since that happens when the memory device is attached, also after APs have already started. Move the position to SI_SUB_CPU, and also initialize the state for the BSP, so that the APs can correctly get to the same state as the BSP. Sponsored by: Citrix Systems R&D MFC after: 1 week Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D9630	2017-02-17 12:47:51 +00:00
imp	36fafdbb83	Remove EISA bus support for add-in cards. Remove related kernel and compile options. Remove doxygen pointers to now deleted files. Remove EISA and VME as examples in bus_space.9. Retained EISA mode code for IO PIC and MPTABLES because that's not EISA bus, per se, and some people have abused EISA to mean "EISA-like behavior as opposed to ISA" rather than using it for EISA add-in cards. Relnotes: yes	2017-02-16 21:57:35 +00:00
imp	5e19920be6	Remove Micro Channel Architecture support. Of the commonly available machines, only a few 486 machines that used it, and those haven't had enough memory to run FreeBSD for quite some time (often limited to 16MB). Not to be confused with the Machine Check Architecture, which is still very much alive and used (and untouched by this commit). No Objection From: arch@	2017-02-15 23:04:25 +00:00
jhb	e80fc50712	Regenerate all the system call tables to drop "created from" lines. One of the ibcs2 files contains some actual changes (new headers) as it hasn't been regenerated after older changes to makesyscalls.sh.	2017-02-10 19:45:02 +00:00
dchagin	ef67426db9	Regen after r313284. MFC after: 2 week	2017-02-05 14:19:19 +00:00
dchagin	1ba976c2fa	Update syscall.master to 4.10-rc6. Also fix comments, a typo, and wrong numbering for a few unimplemented syscalls. For 32-bit Linuxulator, socketcall() syscall was historically the entry point for the sockets API. Starting in Linux 4.3, direct syscalls are provided for the sockets API. Enable it. The initial version of patch was provided by trasz@ and extended by me. Submitted by: trasz MFC after: 2 week Differential Revision: https://reviews.freebsd.org/D9381	2017-02-05 14:17:09 +00:00
kib	5cb41cd56e	For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE and device npx. This means that FPU is always initialized and handled when available, and SSE+ register file and exception are handled when available. This makes the kernel FPU code much easier to maintain by the cost of slight bloat for CPUs older than 25 years. CPU_DISABLE_CMPXCHG outlived its usefulness, see the removed comment explaining the original purpose. Suggested by and discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2017-02-03 12:51:40 +00:00
kib	38af3da4f6	Use ANSI definitions for some i386 functions. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-02 22:02:10 +00:00
mjg	7055568df5	i386: fixup fcmpset An incorrect output specifier was used which worked with clang by accident, but breaks with the in-tree gcc version. While here plug a whitespace nit. Reported by: bde	2017-02-02 01:33:08 +00:00
trasz	69861f6d1f	Replace sys_ftruncate() with kern_ftruncate() in various compats. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9368	2017-01-30 11:50:54 +00:00
mjg	efb45bc626	i386: add atomic_fcmpset Tested by: pho	2017-01-30 02:24:54 +00:00
kib	82ecdbf179	Do not leave stale 4K TLB entries on pde (superpage) removal or protection change. On superpage promotion, x86 pmaps do not invalidate existing 4K entries for the superpage range, because they are compatible with the promoted 2/4M entry. But the invalidation on superpage removal or protection change only did single INVLPG with the base address of the superpage. This reliably flushed superpage TLB entry, and 4K entry for the first page of the superpage, potentially leaving other 4K TLB entries lingering. Do the invalidation of the whole superpage range to correct the problem. Note that the precise invalidation is done by x86 code for kernel_pmap only, for user pmaps whole (per-AS) TLB is flushed. This made the bug well hidden, because promotions of the kernel mappings require specific load. Reported and tested by: Jonathan Looney <jtl@netflix.com> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-29 19:14:48 +00:00
jah	6a2ed46209	Implement get_pcpu() for i386 and use it to replace pcpu_find(curcpu) in the i386 pmap. The curcpu macro loads the per-cpu data pointer as its first step, so the remaining steps of pcpu_find(curcpu) are circular. get_pcpu() is already implemented for arm, arm64, and risc-v. My plan is to implement it for the remaining architectures and use it to replace several instances of pcpu_find(curcpu) in MI code. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D9370	2017-01-29 16:54:55 +00:00
nyan	edf372a226	Garbage collect the FPU_ERROR_BROKEN option. It is for pc98 only.	2017-01-28 03:53:53 +00:00
nyan	259480b6de	Remove pc98 support completely. I thank all developers and contributors for pc98. Relnotes: yes	2017-01-28 02:22:15 +00:00
kib	d52cc80daf	Use SFENCE for ordering CLFLUSHOPT. SDM states that CLFLUSHOPT instructions can be ordered with other writes by SFENCE, heavier MFENCE is not required. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-01-20 19:08:44 +00:00
ed	be72efbdd4	Catch up with changes to structure member names. Pointer/length pairs are now always named ${name} and ${name}_len.	2017-01-17 22:05:52 +00:00
jah	07791773f9	Add comment explaining relative order of sched_unpin() and mtx_unlock(). Suggested by: alc MFC after: 1 week	2017-01-14 19:35:36 +00:00
jah	63a408be16	For i386 temporary mappings, unpin the thread before releasing the cmap lock. Releasing the lock first may result in the thread being immediately rescheduled and bound to the same CPU, only to unpin itself upon resuming execution. Noted by: skra (in review for armv6 equivalent) MFC after: 1 week	2017-01-14 09:56:01 +00:00
markj	d8a11d2a36	Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86. We would previously invalidate such entries individually, resulting in more IPIs than necessary. Reviewed by: alc, kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D9094	2017-01-10 21:52:48 +00:00
sbruno	efab05d612	Migrate e1000 to the IFLIB framework: - em(4) igb(4) and lem(4) - deprecate the igb device from kernel configurations - create a symbolic link in /boot/kernel from if_em.ko to if_igb.ko Devices tested: - 82574L - I218-LM - 82546GB - 82579LM - I350 - I217 Please report problems to freebsd-net@freebsd.org Partial review from jhb and suggestions on how to not brick folks who originally would have lost their igbX device. Submitted by: mmacy@nextbsd.org MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks and Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8299	2017-01-10 03:23:22 +00:00
kib	5def9fa2c2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are indesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
jah	173cd7bef3	Move the objects used to create temporary mappings for i386 pmap zero and copy operations to the MD PCPU region. Change sysmap initialization to only allocate KVA pages for CPUs that are actually present. As a minor optimization, this also prevents false sharing between adjacent sysmap objects since the pcpu struct is already cacheline-aligned. While here, move pc_qmap_addr initialization for the BSP into pmap_bootstrap(), which allows use of pmap_quick* functions during early boot. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8833	2016-12-23 15:14:56 +00:00
jhb	203d8b88f7	Enable EARLY_AP_STARTUP on amd64 and i386 kernels by default. PR: 199321, 203682 MFC after: 2 months Sponsored by: Netflix	2016-12-16 21:10:37 +00:00
def	f63c437216	Add support for encrypted kernel crash dumps. Changes include modifications in kernel crash dump routines, dumpon(8) and savecore(8). A new tool called decryptcore(8) was added. A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump configuration in the diocskerneldump_arg structure to the kernel. The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for backward ABI compatibility. dumpon(8) generates an one-time random symmetric key and encrypts it using an RSA public key in capability mode. Currently only AES-256-CBC is supported but EKCD was designed to implement support for other algorithms in the future. The public key is chosen using the -k flag. The dumpon rc(8) script can do this automatically during startup using the dumppubkey rc.conf(5) variable. Once the keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O control. When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random IV and sets up the key schedule for the specified algorithm. Each time the kernel tries to write a crash dump to the dump device, the IV is replaced by a SHA-256 hash of the previous value. This is intended to make a possible differential cryptanalysis harder since it is possible to write multiple crash dumps without reboot by repeating the following commands: # sysctl debug.kdb.enter=1 db> call doadump(0) db> continue # savecore A kernel dump key consists of an algorithm identifier, an IV and an encrypted symmetric key. The kernel dump key size is included in a kernel dump header. The size is an unsigned 32-bit integer and it is aligned to a block size. The header structure has 512 bytes to match the block size so it was required to make a panic string 4 bytes shorter to add a new field to the header structure. If the kernel dump key size in the header is nonzero it is assumed that the kernel dump key is placed after the first header on the dump device and the core dump is encrypted. Separate functions were implemented to write the kernel dump header and the kernel dump key as they need to be unencrypted. The dump_write function encrypts data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps are not supported due to the way they are constructed which makes it impossible to use the CBC mode for encryption. It should be also noted that textdumps don't contain sensitive data by design as a user decides what information should be dumped. savecore(8) writes the kernel dump key to a key.# file if its size in the header is nonzero. # is the number of the current core dump. decryptcore(8) decrypts the core dump using a private RSA key and the kernel dump key. This is performed by a child process in capability mode. If the decryption was not successful the parent process removes a partially decrypted core dump. Description on how to encrypt crash dumps was added to the decryptcore(8), dumpon(8), rc.conf(5) and savecore(8) manual pages. EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU. The feature still has to be tested on arm and arm64 as it wasn't possible to run FreeBSD due to the problems with QEMU emulation and lack of hardware. Designed by: def, pjd Reviewed by: cem, oshogbo, pjd Partial review: delphij, emaste, jhb, kib Approved by: pjd (mentor) Differential Revision: https://reviews.freebsd.org/D4712	2016-12-10 16:20:39 +00:00
markj	8bb19c4929	Add a COMPAT_FREEBSD11 kernel option. Use it wherever COMPAT_FREEBSD10 is currently specified. Reviewed by: glebius, imp, jhb Differential Revision: https://reviews.freebsd.org/D8736	2016-12-09 18:54:12 +00:00
alc	7571ef95c1	Previously, vm_radix_remove() would panic if the radix trie didn't contain a vm_page_t at the specified index. However, with this change, vm_radix_remove() no longer panics. Instead, it returns NULL if there is no vm_page_t at the specified index. Otherwise, it returns the vm_page_t. The motivation for this change is that it simplifies the use of radix tries in the amd64, arm64, and i386 pmap implementations. Instead of performing a lookup before every remove, the pmap can simply perform the remove. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D8708	2016-12-08 04:29:29 +00:00
jhb	3ec06e919d	MFamd64: Various fatal page fault fixes. - If a page fault is triggered due to reserved bits in a PTE, treat it as a fatal fault and panic. - If PG_NX is in use, report whether a fatal page fault is due to an instruction fetch or a data access. - If a fatal page fault is due to reserved bits in a PTE, report that as the page fault type rather than a protection violation. MFC after: 1 month	2016-11-19 01:36:44 +00:00
bdrewery	30f99dbeef	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
kib	9cc20ad665	Handle pmap_enter() over an existing 4/2M page in KVA on i386. The userspace case was already handled by pmap_allocpte(). For kernel VA, page table page must exist, and demote cannot fail, so we need to just call pmap_demote_pde(). Also note that due to the machine AS layout, promotions in the KVA on i386 are highly unlikely, so this change is mostly for completeness. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8323	2016-10-28 11:53:22 +00:00
jhb	95a3814f21	MFamd64: Add bounds checks on addresses used with /dev/mem. Reject attempts to read from or memory map offsets in /dev/mem that are beyond the maximum-supported physical address of the current CPU. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7408	2016-10-27 21:23:14 +00:00
jhb	c3c885eb65	Enable EFER_NXE properly on APs. EFER_NXE is set in the EFER MSR by initializecpu() and must be set on all CPUs in the system. When PG_NX support was added to PAE on i386, the block to enable EFER_NXE was placed in a section of initializecpu() that only runs if 'cpu == CPU_686'. During early boot, locore does an initial pass to set cpu that sets it to CPU_686 on all CPUs later than a Pentium. Later, printcpuinfo() adjusts the 'cpu' variable on PII and later CPUs to one of CPU_PII, CPU_PIII, or CPU_P4. However, printcpuinfo() is called after initializecpu() on the BSP, so the BSP would enable EFER_NXE and pg_nx. The APs execute initializecpu() much later after printcpuinfo() has run. The end result on a modern CPU was that cpu was set to CPU_PIII when the APs invoked initializecpu(), so they did not enable EFER_NXE. As a result, the APs would fault when trying to access any pages marked with PG_NX set. When booting a 2 CPU PAE kernel in bhyve this manifested as a hang before single user mode. The attempt to execute /bin/init tried to copy out the exec strings (argv, etc.) to a non-executable mapping while running on the AP. The instruction kept faulting due to invalid bits in the PTE in an infinite loop. Fix this by moving the code to enable EFER_NXE out of the switch statement on 'cpu' and always doing it if 'amd_feature' supports AMDID_NX. MFC after: 2 weeks	2016-10-26 18:47:47 +00:00
kib	45100446da	Follow-up to r307866: - Make !KDB config buildable. - Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi in one place, namely nmi_call_kdb(). This allows to remove do_panic argument from the functions, and to remove i386/amd64 duplication of the variable and sysctl definitions. Note that now NMI causes panic(9) instead of trap_fatal() reporting and then panic(9), consistently for NMIs delivered while CPU operated in ring 0 and 3. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-24 20:47:46 +00:00
kib	a04db702cd	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcasted to all CPUs. When kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling the nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". Much more harming behaviour is that because all CPUs try to enter kdb, and if ddb is used as debugger, all CPUs issue prompt on console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. More, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for administrator (really developer) to configure debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
jkim	229f578eb8	Implement BPF_MOD and BPF_XOR instructions. These two ALU instructions first appeared on Linux. Then, libpcap adopted and made them available since 1.6.2. Now more platforms including NetBSD have them in kernel. So do we. --이 줄 이하는 자동으로 제거됩니다-- > Description of fields to fill in above: 76 columns --\| > PR: If and which Problem Report is related. > Submitted by: If someone else sent in the change. > Reported by: If someone else reported the issue. > Reviewed by: If someone else reviewed your modification. > Approved by: If you needed approval for this commit. > Obtained from: If the change is from a third party. > MFC after: N [day[s]\|week[s]\|month[s]]. Request a reminder email. > MFH: Ports tree branch name. Request approval for merge. > Relnotes: Set to 'yes' for mention in release notes. > Security: Vulnerability reference (one per line) or description. > Sponsored by: If the change was sponsored by an organization. > Differential Revision: https://reviews.freebsd.org/D### (full phabric URL needed). > Empty fields above will be automatically removed. M share/man/man4/bpf.4 M sys/amd64/amd64/bpf_jit_machdep.c M sys/amd64/amd64/bpf_jit_machdep.h M sys/i386/i386/bpf_jit_machdep.c M sys/i386/i386/bpf_jit_machdep.h M sys/net/bpf_filter.c	2016-10-21 06:55:07 +00:00
jkim	42d876c52e	Redude code for conditional jumps.	2016-10-21 06:09:30 +00:00
jkim	b35ec131f8	Fix compiler warnings for user land.	2016-10-21 06:06:54 +00:00
jhb	f689fd5a63	Drop support for using mmap() with /dev/kmem. Using the device pager with /dev/kmem is not stable since KVA mappings are transient, but the device pager caches the PA associated with a given offset forever. Interestingly, mips' implementation of memmap() already refused requests for /dev/kmem. Note that kvm_read/kvm_write do not use mmap, but use read and write on /dev/kmem, so this should not affect libkvm users. Reviewed by: kib MFC after: 2 months	2016-10-14 20:01:07 +00:00
imp	081e8d8587	Fix building on i386 and arm. But 'public domain' headers on the files with no creative content. Include "lost" changes from git: o Use /dev/efi instead of /dev/efidev o Remove redundant NULL checks. Submitted by: kib@, dim@, zbb@, emaste@	2016-10-13 06:56:23 +00:00
jtl	62030781cd	In the TCP stack, the hhook(9) framework provides hooks for kernel modules to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state. In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules. Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session. This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so. Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration. Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8185	2016-10-12 02:16:42 +00:00
imp	41d4ddad0f	Create /dev/efidev to provide an ioctl interface to userland. It supports userland interfaces to UEFI Runtime Services. This is indended to the the MI portion of EFI RuntimeServices support. Differential Revision: https://reviews.freebsd.org/D8128 Reviewed by: kib@, wblock@, Ganael Laplanche	2016-10-11 22:24:30 +00:00
kib	559623d89a	Re-apply r306516 (by cem): Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags Reduce contention during TLB invalidation operations by using a per-CPU completion flag, rather than a single atomically-updated variable. On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements show that smp_tlb_shootdown is about 50% faster with this patch; observations with VTune show that the percentage of time spent in invlrng_single_page on an interrupt (actually doing invalidation, rather than synchronization) increases from 31% with the old mechanism to 71% with the new one. (Running a basic file server workload.) Submitted by: Anton Rang <rang at acm.org> Reviewed by: cem (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8041	2016-10-04 17:01:24 +00:00
hselasky	5e41da7ccd	Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4 like other PCI network drivers. The sys/ofed directory is now mainly reserved for generic infiniband code, with exception of the mthca driver. - Add new manual page, mlx4en(4), describing how to configure and load mlx4en. - All relevant driver C-files are now prefixed mlx4, mlx4_en and mlx4_ib respectivly to avoid object filename collisions when compiling the kernel. This also fixes an issue with proper dependency file generation for the C-files in question. - Device mlxen is now device mlx4en and depends on device mlx4, see mlx4en(4). Only the network device name remains unchanged. - The mlx4 and mlx4en modules are now built by default on i386 and amd64 targets. Only building the mlx4ib module depends on WITH_OFED=YES . Sponsored by: Mellanox Technologies	2016-09-30 08:23:06 +00:00
bde	009ff78eec	Minor fixes for 160-bit disassembly: (1) Print the default segment %ss before adresses relative to %bp. This is too cluttered for me, but so is printing some other default prefixes, and this is a reasonable reminder that %ss is quite likely to be different from %ds in 16-bit mode. db_disasm still handles prefixes poorly, by trying to discard redundant ones. This loses information, and sometimes the result is wrong or misleading. Clean up nearby initializations and dead code. (2) Fix decoding of operand and address size prefixes in 16-bit mode. They reverse the default in all modes. Obtained from: (1) is partly from r1.4 (2003/11/08) in DFlyBSD (?)	2016-09-25 18:39:24 +00:00
tijl	3f32edbd77	MFamd64: r266901 Allocate a zeroed LDT. Failing to do this might result in the LDT appearing to run out of free descriptors because of random junk in the descriptor's 'sd_type' field. http://lists.freebsd.org/pipermail/freebsd-amd64/2014-May/016088.html PR: 212639 Submitted by: wheelcomplex@gmail.com MFC after: 2 weeks	2016-09-25 18:29:02 +00:00
bde	7d8ccb0c00	Determine the operand/address size of %cs in a new function db_segsize(). Use db_segsize() to set the default operand/address size for disassembling. Allow overriding this with the "alternate" display format /I. The API of db_disasm() should be debooleanized to pass a more general request (amd64 needs overrides to sizes of 16, 32, and 64, but this commit doesn't implement anything for amd64 since much larger changes are needed to restore the amd64 disassmbler's support for non-default sizes). Fix db_print_loc_and_inst() to ask for the normal format and not the alternate in normal operation. This is most useful for vm86 mode, but also works for 16-bit protected mode. Use db_segsize() to avoid trying to print a garbage stack trace if %cs is 16 bits. Print something like the stack trace termination message for a trap boundary instead. Document that the alternate format is now useful on i386.	2016-09-25 16:30:29 +00:00
bde	94a237c17c	Fix vm86 initialization, part 3 of 2 and a half. (Actually, just fix early printfs and debugging of vm86 initialization and some other early initialization in some cases.) Add an option debug.late_console (with default 1=off) to move console and kdb initialization back where it was. Do the same for amd64 although there is no vm86 there. On my test system, debug.late_console=0 works for the syscons, sio and uart console drivers on amd64 and i386, and for vt on i386 but not on amd64. The early printfs fixed by debug.late_console=0 are: - on i386, the message about lost memory above 4G - with -v in otherwise normal use, about 20 printfs for SMAP - other debugging messages for memory sizing. Mostly under -v and not printed in normal use. Document in a comment how much earlier the initialization and early printf()s can be. That is very early for the console. Not much more than curthread is needed. kdb use obviously needs to be not so early, since it needs IDT initialization and that is done relatively late for convenience and historical reasons.	2016-09-25 14:56:24 +00:00
markj	f2a1dd4de8	Regenerate syscall provider argument strings.	2016-09-22 04:50:03 +00:00
bde	74861d13fa	Remove all kernel uses of pcb_psl, but keep in in the struct to preserve the ABI and API for applications. It was removed in the port to amd64, but was remained as garbage giving a micro-pessimization and spurious single-step traps on i386. pcb_psl was intended to be used just to do a context switch of PSL_I, but this context switch was null in most or all versions, and mis-switching of PSL_T was done instead. Some history: - in 386BSD-0.0, cpu_switch() ran at splhigh() and splhigh() did too much interrupt disabling, so interrupts were hard-disabled across cpu_switch() and too many other places - in 386BSD-0.0-patchkit through FreeBSD-4 and FreeBSD-5 before SMPng, splhigh() did soft interrupt masking, and cpu_switch() was excessively cautious and did a cli at the start and a sti at the end to hard-disable interrupts across the switch - SMPng replaced the spl's and cli's by spinlocks (just sched_lock?), so interrupts were hard-disabled across cpu_switch() and too many other places again - initial attempts to fix this intended to restore some soft interrupt disabling, but to support variations in this cpu_switch() used pushfl/popfl into pcb_psl to avoid hard-coding the assumption that the initial and final states have PSL_I enabled. But the version with soft interrupt disabling wasn't used for long, or was never committed, (except I always used my different version of it for UP) so the pushfl/popl and pcb_psl to hold them have been doing less than nothing for about 14 years.	2016-09-17 14:00:52 +00:00
bde	d6a5db2944	(1) Ifdef the new dr6 variable for KDB. While here, avoid using the old variable 'code' and remove it in trap(). ('code' was meant for holding things like %dr6, but is too small to hold %dr6 on amd64 and was reduced to an obfuscation of tf_err, with early truncation on amd64.) Submitted by: Michael Butler (imb@...)	2016-09-16 04:58:37 +00:00
bde	bf8d177543	Abort single stepping in ddb if the trap is not for single-stepping. This is not very easy to do, since ddb didn't know when traps are for single-stepping. It more or less assumed that traps are either breakpoints or single-step, but even for x86 this became inadequate with the release of the i386 in ~1986, and FreeBSD passes it other trap types for NMIs and panics. On x86, teach ddb when a trap is for single stepping using the %dr6 register. Unknown traps are now treated almost the same as breakpoints instead of as the same as single-steps. Previously, the classification of breakpoints was almost correct and everything else was unknown so had to be treated as a single-step. Now the classification of single- steps is precise, the classification of breakpoints is almost correct (as before) and everything else is unknown and treated like a breakpoint. This fixes: - breakpoints not set by ddb, including the main one in kdb_enter(), were treated as single-steps and not stopped on when stepping (except for the usual, simple case of a step with residual count 1). As special cases, kdb_enter() didn't stop for fatal traps or panics - similarly for "hardware breakpoints". Use a new MD macro IS_SSTEP_TRAP(type, code) to code to classify single-steps. This is excessively complicated for bug-for-bug and backwards compatibilty. Design errors apparently started in Mach in ~1990 or perhaps in the FreeBSD interface in ~1993. Common trap types like single steps should have a unique MI code (like the TRAP* codes for user SIGTRAP) so that debuggers don't need macros like IS_SSTEP_TRAP() to decode them. But 'type' is actually an ambiguous MD trap number, and code was always 0 (now it is (int)%dr6 on x86). So it was impossible to determine the trap type from the args. Global variables had to be used. There is already a classification macro db_pc_is_single_step(), but this just gets in the way. It is only used to recover from bugs in IS_BREAKPOINT_TRAP(). On some arches, IS_BREAKPOINT_TRAP() just duplicates the ambiguity in 'type' and misclassifies single-steps as breakpoints. It defaults to 'false', which is the opposite of what is needed for bug-for-bug compatibility. When this is cleaned up, MI classification bits should be passed in 'code'. This could be done now for positive-logic bits, since 'code' was always 0, but some negative logic is needed for compatibility so a simple MI classificition is not usable yet. After reading %dr6, clear the single-step bit in it so that the type of the next debugger trap can be decoded. This is a little ddb-specific. ddb doesn't understand the need to clear this bit and doing it before calling kdb is easiest. gdb would need to reverse this to support hardware breakpoints, but it just doesn't support them now since gdbstub doesn't support %dr*. Fix a bug involving %dr6: when emulating a single-step trap for vm86, set the bit for it in %dr6. Userland debuggers need this. ddb now needs this for vm86 bios calls. The bit gets copied to 'code' then cleared again. Fix related style bugs: - when clearing bits for hardware breakpoints in %dr6, spell the mask as ~0xf on both amd64 and i386 to get the correct number of bits using sign extension and not need a comment about using the wrong mask on amd64 (amd64 traps for invalid results but clearing the reserved top bits didn't trap since they are 0). - rewrite my old wrong comments about using %dr6 for ddb watchpoints.	2016-09-15 17:24:23 +00:00
jhb	bc4a384597	Remove 'cpu' and 'cpu_class' on amd64. The 'cpu' and 'cpu_class' variables were always set to the same value on amd64 and are legacy holdovers from i386. Remove them entirely on amd64. Reviewed by: imp, kib (older version) Differential Revision: https://reviews.freebsd.org/D7888	2016-09-15 17:05:54 +00:00
bde	d58cd5baa4	Use the MI macro TRAPF_USERMODE() instead of open-coded checks for SEL_UPL and sometimes PSL_VM. This is just a style change on amd64, but on i386 it fixes 1 unimportant place where the PSL_VM check was missing and starts fixing 1 important place where the PSL_VM check had a logic error. Fix logic errors in treating vm86 bioscall mode as kernel mode. The main place checked all the necessary flags, but put the necessary parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong place. The broken case is only reached if a vm86 bioscall uses a %cs which is nonzero mod 4, but that is unusual -- most bios calls start with %cs = 0xc000 or 0xf000 and rarely change it. Another place was missing the check for PCB_VM86CALL, but was only reachable if there are bugs virtualizing PSL_I. Add a macro TF_HAS_STACKREGS() and use this instead of converting open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only care about whether the frame has stack registers. This fixes 3 places in my recent fix for register variables in vm86 mode where I messed up the PSL_VM check and cleans up other places.	2016-09-14 12:57:40 +00:00
alc	44f29780e8	Various changes to pmap_ts_referenced() Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and into vm/pmap.h, and describe what its purpose is. Eliminate the archaic "XXX" comment about its value. I don't believe that its exact value, e.g., 5 versus 6, matters. Update the arm64 and riscv pmap implementations of pmap_ts_referenced() to opportunistically update the page's dirty field. On amd64, use the PDE value already cached in a local variable rather than dereferencing a pointer again and again. Reviewed by: kib, markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7836	2016-09-10 16:49:25 +00:00
bde	7e070adea9	Fix single-stepping of instructions emulated by vm86. In vm86.c, fix 2 (rarely used) cases where the return code lost the single-step indicator. While here, fix 2 misspellings of PSL_T as PSL_TF (TF is the CPU manufacturer's spelling, but we use T). In trap.c, turn T_PROTFLT and T_STKFLT into T_TRCTRAP if vm86_emulate() asked for this (it does this when the instruction is being traced and was successully emulated). In the kernel case, we used to deliver the trap as SIGTRAP to the process, where it always terminated the process; now we deliver the trap as T_TRCTRAP to kdb, where it normally gives single-stepping. In the user case, the only difference is that we now clear PSL_T and initialize ucode properly. Reviewed by: kib	2016-09-08 14:43:39 +00:00
markj	fb5804c98d	Remove support for idle page zeroing. Idle page zeroing has been disabled by default on all architectures since r170816 and has some bugs that make it seemingly unusable. Specifically, the idle-priority pagezero thread exacerbates contention for the free page lock, and yields the CPU without releasing it in non-preemptive kernels. The pagezero thread also does not behave correctly when superpage reservations are enabled: its target is a function of v_free_count, which includes reserved-but-free pages, but it is only able to zero pages belonging to the physical memory allocator. Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D7714	2016-09-03 20:38:13 +00:00
alc	24a2d27767	As an optimization to the machine-independent layer, change the machine- dependent pmap_ts_referenced() so that it updates the page's dirty field if a modified bit is found while counting reference bits. This opportunistic update can be performed at low cost and can eliminate the need for some future calls to pmap_is_modified() by the machine- independent layer. Reviewed by: kib, markj MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7722	2016-09-01 15:57:44 +00:00
bde	aa589d0481	Shorten banal comments about zeroing and copying pages. Don't give implementation details that last echoed the code 15-20 years ago. But add a detail about pagezero() on i386. Switch from Mach style to BSD style.	2016-08-29 14:38:31 +00:00
bde	eb85aca715	On amd64, declare sse2_pagezero() and start using it again, but only for zeroing pages in idle where nontemporal writes are clearly best. This is almost a no-op since zeroing in idle works does nothing good and is off by default. Fix END() statement forgotten in previous commit. Align the loop in sse2_pagezero(). Since it writes to main memory, the loop doesn't have to be very carefully written to keep up. Unrolling it was considered useless or harmful and was not done on i386, but that was too careless. Timing for i386: the loop was not unrolled at all, and moved only 4 bytes/iteration. So on a 2GHz CPU, it needed to run at 2 cycles/ iteration to keep up with a memory speed of just 4GB/sec. But when it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/ iteration so it gave a maximum speed of 2.67GB/sec and couldn't even keep up with PC3200 memory. Fix the alignment so that it keep up with 4GB/sec memory, and unroll once to get nearer to 8GB/sec. Further unrolling might be useless or harmful since it would prevent the loop fitting in 16-bytes. My test system with an old CPU and old DDR1 only needed 5+ GB/sec. My test system with a new CPU and DDR3 doesn't need any changes to keep up ~16GB/sec. Timing for amd64: with 8-byte accesses and newer faster CPUs it is easy to reach 16GB/sec but not so easy to go much faster. The alignment doesn't matter much if the CPU is not very old. The loop was already unrolled 4 times, but needs 32 bytes and uses a fancy method that doesn't work for 2-way unrolling in 16 bytes. Just align it to 32-bytes.	2016-08-29 13:07:21 +00:00
bde	e7d74c0dd5	Fix vm86 initialization, part 1 of 2 and a half. Early use of vm86 depends on the PIC being reset to mask interrupts, but r286667 moved PIC initialization to after where vm86 may be first used. Move the PIC initialization up to immdiately before vm86 initialization. All invocations of diff that I tried display this move poorly so that it looks like PIC and vm86 initialization was moved later. r286667 was to move console initialization later. The diffs are again unreadable -- they show a large move that doesn't seem to involve the console. The PIC initialization stayed just below the console initialization where it could still be debugged but no longer works. Later console initialization breaks mainly debugging vm86 initialization and memory sizing using ddb and printf(). There are several printf()s in the memory sizing that now go nowhere since message buffer initialization has always been too late. Memory sizing is done by loader for most users, but the lost messages for this case are even more interesting than for an auto-probe since they tell you what the loader found.	2016-08-28 15:23:44 +00:00
bde	ab8ab604ae	Fix vm86 initialization, part 1 of 2 and a half. vm86 uses the tss, but r273995 moved tss initialization to after where it may be first used, just because tss_esp0 now depends on later initializations and/or amd64 does it later. vm86 is first used for memory sizing in cases where the loader can't figure out the size or is not used. Its initialization is placed immediately before memory sizing to support this, and the tss was initialized a little earlier. Move everything in the tss initialization except for tss_esp0 back to almost where it was, immediately before vm86 initialization (the combined move is from before dblflt_tss initialization to after). Add only early initialization of tss_esp0, later reloading of the tss, and comments. The initial tss_esp0 no longer has space for the pcb since initially the size of the pcb is not known and no pcb is needed. (Later changes broke debugging at this point, so the nonexistent pcb cannot be used by debuggers, and at the time of 273995 when ddb was almost able to debug this problem it didn't need the pcb.) The iniitial tss_esp0 still has a magic 16 bytes reserved for vm86 although I think this is unused too.	2016-08-28 14:03:25 +00:00
ed	c0aa6fd209	Convert pointers obtained from the threadattr_t structure with TO_PTR(). In all of these source files, the userspace pointer size corresponds with the kernelspace pointer size, meaning that casting directly works. As I'm planning on making 32-bit execution on 64-bit systems work as well, use TO_PTR() here as well, so that the changes between source files remain minimal.	2016-08-24 10:13:18 +00:00
jhb	4e659fa057	Fix build for !SMP kernels after the Xen MSIX workaround. Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels. PR: 212014 Reported by: Glyn Grinstead <glyn@grinstead.org> MFC after: 3 days	2016-08-22 21:23:17 +00:00
ed	ee20ad15b4	Make CloudABI work on i386. Copy over amd64's cloudabi64_sysvec.c into i386 and tailor it to work. Again, we use a system call convention similar to FreeBSD, except that there is no support for indirect system calls (%eax == 0). Where i386 differs from amd64 is that we have to store thread/process entry arguments on the stack instead of using registers. We also have to put an extra pointer on the stack for TLS (for GSBASE). Place that pointer in the empty slot that is normally used to hold return addresses. That seems to keep the code simple. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D7590	2016-08-22 17:37:31 +00:00
jhb	83662c7f22	Remove the ie(4) driver for Intel 82586 ISA Ethernet adapters. This driver only supports 10Mb Ethernet using PIO (the hardware supports DMA, but the driver only does PIO). There are not any PCCard adapters supported by this driver, only ISA cards. In addition, it does not use bus_space but instead uses bcopy with volatile pointers triggering a host of warnings. (if_ie.c is one of 3 files always built with -Wno-error) Relnotes: yes	2016-08-20 00:49:29 +00:00
jhb	93da7f569e	Remove the spic(4) driver for the Sony Vaoi Jogdial. This hardware is not present on any modern systems. The driver is quite hackish (raw inb/outb instead of bus_space, and raw inb/outb to random I/O ports to enable ACPI since it predated proper ACPI support). Relnotes: yes	2016-08-19 23:39:08 +00:00
jhb	9893a5d2ed	Remove the wl(4) driver and wlconfig(8) utility. The wl(4) driver supports pre-802.11 PCCard wireless adapters that are slower than 802.11b. They do not work with any of the 802.11 framework and the driver hasn't been reported to actually work in a long time. Relnotes: yes	2016-08-19 22:27:14 +00:00
jhb	e24281ea43	Remove the si(4) driver and sicontrol(8) for Specialix serial cards. The si(4) driver supported multiport serial adapters for ISA, EISA, and PCI buses. This driver does not use bus_space, instead it depends on direct use of the pointer returned by rman_get_virtual(). It is also still locked by Giant and calls for patch testing to convert it to use bus_space were unanswered. Relnotes: yes	2016-08-19 21:14:27 +00:00
bde	afc8eb3c73	Remove duplicate definition of get_pcb_td(). gcc works for detecting this error.	2016-08-15 10:46:33 +00:00
bde	39cf12b793	Fix the variables $esp, $ds, $es, $fs, $gs and $ss in vm86 mode. Fix PC_REGS() so that printing of instructions works in some useful cases. ddb only understands a single flat address space, but this macro allows mapping $cs:$eip into vm86's flat address space well enough for the MI parts of ddb. This doesn't work for the MD parts that do stack traces, and there are no similar macros for data addresses. PC_REGS() has to use the trapframe pointer instead of the pcb for this. For other CPUs, the trapframe pointer is not available except by tracing back to it. But tracing back through vm86 trapframes is broken even starting with one.	2016-08-14 16:51:25 +00:00
kib	acae466016	Unconditionally perform checks that FPU region was entered, when #NM exception is caught in kernel mode. There are third-party modules which trigger the issue, and since the problem causes usermode state corruption at least, panic in production kernels as well. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:44:03 +00:00
kib	9a5f028012	Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no difference between files. For pc98, put x86/mp_x86.c into the same place as used by i386 file list. Fix typo in comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 13:51:53 +00:00
brooks	017f31c108	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D7353	2016-07-28 21:29:04 +00:00
mav	fcb5c9368b	Add more UEFI/e820 memory types from latest specifications. This is only cosmetics. MFC after: 2 weeks	2016-07-24 09:15:11 +00:00
jhb	24eff34a0e	Rename PTRACE_SYSCALL to LINUX_PTRACE_SYSCALL. Suggested by: kib	2016-07-16 00:54:46 +00:00
badger	5908cb719e	Add explicit detection of KVM hypervisor Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks	2016-07-13 19:19:18 +00:00
jkim	5a1456e79e	Remove a tunable and always reset system clock while resuming with ACPI. Requested by: bde (long ago)	2016-07-13 19:16:32 +00:00
royger	844ce8697a	xen: automatically disable MSI-X interrupt migration If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0. It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. Sponsored by: Citrix Systems R&D MFC after: 3 days Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D7148	2016-07-12 08:43:09 +00:00
kib	7746f92d24	Fill tf_trapno for trap frames created for syscall. If tf_trapno contains garbage which appears to be equal to T_NMI, e.g. due to thread previously entered kernel due to NMI, doreti sequence skips ast, and does so until a trap or hardware interrupt occur. The visible effects of the issue are quite confusing. First, signals delivery is postponed in observable ways. In particular, the guarantee that unblocked async signals queue is flushed before a return from syscall, is broken. Second, if there are pending signals, all interruptible sleeps of the stuck thread are aborted immediately. Since modern CPUs are relatively fast and tickless kernel generates low interrupt rate, the faulty condition might exist for long time (in an application time scale). In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-07-11 15:52:52 +00:00
dchagin	c93d4a7bde	Fix a copy/paste bug introduced during X86_64 Linuxulator work. FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation use READ_IMPLIES_EXEC flag, introduced in r302515. While here move common part of mmap() and mprotect() code to the files in compat/linux to reduce code dupcliation between Linuxulator's. Reported by: Johannes Jost Meixner, Shawn Webb MFC after: 1 week XMFC with: r302515, r302516	2016-07-10 08:22:04 +00:00
dchagin	7acd3da18d	Regen for r302215 (Linux personality).	2016-07-10 08:17:16 +00:00
dchagin	50efd461d3	Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag. In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 set this flag automatically if the binary requires executable stack. READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.	2016-07-10 08:15:50 +00:00
nwhitehorn	89d01c24d1	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
kib	496a3b1f65	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
dchagin	791b4b1122	Add macro to convert errno and use it when appropriate. MFC after: 1 week	2016-05-22 12:46:34 +00:00
dchagin	d8b7958da0	Regen after r300359 (struct l_sched_param removal). MFC after: 1 week	2016-05-21 08:03:13 +00:00
dchagin	4f09fcdf95	Correct an argument param of linux_sched_* system calls as a struct l_sched_param does not defined due to it's nature. MFC after: 1 week	2016-05-21 08:01:14 +00:00
kib	515614230c	Check for overflow and return EINVAL if detected. Backport this and r300305 to i386. PR: 209661 Reported and reviewed by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 19:50:32 +00:00
sephe	6babf96582	atomic: Add testandclear on i386/amd64 Reviewed by: kib Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6381	2016-05-16 07:19:33 +00:00
jhb	6bae79f884	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
jhb	eb663acb54	Native PCI-express HotPlug support. PCI-express HotPlug support is implemented via bits in the slot registers of the PCI-express capability of the downstream port along with an interrupt that triggers when bits in the slot status register change. This is implemented for FreeBSD by adding HotPlug support to the PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges representing downstream ports on HotPlug slots. The PCI-PCI bridge driver registers an interrupt handler to receive HotPlug events. It also uses the slot registers to determine the current HotPlug state and drive an internal HotPlug state machine. For simplicty of implementation, the PCI-PCI bridge device detaches and deletes the child PCI device when a card is removed from a slot and creates and attaches a PCI child device when a card is inserted into the slot. The PCI-PCI bridge driver provides a bus_child_present which claims that child devices are present on HotPlug-capable slots only when a card is inserted. Rather than requiring a timeout in the RC for config accesses to not-present children, the pcib_read/write_config methods fail all requests when a card is not present (or not yet ready). These changes include support for various optional HotPlug capabilities such as a power controller, mechanical latch, electro-mechanical interlock, indicators, and an attention button. It also includes support for devices which require waiting for command completion events before initiating a subsequent HotPlug command. However, it has only been tested on ExpressCard systems which support surprise removal and have none of these optional capabilities. PCI-express HotPlug support is conditional on the PCI_HP option which is enabled by default on arm64, x86, and powerpc. Reviewed by: adrian, imp, vangyzen (older versions) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D6136	2016-05-05 22:26:23 +00:00
royger	9681be5bb2	xen/i386: enable the platform hypercall for i386 Not sure why the platform hypercall was disabled on i386, just enable it in order to fix compilation of the PV timer on i386. Sponsored by: Citrix Systems R&D	2016-05-03 08:05:14 +00:00
kib	52ea16e6ba	Make it explicit that D_MEM cdevsw d_flag is to signify that the driver is (or behaves identically to) /dev/mem. Remove the D_MEM flag from random drivers. Note that currently the D_MEM flag does not affect any behaviour, but this going to change in the next commit. Noted and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D6149	2016-05-01 17:46:56 +00:00
jhb	050f1049b2	Move 'device pci' for the PCI bus driver to the MI NOTES file. The PCI bus was already listed in all of the MD NOTES files and the driver should at least compile on all platforms.	2016-04-29 23:53:55 +00:00
pfg	b4106812fd	Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
pfg	e2f026faa5	Yet more redundant parenthesis from r298431. Mea culpa.	2016-04-21 20:30:38 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
pfg	be4082c832	X86: use our nitems() macro when it is avaliable through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
sephe	3d59317312	hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus Submitted by: Jun Su <junsu microsoft com> Reviewed by: jhb, kib, sephe Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5910	2016-04-15 02:20:18 +00:00
pfg	0a0fe3ee17	x86: for pointers replace 0 with NULL. These are mostly cosmetical, no functional change. Found with devel/coccinelle.	2016-04-14 17:04:06 +00:00
jhb	b5f76666d8	Expose doreti as a global symbol on amd64 and i386. doreti provides the common code path for returning from interrupt andlers on x86. Exposing doreti as a global symbol allows kernel modules to include low-level interrupt handlers instead of requiring all low-level handlers to be statically compiled into the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: kib	2016-04-13 17:37:31 +00:00
avg	f7d20d3734	re-enable AMD Topology extension on certain models if disabled by BIOS Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same. Reported by: Johannes Dieterich <dieterich.joh@gmail.com> Reviewed by: jhb, kib Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5883	2016-04-12 13:30:39 +00:00
bapt	577607dffc	Add kern.features flags for linux and linux64 modules kern.features.linux: 1 meaning linux 32 bits binaries are supported kern.features.linux64: 1 meaning linux 64 bits binaries are supported The goal here is to help 3rd party applications (including ports) to determine if the host do support linux emulation Reviewed by: dchagin MFC after: 1 week Relnotes: yes Differential Revision: D5830	2016-04-05 22:36:48 +00:00
jhb	ff4b317e50	Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386.	2016-04-03 23:03:54 +00:00
kib	eb986c64f5	Type of the interrupt handlers on x86 cannot be expressed in C. Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771	2016-03-29 19:56:48 +00:00
dchagin	a5f7ea1073	Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET. Pointed out by: ae@	2016-03-27 10:09:10 +00:00
dchagin	22c1ebea21	iConvert Linux SOL_IPV6 level. MFC after: 1 week	2016-03-27 08:12:01 +00:00
mav	2e42421c8b	Polish wbwd(4) driver and add more supported chips. MFC after: 1 month	2016-03-24 20:52:35 +00:00
jhb	0566758cff	Enable interrupts on the BSP once all PICs are initialized. This moves the enabling of interrupts slightly earlier (the old location was still before devices were enumerated and probed) and does it in the interrupt code (rather than in the device configuration code). This also avoids tripping over an assertion on the first TLB shootdown with earlier AP startup. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5710	2016-03-24 00:24:07 +00:00
dchagin	26a09ec651	Regen for r297061 (fstatfs64 Linux syscall). MFC after: 1 week	2016-03-20 13:23:01 +00:00
dchagin	6aee9bb2b4	Implement fstatfs64 system call. PR: 181012 Submitted by: John Wehle MFC after: 1 week	2016-03-20 13:21:20 +00:00
kib	3d336a82bd	The PKRU state size is 4 bytes, its support makes the XSAVE area size non-multiple of 64 bytes. Thereafter, the user state save area is misaligned, which triggers assertion in the debugging kernels, or segmentation violation on accesses for non-debugging configs. Force the desired alignment of the user save area as the fix (workaround is to disable bit 9 in the hw.xsave_mask loader tunable). This correction is required for booting on the upcoming Intel' Purley platform. Reported and tested by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>, jimharris Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-03-15 15:42:53 +00:00
jhibbits	9da1c36d0a	Migrate many bus_alloc_resource() calls to bus_alloc_resource_anywhere(). Most calls to bus_alloc_resource() use "anywhere" as the range, with a given count. Migrate these to use the new bus_alloc_resource_anywhere() API. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D5370	2016-02-27 03:38:01 +00:00
skra	f4b6499ab5	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373	2016-02-22 09:02:20 +00:00
ian	4543e2c4ad	Minor style cleanups. Submitted by: bde	2016-02-21 18:17:09 +00:00
jhb	169dd4da8e	Convert ss_sp in stack_t and sigstack to void . POSIX requires these members to be of type void rather than the char * inherited from 4BSD. NetBSD and OpenBSD both changed their fields to void * back in 1998. No new build failures were reported via an exp-run. PR: 206503 (exp-run) Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5092	2016-01-27 17:55:01 +00:00
delphij	c07f0f872d	Implement AT_SECURE properly. AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a boolean flag indicating whether secure mode should be enabled. 1 means that the program has changes its credentials during the execution. Being exported AT_SECURE used by glibc issetugid() call. Submitted by: imp, dchagin Security: FreeBSD-SA-16:10.linux Security: CVE-2016-1883	2016-01-27 07:20:55 +00:00
kib	6c8e94f78a	Adjust i386 comment to match amd64 one after r294311. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-19 08:09:09 +00:00
glebius	f65cb2db64	Regen after r293907.	2016-01-14 10:15:21 +00:00
glebius	d87c627c80	Change linux get_robust_list system call to match actual linux one. The set_robust_list system call request the kernel to record the head of the list of robust futexes owned by the calling thread. The head argument is the list head to record. The get_robust_list system call should return the head of the robust list of the thread whose thread id is specified in pid argument. The list head should be stored in the location pointed to by head argument. In contrast, our implemenattion of get_robust_list system call copies the known portion of memory pointed by recorded in set_robust_list system call pointer to the head of the robust list to the location pointed by head argument. So, it is possible for a local attacker to read portions of kernel memory, which may result in a privilege escalation. Submitted by: mjg Security: SA-16:03.linux	2016-01-14 10:13:58 +00:00
dchagin	e706df7b9a	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
emaste	2029b75c0e	Move amd64 metadata.h to x86 and share with i386 MFC after: 1 week	2016-01-07 19:47:26 +00:00
ian	3d96cedc35	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
jhb	994c23f093	Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. While here, move the common bits of <machine/cputypes.h> to <x86/cputypes.h> as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4670	2015-12-23 21:41:42 +00:00
kib	f124247e27	Merge common parts of i386 and amd64 md_var.h and smp.h into new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358	2015-12-07 17:41:20 +00:00
jhb	270547c9db	Set %esp correctly in the extended TSS. The pcb is saved at the top of the kernel stack on x86 platforms. The initial kenrel stack pointer is set in the TSS so that the trapframe from user -> kernel transitions begins directly below the pcb and grows down. The XSAVE changes moved the FPU save area out of the pcb and into a variable-sized area after the pcb. This required updating the expressions to calculate the initial stack pointer from 'stacktop - sizeof(pcb)' to 'stacktop - sizeof(pcb) + FPU save area size'. The i386_set_ioperm() system call allows user applications to access individual I/O ports via the I/O port permission bitmap in the TSS. On FreeBSD this requires allocating a custom per-process TSS instead of using the shared per-CPU TSS. The expression to initialize the initial kernel stack pointer in the per-process TSS created for i386_set_ioperm() was not properly updated after the XSAVE changes. Processes that used i386_set_ioperm() would trash the trapframe during subsequent context switches resulting in panics from memory corruption. This changes fixes the kernel stack pointer calculation for the per-process TSS. Reviewed by: kib, n_hibma Reported by: n_hibma MFC after: 1 week	2015-12-07 16:27:11 +00:00
cem	a15dada94b	pmap_invalidate_range: For very large ranges, flush the whole TLB Typical TLBs have 40-512 entries available. At some point, iterating every single page in a requested invalidation range and issuing invlpg on it is more expensive than flushing the TLB and allowing it to reload on demand. Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary number 4096 entries as a hueristic at which point we flush TLB rather than invalidating every single potential page. Reviewed by: alc Feedback from: jhb, kib MFC notes: Depends on r291688 Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4280	2015-12-06 17:39:13 +00:00
kib	fa2e415f2a	In the pmap_set_pg() function, which enables the global bit on the ptes mapping the kernel on CPUs where global TLB entries are supported, revert to flushing only non-global entries, i.e. to the pre-r291688 state. There is no need to flush global TLB entries, since only global entries created during the previous iterations of the loop could exist at this moment. Submitted by: alc Differential revision: https://reviews.freebsd.org/D4368	2015-12-05 08:46:41 +00:00
kib	f741f698b7	For amd64 non-PCID machines, and for i386 machines with support for the PG_G global pte flag, pmap_invalidate_all() fails to flush global TLB entries []. This is because TLB shootdown handler for such configs reloads CR3, and on i386 pmap_invalidate_all() does the same for the initiating CPU. Note that current code does not issue total invalidation requests for the kernel_pmap. Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not specific for PCID for quite some time, and implement the same functionality for i386. Use the function instead of invltlb() in shootdown handlers and in i386 pmap_invalidate_all(), but only for the kernel pmap (which maps pages with the PG_G attribute set), which takes care of PG_G TLB entries on flush. To detect the affected pmap in i386 TLB shootdown handler, pmap should be passed to the smp_masked_invltlb() function, which makes amd64 and i386 TLB shootdown code almost identical. Merge the code under x86/. Noted by: jhb [] Reviewed by: cem, jhb, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4346	2015-12-03 11:14:14 +00:00
kib	ee461b4bba	Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct sysent. sv_prepsyscall is unused. sv_sigsize and sv_sigtbl translate signal number from the FreeBSD namespace into the ABI domain. It is only utilized on i386 for iBCS2 binaries. The issue with this approach is that signals for iBCS2 were delivered with the FreeBSD signal frame layout, which does not follow iBCS2. The same note is true for any other potential user if sv_sigtbl. In other words, if ABI needs signal number translation, it really needs custom sv_sendsig method instead. Sponsored by: The FreeBSD Foundation	2015-11-28 08:49:07 +00:00
kib	01e10e7a39	Disconnect iBCS2 emulator from the build. The ibcs2 option, the build glue and the sources are not removed for now. Discussed with: emaste Sponsored by: The FreeBSD Foundation	2015-11-28 08:31:32 +00:00
emaste	c168857c6a	Fix whitespace on addition of IPSEC option	2015-11-26 21:35:50 +00:00
kib	e0c4faece4	Split kerne timekeep ABI structure vdso_sv_tk out of the struct sysentvec. This allows the timekeep data to be shared between similar ABIs which cannot share sysentvec. Make the timekeep_push_vdso() tick callback to the timekeep structures instead of sysentvecs. If several sysentvec share the vdso_sv_tk structure, we would update the userspace data several times on each tick, without the change. Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when sysentvec is marked with the new SV_TIMEKEEP flag. This saves allocation and update of unneeded vdso_sv_tk for ABIs which do not provide userspace gettimeofday yet, which are PowerPCs arches right now. Make vdso_sv_tk allocator public, namely split out and export alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep data now can allocate it manually and share as appropriate. Requested by: nwhitehorn Tested by: nwhitehorn, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-11-23 07:09:35 +00:00
jhb	c1d9f70889	Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773	2015-11-12 22:00:59 +00:00
kib	df9dbd12ce	The prefix for CLFLUSHOPT is 0x66. It was right on amd64. Sponsored by: The FreeBSD Foundation	2015-10-30 09:53:33 +00:00
jhb	727afb6dcd	Use movw instead of movl (or plain mov) when moving segment registers into memory. This is a nop on clang's assembler, but some assemblers complain if the size suffix is incorrect. Submitted by: bde	2015-10-29 21:25:46 +00:00
hselasky	c12637e275	Build fix for i386/XBOX and pc98/GENERIC. Reviewed by: kib	2015-10-28 12:10:01 +00:00
kib	919ebacc13	Intel SDM before revision 56 described the CLFLUSH instruction as only ordered with the MFENCE instruction. Similar weak guarantees are also specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced CLFLUSH loop with MFENCE both before and after the loop. In the revision 56 of SDM, Intel stated that all existing implementations of CLFLUSH are strict, CLFLUSH instructions execution is ordered WRT other CLFLUSH and writes. Also, the strict behaviour is made architectural. A new instruction CLFLUSHOPT (which was documented for some time in the Instruction Set Extensions Programming Reference) provides the weak behaviour which was previously attributed to CLFLUSH. Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do not execute MFENCE before and after the flushing loop. Reviewed by: alc Sponsored by: The FreeBSD Foundation	2015-10-24 21:37:47 +00:00
kib	7eb36dd3f9	Add CLFLUSHOPT instruction wrappers. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-23 11:45:38 +00:00
royger	d31f5fb7b9	x86/xen: Consolidate xen-os.h in a single place amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there is no more differences betwen amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/.h. This is to be able to include machine/xen/.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D	2015-10-21 10:04:35 +00:00
mav	64d53c4c7d	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
cem	b2ce78db72	Fix missing semi-colon from r289055. Obtained from: mjg Sponsored by: EMC / Isilon Storage Division	2015-10-08 23:27:45 +00:00
mjg	d8dc4fc1ae	linux: fix handling of out-of-bounds syscall attempts Due to an off by one the code would read an entry past the table, as opposed to the last entry which contains the nosys handler. Reported by: Pawel Biernacki <pawel.biernacki gmail.com>	2015-10-08 21:08:35 +00:00
royger	375ecc42de	xen/console: Introduce a new console driver for Xen guest The current Xen console driver is crashing very quickly when using it on an ARM guest. This is because the console lock is recursive and it may lead to recursion on the tty lock and/or corrupt the ring pointer. Furthermore, the console lock is not always taken where it should be and has to be released too early because of the way the console has been designed. Over the years, code has been modified to support various new features but the driver has not been reworked. This new driver has been rewritten with the idea of only having a small set of specific function to write either via the shared ring or the hypercall interface. Note that HVM support has been left aside for now because it requires additional features which are not yet supported. A follow-up patch will be sent with HVM guest support. List of items that may be good to have but not mandatory: - Avoid to flush for each character written when using the tty - Support multiple consoles Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3698 Sponsored by: Citrix Systems R&D	2015-10-08 16:39:43 +00:00
royger	c1bb2e3246	Update Xen headers from 4.2 to 4.6 Pull the latest headers for Xen which allow us to add support for ARM and use new features in FreeBSD. This is a verbatim copy of the xen/include/public so every headers which don't exits anymore in the Xen repositories have been dropped. Note the interface version hasn't been bumped, it will be done in a follow-up. Although, it requires fix in the code to get it compiled: - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so drop it. - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on xen/interface/event_channel.h, so include it. - {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include machine/intr_machdep.h. This is also fixing build compilation with the new headers. - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas been dropped. So directly use struct blkif_request_segment Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if __XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file where xen/xen-os.h is not correctly included. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3805 Sponsored by: Citrix Systems R&D	2015-10-06 11:29:44 +00:00
alc	57f2addb31	Exploit r288122 to address a cosmetic issue. Since PV chunk pages don't belong to a vm object, they can't be paged out. Since they can't be paged out, they are never enqueued in a paging queue. Nonetheless, passing PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages are being enqueued in the inactive queue. As of r288122, we can avoid this false impression by passing PQ_NONE. Submitted by: kmacy (an earlier version) Differential Revision: https://reviews.freebsd.org/D1674	2015-09-26 07:18:05 +00:00
kib	518734671f	Add support for weak symbols to the kernel linkers. It means that linkers no longer raise an error when undefined weak symbols are found, but relocate as if the symbol value was 0. Note that we do not repeat the mistake of userspace dynamic linker of making the symbol lookup prefer non-weak symbol definition over the weak one, if both are available. In fact, kernel linker uses the first definition found, and ignores duplicates. Signature of the elf_lookup() and elf_obj_lookup() functions changed to split result/error code and the symbol address returned. Otherwise, it is impossible to return zero address as the symbol value, to MD relocation code. This explains the mechanical changes in elf_machdep.c sources. The powerpc64 R_PPC_JMP_SLOT handler did not checked error from the lookup() call, the patch leaves the code as is (untested). Reported by: glebius Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-20 01:27:59 +00:00
markj	e8967c8bd9	Add stack_save_td_running(), a function to trace the kernel stack of a running thread. It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode. This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop. Reviewed by: bdrewery, jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3256	2015-09-11 03:54:37 +00:00
markj	dfb0cc5c03	Merge stack(9) implementations for i386 and amd64 under x86/. Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255	2015-09-11 03:24:07 +00:00
kib	98df6be028	Do not hold the process around the vm_fault() call from the trap()s. The only operation which is prevented by the hold is the kernel stack swapout for the faulted thread, which should be fine to allow. Remove useless checks for NULL curproc or curproc->p_vmspace from the trap_pfault() wrappers on x86 and powerpc. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-09-10 17:46:48 +00:00
royger	5b319fbe38	preload_search_info: make sure mod is set Add a check to preload_search_info to make sure mod is set. Most of the callers of preload_search_info don't check that the mod parameter is set, which can cause page faults. While at it, remove some now unnecessary checks before calling preload_search_info. Sponsored by: Citrix Systems R&D Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3440	2015-08-21 15:57:57 +00:00
marcel	bacabe8a7e	Better support memory mapped console devices, such as VGA and EFI frame buffers and memory mapped UARTs. 1. Delay calling cninit() until after pmap_bootstrap(). This makes sure we have PMAP initialized enough to add translations. Keep kdb_init() after cninit() so that we have console when we need to break into the debugger on boot. 2. Unfortunately, the ATPIC code had be moved as well so as to avoid a spurious trap #30. The reason for which is not known at this time. 3. In pmap_mapdev_attr(), when we need to map a device prior to the VM system being initialized, use virtual_avail as the KVA to map the device at. In particular, avoid using the direct map on amd64 because we can't demote by virtue of not being able to allocate yet. Keep track of the translation. Re-use the translation after the VM has been initialized to not waste KVA and to satisfy the assumption in uart(4) that the handle returned for the low-level console is the same as later returned when the device is probed and attached. 4. In pmap_unmapdev() remove the mapping from the table when called pre-init. Otherwise keep the mapping. During bus probe and attach device resources are mapped and unmapped multiple times, which would have us destroy the mapping used by the low-level console. 5. In pmap_init(), set pmap_initialized to signal that we're not pre-init anymore. On amd64, bring the direct map in sync with the translations created at that time. 6. Implement bus_space_map() and bus_space_unmap() for real: when the tag corresponds to memory space, call the corresponding pmap_mapdev() and pmap_unmapdev() functions to construct and actual handle. 7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply call pmap_mapdev_attr() or bus_space_map() as desired. Notes: 1. uart(4) already used bus_space_map() during low-level console setup but since serial ports have traditionally been I/O port based, the lack of a proper implementation for said function was not a problem. It has always supported memory mapped UARTs for low-level consoles by setting hw.uart.console accordingly. 2. The use of the direct map on amd64 without setting caching attributes has been a bigger problem than previously thought. This change has the fortunate (and unexpected) side-effect of fixing various EFI frame buffer problems (though not all). PR: 191564, 194952 Special thanks to: 1. XipLink, Inc -- generously donated an Intel Bay Trail E3800 based eval board (ADLE3800PC). 2. The FreeBSD Foundation, in particular emaste@ -- for UEFI support in general and testing. 3. Everyone who tested the proposed for PR 191564. 4. jhb@ and kib@ for being a soundboard and applying a clue bat if so needed.	2015-08-12 15:26:32 +00:00
kib	9458aa2b57	Initialization of smp_tlb_wait does not require release semantic, no data is synchronized by store/load to the variable. The lapic_write_icr() function ensures that store buffers are flushed before IPI command is issued. Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:46:39 +00:00
kib	bff0ec5bef	AP should load aps_ready with acquire semantic to see BSP updates to the SMP structures, synchronized with the load by release store in release_aps(). The change is formal, x86 strong memory model implicitely provided the guarantees. Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-08-12 09:43:12 +00:00
kib	9033c894a1	Make kstack_pages a tunable on arm, x86, and powepc. On i386, the initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week	2015-08-10 17:18:21 +00:00
kib	665130d183	Remove unused i386 header privatespace.h. For the native kernel, its use was removed in r173592 (Nov 2007), yet Xen PV bits continued referencing the privatespace structure, and were removed in r282274 (Apr 2015). Discussed with: jhb Sponsored by: The FreeBSD Foundation	2015-08-07 05:59:58 +00:00
jhb	47d8edd4b1	Remove some more vestiges of the Xen PV domu support. Specifically, use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266	2015-08-06 17:07:21 +00:00
emaste	002d9943c1	Rationalize BSD license on sys/*/include/in_cksum.h Remove the advertising clause from the Regents of the University of California's license, per the letter dated July 22, 1999. Update clause numbering.	2015-08-05 19:05:12 +00:00
kib	579f504041	Fix UP build after r286296, ensure that CPU_FOREACH() is defined. Sponsored by: The FreeBSD Foundation	2015-08-05 10:50:33 +00:00

... 5 6 7 8 9 ...

13742 Commits