freebsd-skq

Author	SHA1	Message	Date
Ed Schouten	d96aeddf2f	Don't forget to set sa->narg for CloudABI system calls. It turns out that this value is not used within the system call code under normal conditions, except when using tracing tools like ktrace. If we forget to set this value, it is set to random garbage. This may cause ktrace to hang indefinitely, making it impossible to kill. Reported by: Michael Plass PR: 210800 MFC before: 11.0-RELEASE	2016-07-08 20:09:21 +00:00
Nathan Whitehorn	96c85efb4b	Replace a number of conflations of mp_ncpus and mp_maxid with either mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try, for example, to run tasks on CPUs that did not exist or to allocate too few buffers on systems with sparse CPU IDs in which there are holes in the range and mp_maxid > mp_ncpus. Such circumstances generally occur on systems with SMT, but on which SMT is disabled. This patch restores system operation at least on POWER8 systems configured in this way. There are a number of other places in the kernel with potential problems in these situations, but where sparse CPU IDs are not currently known to occur, mostly in the ARM machine-dependent code. These will be fixed in a follow-up commit after the stable/11 branch. PR: kern/210106 Reviewed by: jhb Approved by: re (glebius)	2016-07-06 14:09:49 +00:00
Konstantin Belousov	5c2cf81845	Update comments for the MD functions managing contexts for new threads, to make it less confusing and using modern kernel terms. Rename the functions to reflect current use of the functions, instead of the historic KSE conventions: cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads) cpu_set_upcall -> cpu_copy_thread (for forks) cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation) Reviewed by: jhb (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (hrs) Differential revision: https://reviews.freebsd.org/D6731	2016-06-16 12:05:44 +00:00
Konstantin Belousov	b9c8a54dc3	Do not access pv_table array for fictitious pages, since the array does not cover the dynamically registered ficititious ranges, and fictitious pages mappings are not promoted. Offer a dummy struct md_page to fetch constant superpage pv list generation to satisfy logic. Also, by initializing the pv_dummy pv_list to empty, we can remove several explicit PG_FICTITIOUS tests. Reported and tested by: Michael Butler <imb@protected-networks.net> (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6728 Approved by: re (hrs)	2016-06-13 03:45:08 +00:00
Konstantin Belousov	fc0924b9a4	Avoid spurious EINVAL in amd64 pmap_change_attr(). Do not try to change attributes for DMAP when working on a mapping which is not covered by the DMAP. This was reported on real system where a BAR of a device (NTB) was mapped outside the PCI window. Reported and tested by: mav Reviewed by: jhb, mav Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D6668	2016-06-05 17:11:23 +00:00
Konstantin Belousov	4230ac2fa5	In pmap_advise(), avoid leaking DI start for EPT pmaps which needs A/D emulation. Assert that syscalls do not leak DI. Reported by: gjb Sponsored by: The FreeBSD Foundation	2016-05-27 18:45:11 +00:00
Jung-uk Kim	1eb6b86a44	Both Clang and GCC cannot generate efficient reserve_pv_entries(). http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407 Re-implement it entirely in inline assembly not to let compilers do silly spilling to memory. For non-POPCNT case, use newly added bit_count(3). Reported by: alc Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D6541	2016-05-25 23:06:52 +00:00
Jung-uk Kim	22d9c132f6	Document POPCNT erratum for 6th Generation Intel Core processors.	2016-05-23 23:00:47 +00:00
Dmitry Chagin	5437e1d103	Add macro to convert errno and use it when appropriate. MFC after: 1 week	2016-05-22 12:46:34 +00:00
Dmitry Chagin	f26a190f65	Regen after r300359 (struct l_sched_param removal). MFC after: 1 week	2016-05-21 08:03:13 +00:00
Dmitry Chagin	8cc96fb43a	Correct an argument param of linux_sched_* system calls as a struct l_sched_param does not defined due to it's nature. MFC after: 1 week	2016-05-21 08:01:14 +00:00
Konstantin Belousov	0bfad8e4a3	Check for overflow and return EINVAL if detected. Backport this and r300305 to i386. PR: 209661 Reported and reviewed by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 19:50:32 +00:00
Konstantin Belousov	ae76e30131	Use unsigned type for the loop index to make overflow checks effective. PR: 209661 Reported by: cturt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-05-20 15:32:48 +00:00
Eitan Adler	cef367e6a1	Don't repeat the the word 'the' (one manual change to fix grammar) Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)	2016-05-17 12:52:31 +00:00
Sepherosa Ziehau	dfdc9a05c6	atomic: Add testandclear on i386/amd64 Reviewed by: kib Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6381	2016-05-16 07:19:33 +00:00
Konstantin Belousov	56e61f57b0	Eliminate pvh_global_lock from the amd64 pmap. The only current purpose of the pvh lock was explained there On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote: > Let me lay out one example for you in detail. Suppose that we have > three processors and two of these processors are actively using the same > pmap. Now, one of the two processors sharing the pmap performs a > pmap_remove(). Suppose that one of the removed mappings is to a > physical page P. Moreover, suppose that the other processor sharing > that pmap has this mapping cached with write access in its TLB. Here's > where the trouble might begin. As you might expect, the processor > performing the pmap_remove() will acquire the fine-grained lock on the > PV list for page P before destroying the mapping to page P. Moreover, > this processor will ensure that the vm_page's dirty field is updated > before releasing that PV list lock. However, the TLB shootdown for this > mapping may not be initiated until after the PV list lock is released. > The processor performing the pmap_remove() is not problematic, because > the code being executed by that processor won't presume that the mapping > is destroyed until the TLB shootdown has completed and pmap_remove() has > returned. However, the other processor sharing the pmap could be > problematic. Specifically, suppose that the third processor is > executing the page daemon and concurrently trying to reclaim page P. > This processor performs a pmap_remove_all() on page P in preparation for > reclaiming the page. At this instant, the PV list for page P may > already be empty but our second processor still has a stale TLB entry > mapping page P. So, changes might still occur to the page after the > page daemon believes that all mappings have been destroyed. (If the PV > entry had still existed, then the pmap lock would have ensured that the > TLB shootdown completed before the pmap_remove_all() finished.) Note, > however, the page daemon will know that the page is dirty. It can't > possibly mistake a dirty page for a clean one. However, without the > current pvh global locking, I don't think anything is stopping the page > daemon from starting the laundering process before the TLB shootdown has > completed. > > I believe that a similar example could be constructed with a clean page > P' and a stale read-only TLB entry. In this case, the page P' could be > "cached" in the cache/free queues and recycled before the stale TLB > entry is flushed. TLBs for addresses with updated PTEs are always flushed before pmap lock is unlocked. On the other hand, amd64 pmap code does not always flushes TLBs before PV list locks are unlocked, if previously PTEs were cleared and PV entries removed. To handle the situations where a thread might notice empty PV list but third thread still having access to the page due to TLB invalidation not finished yet, introduce delayed invalidation. Comparing with the pvh_global_lock, DI does not block entered thread when pmap_remove_all() or pmap_remove_write() (callers of pmap_delayed_invl_wait()) are executed in parallel. But _invl_wait() callers are blocked until all previously noted DI blocks are leaved, thus ensuring that neccessary TLB invalidations were performed before returning from pmap_remove_all() or pmap_remove_write(). See comments for detailed description of the mechanism, and also for the explanations why several pmap methods, most important pmap_enter(), do not need DI protection. Reviewed by: alc, jhb (turnstile KPI usage) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5747	2016-05-14 23:35:11 +00:00
Alan Cox	d3ffaee8e6	Eliminate an unused #include. For a brief period of time, _unrhdr.h was used to implement PCID support on amd64. Reviewed by: kib	2016-05-13 20:14:41 +00:00
Konstantin Belousov	aa3ec63e02	Add locking annotations to amd64 struct md_page members. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-10 09:58:51 +00:00
John Baldwin	8d791e5af1	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Compared to the r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).	2016-05-09 20:50:21 +00:00
Dmitry Chagin	8521d01a71	Add a forgotten in r283424 .eh_frame section with CFI & FDE records to allow stack unwinding through signal handler. Reported by: Dmitry Sivachenko MFC after: 2 weeks	2016-05-09 07:38:47 +00:00
John Baldwin	82cb5c3b5b	Native PCI-express HotPlug support. PCI-express HotPlug support is implemented via bits in the slot registers of the PCI-express capability of the downstream port along with an interrupt that triggers when bits in the slot status register change. This is implemented for FreeBSD by adding HotPlug support to the PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges representing downstream ports on HotPlug slots. The PCI-PCI bridge driver registers an interrupt handler to receive HotPlug events. It also uses the slot registers to determine the current HotPlug state and drive an internal HotPlug state machine. For simplicty of implementation, the PCI-PCI bridge device detaches and deletes the child PCI device when a card is removed from a slot and creates and attaches a PCI child device when a card is inserted into the slot. The PCI-PCI bridge driver provides a bus_child_present which claims that child devices are present on HotPlug-capable slots only when a card is inserted. Rather than requiring a timeout in the RC for config accesses to not-present children, the pcib_read/write_config methods fail all requests when a card is not present (or not yet ready). These changes include support for various optional HotPlug capabilities such as a power controller, mechanical latch, electro-mechanical interlock, indicators, and an attention button. It also includes support for devices which require waiting for command completion events before initiating a subsequent HotPlug command. However, it has only been tested on ExpressCard systems which support surprise removal and have none of these optional capabilities. PCI-express HotPlug support is conditional on the PCI_HP option which is enabled by default on arm64, x86, and powerpc. Reviewed by: adrian, imp, vangyzen (older versions) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D6136	2016-05-05 22:26:23 +00:00
Alan Cox	cdbf6d8a05	Explain why pmap_copy(), pmap_enter_pde(), and pmap_enter_quick_locked() call pmap_invalidate_page() even though they are not destroying a leaf- level page table entry. Eliminate some bogus white-space characters in a comment. Reviewed by: kib	2016-05-04 17:54:13 +00:00
Pedro F. Giffuni	edafb5a327	sys/amd64: Small spelling fixes. No functional change.	2016-05-03 22:13:04 +00:00
Pedro F. Giffuni	500eb14ae8	vmm(4): Small spelling fixes. Reviewed by: grehan	2016-05-03 22:07:18 +00:00
John Baldwin	8a08b7d36b	Revert bus_get_cpus() for now. I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.	2016-05-03 01:17:40 +00:00
John Baldwin	bc153c692f	Add a new bus method to fetch device-specific CPU sets. bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported: - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core) For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set. Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not). The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices. Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519	2016-05-02 18:00:38 +00:00
John Baldwin	e131ba36e8	Move 'device pci' for the PCI bus driver to the MI NOTES file. The PCI bus was already listed in all of the MD NOTES files and the driver should at least compile on all platforms.	2016-04-29 23:53:55 +00:00
Andriy Gapon	f9ac50ac45	fix missing variable in r298736 Pointyhat to: avg Reported by: Ivan Klymenko <fidaj@ukr.net> MFC after: 2 weeks X-MFC with: r298736	2016-04-28 09:40:24 +00:00
Andriy Gapon	e5e4452078	ensure that initial local apic id is sane on AMD 10h systems Summary: The Initial Local APIC ID is returned by CPUID function 1 (in EBX). On AMD Family 10h systems the way that ID is built is controlled by an MSR bit (InitApicIdCpuIdLo). BKDG instructs BIOS to set it in a certain way, but a BIOS can be buggy. In that case the ID can confuse tools that use it, e.g. hwloc. For example, on a system that I own real Local APIC IDs are configured as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0. See: https://github.com/open-mpi/hwloc/issues/183 Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6060	2016-04-28 08:29:57 +00:00
Conrad Meyer	0e3f9e5bdd	AMD64 pmap: Use howmany() macro Use param.h howmany() instead of hand-rolled version. Sponsored by: EMC / Isilon Storage Division	2016-04-24 21:35:01 +00:00
Pedro F. Giffuni	b66bb393f2	Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.	2016-04-22 16:57:42 +00:00
Pedro F. Giffuni	d9c9c81c08	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
Pedro F. Giffuni	ea24b0561f	X86: use our nitems() macro when it is avaliable through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:41:46 +00:00
Conrad Meyer	5dc5dab6eb	Add 4Kn kernel dump support (And 4Kn minidump support, but only for amd64.) Make sure all I/O to the dump device is of the native sector size. To that end, we keep a native sector sized buffer associated with dump devices (di->blockbuf) and use it to pad smaller objects as needed (e.g. kerneldumpheader). Add dump_write_pad() as a convenience API to dump smaller objects with zero padding. (Rather than pull in NPM leftpad, we wrote our own.) Savecore(1) has been updated to deal with these dumps. The format for 512-byte sector dumps should remain backwards compatible. Minidumps for other architectures are left as an exercise for the reader. PR: 194279 Submitted by: ambrisko@ Reviewed by: cem (earlier version), rpokala Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump) Relnotes: yes Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5848	2016-04-15 17:45:12 +00:00
Sepherosa Ziehau	0c29fe6db8	hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus Submitted by: Jun Su <junsu microsoft com> Reviewed by: jhb, kib, sephe Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5910	2016-04-15 02:20:18 +00:00
John Baldwin	4478441145	Expose doreti as a global symbol on amd64 and i386. doreti provides the common code path for returning from interrupt andlers on x86. Exposing doreti as a global symbol allows kernel modules to include low-level interrupt handlers instead of requiring all low-level handlers to be statically compiled into the kernel. Submitted by: Howard Su <howard0su@gmail.com> Reviewed by: kib	2016-04-13 17:37:31 +00:00
John Baldwin	7ecf8cab6f	Enable DEVICE_NUMA with up to 8 domains by default on amd64. 8 memory domains should handle a quad-socket board with dual-domain processors. Reviewed by: kib Relnotes: maybe? Differential Revision: https://reviews.freebsd.org/D5893	2016-04-12 21:23:44 +00:00
Andriy Gapon	0d63fc3ed8	re-enable AMD Topology extension on certain models if disabled by BIOS Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same. Reported by: Johannes Dieterich <dieterich.joh@gmail.com> Reviewed by: jhb, kib Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version) MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D5883	2016-04-12 13:30:39 +00:00
Andriy Gapon	9054bcbce7	[amd64] dtrace_invop handler is to be called only for kernel exceptions DTrace-related exceptions in userland code are handled elsewhere. One practical problem was a crash in dtrace_invop_start() when saved %rsp pointed to a virtual address that was not backed. i386 code already ignored userland exceptions. Reviewed by: markj, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D5906	2016-04-12 06:46:54 +00:00
Anish Gupta	441a3497f5	Allow guest writes to AMD microcode update[0xc0010020] MSR without updating actual hardware MSR. This allows guest microcode update to go through which otherwise failing because wrmsr() was returning EINVAL. Submitted by:Yamagi Burmeister Approved by:grehan MFC after:2 weeks	2016-04-11 05:09:43 +00:00
Ed Schouten	ab83575070	Make CloudABI's way of doing TLS more friendly to userspace emulators. We're currently seeing how hard it would be to run CloudABI binaries on operating systems cannot be modified easily (Windows, Mac OS X). The idea is that we want to just run them without any sandboxing. Now that CloudABI executables are PIE, this is already a bit easier, but TLS is still problematic: - CloudABI executables want to write to the %fs, which typically requires extra system calls by the emulator every time it needs to switch between CloudABI's and its own TLS. - If CloudABI executables overwrite the %fs base unconditionally, it also becomes harder for the emulator to store a backup of the old value of %fs. To solve this, let's no longer overwrite %fs, but just %fs:0. As CloudABI's C library does not use a TCB, this space can now be used by an emulator to keep track of its internal state. The executable can now safely overwrite %fs:0, as long as it makes sure that the TCB is copied over to the new TLS area. Ensure that there is an initial TLS area set up when the process starts, only containing a bogus TCB. We don't really care about its contents on FreeBSD. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5836	2016-04-06 11:11:31 +00:00
Baptiste Daroussin	b6348be7b9	Add kern.features flags for linux and linux64 modules kern.features.linux: 1 meaning linux 32 bits binaries are supported kern.features.linux64: 1 meaning linux 64 bits binaries are supported The goal here is to help 3rd party applications (including ports) to determine if the host do support linux emulation Reviewed by: dchagin MFC after: 1 week Relnotes: yes Differential Revision: D5830	2016-04-05 22:36:48 +00:00
John Baldwin	2b1e924b69	Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386.	2016-04-03 23:03:54 +00:00
Ed Schouten	4a8b3b18cc	Make Position Independent Executables work for CloudABI. - Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to regular ET_EXECs. - Provide an AT_BASE entry in the auxiliary vector, so the executable knows at which address it got loaded and can apply relocations.	2016-03-31 18:52:00 +00:00
Konstantin Belousov	0df87548b9	Type of the interrupt handlers on x86 cannot be expressed in C. Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771	2016-03-29 19:56:48 +00:00
Dmitry Chagin	7c5982000d	Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET. Pointed out by: ae@	2016-03-27 10:09:10 +00:00
Dmitry Chagin	c826fcfe22	iConvert Linux SOL_IPV6 level. MFC after: 1 week	2016-03-27 08:12:01 +00:00
Alexander Motin	baa7dd65be	Polish wbwd(4) driver and add more supported chips. MFC after: 1 month	2016-03-24 20:52:35 +00:00
John Baldwin	7a2c1d8c60	Enable interrupts on the BSP once all PICs are initialized. This moves the enabling of interrupts slightly earlier (the old location was still before devices were enumerated and probed) and does it in the interrupt code (rather than in the device configuration code). This also avoids tripping over an assertion on the first TLB shootdown with earlier AP startup. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5710	2016-03-24 00:24:07 +00:00
Dmitry Chagin	351cf753eb	Regen for r297061 (fstatfs64 Linux syscall). MFC after: 1 week	2016-03-20 13:23:01 +00:00
Dmitry Chagin	99546279d6	Implement fstatfs64 system call. PR: 181012 Submitted by: John Wehle MFC after: 1 week	2016-03-20 13:21:20 +00:00
Gleb Smirnoff	e33c2e6b06	Due to invalid use of a signed intermediate value in the bounds checking during argument validity verification, unbound zero'ing of the process LDT and adjacent memory can be initiated from usermode. Submitted by: CORE Security Patch by: kib Security: SA-16:15	2016-03-16 22:33:12 +00:00
Konstantin Belousov	3ef966c4c0	The PKRU state size is 4 bytes, its support makes the XSAVE area size non-multiple of 64 bytes. Thereafter, the user state save area is misaligned, which triggers assertion in the debugging kernels, or segmentation violation on accesses for non-debugging configs. Force the desired alignment of the user save area as the fix (workaround is to disable bit 9 in the hw.xsave_mask loader tunable). This correction is required for booting on the upcoming Intel' Purley platform. Reported and tested by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>, jimharris Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-03-15 15:42:53 +00:00
John Baldwin	6fc8053f1a	Fix reporting of the CloudABI ABI in kdump. - Advertise the word size for CloudABI ABIs via the SV_LP64 flag. All of the other ABIs include either SV_ILP32 or SV_LP64. - Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero but SV_LP64 isn't set. Instead, only assume a 32-bit ABI if SV_ILP32 is set and fallback to the unknown value of "00" if neither SV_LP64 nor SV_ILP32 is set. Reviewed by: kib, ed Differential Revision: https://reviews.freebsd.org/D5560	2016-03-09 18:38:30 +00:00
Marcel Moolenaar	6bcf245ebc	Bump VM_MAX_MEMSEGS from 2 to 3 to match the number of VM segment identifiers present in vmmapi.h. In particular, it's now possible to create a VM_FRAMEBUFFER segment.	2016-02-26 16:18:47 +00:00
Konstantin Belousov	abb8f08388	Return dst as the result from memcpy(9) on amd64. PR: 207422 MFC after: 1 week	2016-02-24 11:58:15 +00:00
Svatopluk Kraus	b352b10400	As <machine/vm.h> is included from <vm/vm.h>, there is no need to include it explicitly when <vm/vm.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5380	2016-02-22 09:10:23 +00:00
Svatopluk Kraus	35a0bc1260	As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no need to include it explicitly when <vm/vm_param.h> is already included. Suggested by: alc Reviewed by: alc Differential Revision: https://reviews.freebsd.org/D5379	2016-02-22 09:08:04 +00:00
Svatopluk Kraus	a1e1814d76	As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373	2016-02-22 09:02:20 +00:00
Gleb Smirnoff	b28cc462ad	Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a requirement for uma_int.h. Suggested by: jhb	2016-02-09 20:22:35 +00:00
Gleb Smirnoff	e60b2fcbeb	Redo r292484. Embed task(9) into zone, so that uz_maxaction is called in a context that can sleep, allowing consumers of the KPI to run their drain routines without any extra measures. Discussed with: jtl	2016-02-03 23:30:17 +00:00
John Baldwin	aa949be551	Convert ss_sp in stack_t and sigstack to void . POSIX requires these members to be of type void rather than the char * inherited from 4BSD. NetBSD and OpenBSD both changed their fields to void * back in 1998. No new build failures were reported via an exp-run. PR: 206503 (exp-run) Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5092	2016-01-27 17:55:01 +00:00
Xin LI	669414e4fb	Implement AT_SECURE properly. AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a boolean flag indicating whether secure mode should be enabled. 1 means that the program has changes its credentials during the execution. Being exported AT_SECURE used by glibc issetugid() call. Submitted by: imp, dchagin Security: FreeBSD-SA-16:10.linux Security: CVE-2016-1883	2016-01-27 07:20:55 +00:00
Dmitry Chagin	9dba79fb66	Remove obsolete comment. MFC after: 3 days	2016-01-23 08:08:06 +00:00
Dmitry Chagin	f138999141	Fix a typo. MFC after: 3 days	2016-01-23 08:04:29 +00:00
Hans Petter Selasky	c1ecb7e114	Add missing atomic wrapper macro. Reviewed by: alfred @ Sponsored by: Mellanox Technologies MFC after: 1 week	2016-01-21 18:22:50 +00:00
Konstantin Belousov	f132cd0547	Use ANSI definitions. Wrap long line. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-19 08:08:08 +00:00
Konstantin Belousov	b57e68141f	Clear whole XMM register file instead of only XMM0. Also clear x87 registers. This brings amd64 on par with i386, providing consistent initial FPU state. Note that we do not clear any extended state, at least because kernel does not understand extended state structure and consequences of zero overwrite after fninit()/fpusave(). Submitted by: joss.upton@yahoo.com PR: 206370 MFC after: 2 weeks	2016-01-19 08:04:02 +00:00
Gleb Smirnoff	de44d808ef	Regen after r293907.	2016-01-14 10:15:21 +00:00
Gleb Smirnoff	037f750877	Change linux get_robust_list system call to match actual linux one. The set_robust_list system call request the kernel to record the head of the list of robust futexes owned by the calling thread. The head argument is the list head to record. The get_robust_list system call should return the head of the robust list of the thread whose thread id is specified in pid argument. The list head should be stored in the location pointed to by head argument. In contrast, our implemenattion of get_robust_list system call copies the known portion of memory pointed by recorded in set_robust_list system call pointer to the head of the robust list to the location pointed by head argument. So, it is possible for a local attacker to read portions of kernel memory, which may result in a privilege escalation. Submitted by: mjg Security: SA-16:03.linux	2016-01-14 10:13:58 +00:00
Jung-uk Kim	4ec1c9bfac	Remove dead code when the target processor has POPCNT instruction.	2016-01-13 19:19:50 +00:00
Dmitry Chagin	038c720553	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
Ed Maste	0e42ee5dd8	Move amd64 metadata.h to x86 and share with i386 MFC after: 1 week	2016-01-07 19:47:26 +00:00
Ian Lepore	69dcb7e771	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
John Baldwin	9e8d8b4b0c	Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c. While here, move the common bits of <machine/cputypes.h> to <x86/cputypes.h> as well. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4670	2015-12-23 21:41:42 +00:00
Enji Cooper	418629d81d	Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.h This variable was added to sys/x86/include/x86_var.h recently. This unbreaks building kernel source that #includes both md_var.h and x86_var.h with gcc 4.2.1 on amd64 Differential Revision: https://reviews.freebsd.org/D4686 Reviewed by: kib X-MFC with: r291949 Sponsored by: EMC / Isilon Storage Division	2015-12-22 20:08:32 +00:00
Warner Losh	2fca0f2dd4	Save the physical address passed into the kernel of the UEFI system table.	2015-12-19 19:01:43 +00:00
Konstantin Belousov	7c958a41fe	Merge common parts of i386 and amd64 md_var.h and smp.h into new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358	2015-12-07 17:41:20 +00:00
Konstantin Belousov	49e806677c	Use ANSI C definition. MFC after: 1 week	2015-12-07 17:24:55 +00:00
Conrad Meyer	10386b56ad	pmap_invalidate_range: For very large ranges, flush the whole TLB Typical TLBs have 40-512 entries available. At some point, iterating every single page in a requested invalidation range and issuing invlpg on it is more expensive than flushing the TLB and allowing it to reload on demand. Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary number 4096 entries as a hueristic at which point we flush TLB rather than invalidating every single potential page. Reviewed by: alc Feedback from: jhb, kib MFC notes: Depends on r291688 Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4280	2015-12-06 17:39:13 +00:00
Konstantin Belousov	27691a24ab	For amd64 non-PCID machines, and for i386 machines with support for the PG_G global pte flag, pmap_invalidate_all() fails to flush global TLB entries []. This is because TLB shootdown handler for such configs reloads CR3, and on i386 pmap_invalidate_all() does the same for the initiating CPU. Note that current code does not issue total invalidation requests for the kernel_pmap. Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not specific for PCID for quite some time, and implement the same functionality for i386. Use the function instead of invltlb() in shootdown handlers and in i386 pmap_invalidate_all(), but only for the kernel pmap (which maps pages with the PG_G attribute set), which takes care of PG_G TLB entries on flush. To detect the affected pmap in i386 TLB shootdown handler, pmap should be passed to the smp_masked_invltlb() function, which makes amd64 and i386 TLB shootdown code almost identical. Merge the code under x86/. Noted by: jhb [] Reviewed by: cem, jhb, pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4346	2015-12-03 11:14:14 +00:00
Konstantin Belousov	724f4b62b0	Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct sysent. sv_prepsyscall is unused. sv_sigsize and sv_sigtbl translate signal number from the FreeBSD namespace into the ABI domain. It is only utilized on i386 for iBCS2 binaries. The issue with this approach is that signals for iBCS2 were delivered with the FreeBSD signal frame layout, which does not follow iBCS2. The same note is true for any other potential user if sv_sigtbl. In other words, if ABI needs signal number translation, it really needs custom sv_sendsig method instead. Sponsored by: The FreeBSD Foundation	2015-11-28 08:49:07 +00:00
Ed Maste	2e0002c18e	Fix whitespace on addition of IPSEC option	2015-11-26 21:35:50 +00:00
Konstantin Belousov	5e27d79314	Split kerne timekeep ABI structure vdso_sv_tk out of the struct sysentvec. This allows the timekeep data to be shared between similar ABIs which cannot share sysentvec. Make the timekeep_push_vdso() tick callback to the timekeep structures instead of sysentvecs. If several sysentvec share the vdso_sv_tk structure, we would update the userspace data several times on each tick, without the change. Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when sysentvec is marked with the new SV_TIMEKEEP flag. This saves allocation and update of unneeded vdso_sv_tk for ABIs which do not provide userspace gettimeofday yet, which are PowerPCs arches right now. Make vdso_sv_tk allocator public, namely split out and export alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep data now can allocate it manually and share as appropriate. Requested by: nwhitehorn Tested by: nwhitehorn, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-11-23 07:09:35 +00:00
Mark Johnston	7672ca059a	Remove unneeded includes of opt_kdtrace.h. As of r258541, KDTRACE_HOOKS is defined in opt_global.h, so opt_kdtrace.h is not needed when defining SDT(9) probes.	2015-11-22 02:01:01 +00:00
John Baldwin	645743ea99	Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773	2015-11-12 22:00:59 +00:00
Conrad Meyer	0e5d2011ae	pmap_change_attr: Only fixup DMAP for DMAPed ranges pmap_change_attr must change the memory type of both the requested KVA and the corresponding DMAP mappings (if such mappings exist), to satisfy an Intel requirement that two or more mappings to the same physical pages must have the same memory type. However, not all kernel mapped pages have corresponding DMAP mappings -- for example, 64-bit BARs. Skip fixing up the DMAP for out-of-bounds addresses. Submitted by: Steve Wahl <steve_wahl@dell.com> Reviewed by: alc, jhb Sponsored by: Dell Compellent Differential Revision: https://reviews.freebsd.org/D4030	2015-10-29 19:07:00 +00:00
John Baldwin	2219c44a1f	Update for LINUX32 rename. The assembler didn't complain about undefined symbols but just used 0 after the rename.	2015-10-29 15:20:47 +00:00
John Baldwin	6cea44a704	Fix build with DEBUG defined. Reported by: hselasky	2015-10-29 15:16:47 +00:00
Kirk McKusick	a57418a761	Bring the tags and links entries for amd64 up to date. Based on how out of date it is, I doubt that anyone other than me and my code-reading students still use it.	2015-10-27 22:59:24 +00:00
Konstantin Belousov	af95bbf5bf	Intel SDM before revision 56 described the CLFLUSH instruction as only ordered with the MFENCE instruction. Similar weak guarantees are also specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced CLFLUSH loop with MFENCE both before and after the loop. In the revision 56 of SDM, Intel stated that all existing implementations of CLFLUSH are strict, CLFLUSH instructions execution is ordered WRT other CLFLUSH and writes. Also, the strict behaviour is made architectural. A new instruction CLFLUSHOPT (which was documented for some time in the Instruction Set Extensions Programming Reference) provides the weak behaviour which was previously attributed to CLFLUSH. Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do not execute MFENCE before and after the flushing loop. Reviewed by: alc Sponsored by: The FreeBSD Foundation	2015-10-24 21:37:47 +00:00
Konstantin Belousov	3f8e071052	Add CLFLUSHOPT instruction wrappers. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-10-23 11:45:38 +00:00
John Baldwin	f2264f5048	Regen for linux32 rename and linux64 systrace.	2015-10-22 21:33:37 +00:00
John Baldwin	2f99bcce1e	Rename remaining linux32 symbols such as linux_sysent[] and linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace. - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build. - Add a special case for union l_semun arguments to the systrace generation. - The systrace_linux32 module now only builds the systrace_linux32.ko. module on amd64. - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D3954	2015-10-22 21:28:20 +00:00
John Baldwin	5047105b71	Merge r289055 to amd64/linux32: linux: fix handling of out-of-bounds syscall attempts Due to an off by one the code would read an entry past the table, as opposed to the last entry which contains the nosys handler.	2015-10-22 21:23:58 +00:00
Ed Schouten	b78ef4bd86	Refactoring: move out generic bits from cloudabi64_sysvec.c. In order to make it easier to support CloudABI on ARM64, move out all of the bits from the AMD64 cloudabi_sysvec.c into a new file cloudabi_module.c that would otherwise remain identical. This reduces the AMD64 specific code to just ~160 lines. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D3974	2015-10-22 09:07:53 +00:00
Roger Pau Monné	6a306bff7f	x86/xen: Consolidate xen-os.h in a single place amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there is no more differences betwen amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/.h. This is to be able to include machine/xen/.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D	2015-10-21 10:04:35 +00:00
Alexander Motin	4a3760bae6	Remove compatibility shims for legacy ATA device names. We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely removed old stack at 10.x, so at 11.x it is time to remove compat shims.	2015-10-11 13:01:51 +00:00
Mateusz Guzik	3e15a670d2	linux: fix handling of out-of-bounds syscall attempts Due to an off by one the code would read an entry past the table, as opposed to the last entry which contains the nosys handler. Reported by: Pawel Biernacki <pawel.biernacki gmail.com>	2015-10-08 21:08:35 +00:00
Roger Pau Monné	a231723cc0	xen/console: Introduce a new console driver for Xen guest The current Xen console driver is crashing very quickly when using it on an ARM guest. This is because the console lock is recursive and it may lead to recursion on the tty lock and/or corrupt the ring pointer. Furthermore, the console lock is not always taken where it should be and has to be released too early because of the way the console has been designed. Over the years, code has been modified to support various new features but the driver has not been reworked. This new driver has been rewritten with the idea of only having a small set of specific function to write either via the shared ring or the hypercall interface. Note that HVM support has been left aside for now because it requires additional features which are not yet supported. A follow-up patch will be sent with HVM guest support. List of items that may be good to have but not mandatory: - Avoid to flush for each character written when using the tty - Support multiple consoles Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3698 Sponsored by: Citrix Systems R&D	2015-10-08 16:39:43 +00:00

1 2 3 4 5 ...

7302 Commits