freebsd-skq

Author	SHA1	Message	Date
Neel Natu	b98940e5eb	Do not create superpage mappings in the iommu. This is a workaround to hide the fact that we do not have any code to demote a superpage mapping before we unmap a single page that is part of the superpage.	2013-08-20 06:46:40 +00:00
Neel Natu	f77e982952	Extract the location of the remapping hardware units from the ACPI DMAR table. Submitted by: Gopakumar T (gopakumar_thekkedath@yahoo.co.in)	2013-08-20 06:20:05 +00:00
Neel Natu	15e683837c	Fix breakage caused by r254466 in minidumpsys(). r254466 increased the KVA from 512GB to 2TB which requires 4 PDP pages as opposed to a single one before the change. This broke minidumpsys() since it assumed that the entire KVA could be addressed via a single PDP page. Fix this by obtaining the address of the PDP page from the PML4 entry associated with the KVA being dumped. Reported by: pho Submitted by: kib Pointy hat to: neel	2013-08-20 02:09:26 +00:00
Konstantin Belousov	d91f339823	When code from r254064 in pmap_ts_referenced() drops pv lock and blocks on a pmap lock, pmap_release() might proceed in parallel and destroy the pmap mutex, since unlocked pv lock allows to remove pv entry owned by the pmap. For now, gate the pmap_release() on write-locked pvh_global_lock. Since pmap_ts_referenced() does not unlock the global lock, pmap_release() would not destroy pmap mutex until the pmap_ts_referenced() finished. We cannot enter pmap_ts_referenced() and encounter a pv entry for the destroyed pmap if pmap_release() passed the global lock gate, since pmap_remove_pages() would finish earlier. Reported by: jeff, pho Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-18 21:36:22 +00:00
Pawel Jakub Dawidek	417ffc66fa	Add process descriptors support to the GENERIC kernel. It is already being used by the tools in base systems and with sandboxing more and more tools the usage should only increase. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> Sponsored by: Google Summer of Code 2013 MFC after: 1 month	2013-08-18 10:21:29 +00:00
Neel Natu	0ef2ab3ab8	Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB. Bump up the KVA size proportionally from 512GB to 2TB. The number of page table pages used by the direct map is now calculated at run time based on 'Maxmem'. This means the small memory systems will not see any additional tax in terms of page table pages for the direct map. However all amd64 systems, regardless of the memory size, will use 3 more pages to accommodate the bump in the KVA size. More details available here: http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html Tested with the following configurations: - Sandybridge server with 64GB of memory. - bhyve VM with 64MB of memory. - bhyve VM with 8GB of memory with the memory segment above 4GB cuddling right up against the 4TB maximum memory limit. Discussed on: hackers@, current@ Submitted by: Chris Torek (torek@torek.net)	2013-08-17 19:49:08 +00:00
Jilles Tjoelker	0f3a4d8051	libc: Access _logname_valid more efficiently. The variable _logname_valid is not exported via the version script; therefore, change C and i386/amd64 assembler code to remove indirection (which allowed interposition). This makes the code slightly smaller and faster. Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC, there is no place containing the address of each variable, so there is no possible definition for PIC_GOT.	2013-08-17 19:24:58 +00:00
Brooks Davis	cd234300d3	Use an ANSI C definition of initializecpucache() to match the declaration and the rest of the file.	2013-08-15 17:44:44 +00:00
Jung-uk Kim	38da30b419	Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these two files were functionally identical.	2013-08-13 22:05:10 +00:00
Jung-uk Kim	3bd12ca8f1	Tidy up global locks for ACPICA. There is no functional change.	2013-08-13 21:34:03 +00:00
Konstantin Belousov	c325e866f4	Different consumers of the struct vm_page abuse pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-10 17:36:42 +00:00
Attilio Rao	e946b94934	On all the architectures, avoid to preallocate the physical memory for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl	2013-08-09 11:28:55 +00:00
Attilio Rao	c7aebda8a1	The soft and hard busy mechanisms rely on the vm object lock to work. Unify the 2 concepts into a real, minimal, sxlock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl	2013-08-09 11:11:11 +00:00
Andriy Gapon	9ba0691bdd	follow up to r254051 - update powerpc/GENERIC64 as well, suggested by mdf - update comments so that they make sense after the change, suggested by jhb X-MFC after: never (change specific to head)	2013-08-09 08:11:09 +00:00
Neel Natu	f263e391a3	Use local variables with the appropriate types and eliminate a bunch of casts. This is a cosmetic change but it does help with a proposed change to increase the maximum size of physical memory supported on amd64 platforms. Submitted by: Chris Torek (torek@torek.net)	2013-08-08 03:17:39 +00:00
Konstantin Belousov	449c2e92c9	Split the pagequeues per NUMA domains, and split pagedaemon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread's inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation	2013-08-07 16:36:38 +00:00
Konstantin Belousov	872d995f76	Change the pmap_ts_referenced() method of amd64 pmap to use shared pvh_global_lock. This allows the method to be executed in parallel, avoiding undue contention on the pvh_global_lock for the multithreaded pagedaemon. The pmap_ts_referenced() function has to inspect the page mappings for several pmaps, which need to be locked while pv list lock is owned. This contradicts to the lock order, where pmap lock is before pv list lock. Introduce the generation count for the pv list of the page or superpage, which indicate any change in the pv list, and, as usual, perform restart of the iteration if generation changed while pv lock was dropped for blocking acquire of a pmap lock. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation	2013-08-07 16:33:15 +00:00
Andriy Gapon	818d282e7b	enable KDB_TRACE in GENERICs KDB_TRACE is not an alternative to DDB/etc, they are complementary. So I do not see any reason to not enable KDB_TRACE by default. X-MFC after: never (change specific to head)	2013-08-07 08:03:50 +00:00
Jeff Roberson	5df87b21d3	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-08-07 06:21:20 +00:00
Jeff Roberson	2c0b86b48f	- Introduce a specific function, pmap_remove_kernel_pde, for removing huge pages in the kernel's address space. This works around several asserts from pmap_demote_pde_locked that did not apply and gave false warnings. Discovered by: pho Reviewed by: alc Sponsored by: EMC / Isilon Storage Division	2013-08-05 00:28:03 +00:00
Peter Grehan	80a902ef7d	Follow-up commit to fix CR0 issues. Maintain architectural state on CR vmexits by guaranteeing that EFER, CR0 and the VMCS entry controls are all in sync when transitioning to IA-32e mode. Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com)	2013-08-03 03:16:42 +00:00
Peter Grehan	81ef6611ed	Moved clearing of vmm_initialized to avoid the case of unloading the module while VMs existed. This would result in EBUSY, but would prevent further operations on VMs resulting in the module being impossible to unload. Submitted by: Tycho Nightingale (tycho.nightingale <at> plurisbusnetworks.com) Reviewed by: grehan, neel	2013-08-01 05:59:28 +00:00
Peter Grehan	aaaa065629	Correctly maintain the CR0/CR4 shadow registers. This was exposed with AP spinup of Linux, and booting OpenBSD, where the CR0 register is unconditionally written to prior to the longjump to enter protected mode. The CR-vmexit handling was not updating CPU state which resulted in a vmentry failure with invalid guest state. A follow-on submit will fix the CPU state issue, but this fix prevents the CR-vmexit prior to entering protected mode by properly initializing and maintaining CR* state. Reviewed by: neel Reported by: Gopakumar.T @ netapp	2013-08-01 01:18:51 +00:00
David E. O'Brien	0e6a0799a9	Back out r253779 & r253786.	2013-07-31 17:21:18 +00:00
David E. O'Brien	99ff83da74	Decouple yarrow from random(4) device. * Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option. The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow. * random(4) device doesn't really depend on rijndael-. Yarrow, however, does. Add random_adaptors.[ch] which is basically a store of random_adaptor's. random_adaptor is basically an adapter that plugs in to random(4). random_adaptor can only be plugged in to random(4) very early in bootup. Unplugging random_adaptor from random(4) is not supported, and is probably a bad idea anyway, due to potential loss of entropy pools. We currently have 3 random_adaptors: + yarrow + rdrand (ivy.c) + nehemiah * Remove platform dependent logic from probe.c, and move it into corresponding registration routines of each random_adaptor provider. probe.c doesn't do anything other than picking a specific random_adaptor from a list of registered ones. * If the kernel doesn't have any random_adaptor adapters present then the creation of /dev/random is postponed until next random_adaptor is kldload'ed. * Fix randomdev_soft.c to refer to its own random_adaptor, instead of a system wide one. Submitted by: arthurmesh@gmail.com, obrien Obtained from: Juniper Networks Reviewed by: obrien	2013-07-29 20:26:27 +00:00
Andriy Gapon	a29cc9a34b	Revert r253748,253749 This WIP should not have been committed yet. Pointyhat to: avg	2013-07-28 18:44:17 +00:00
Andriy Gapon	366d8bfb7b	put contents of cpu.h under _KERNEL no userland-serviceable parts inside MFC after: 20 days	2013-07-28 18:32:27 +00:00
Andriy Gapon	a69e8d609e	x86: detect mwait capabilities and extensions, when present Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks	2013-07-28 17:54:42 +00:00
Jeff Roberson	2f84c08eee	- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc. This eliminates some unusual uses of that API in favor of more typical uses of kmem_malloc(). Discussed with: kib/alc Tested by: pho Sponsored by: EMC / Isilon Storage Division	2013-07-26 19:06:14 +00:00
Neel Natu	84e169c6c3	Add support for emulation of the "or r/m, imm8" instruction. Submitted by: Zhixiang Yu (zxyu.core@gmail.com) Obtained from: GSoC 2013 (AHCI device emulation for bhyve)	2013-07-23 23:43:00 +00:00
Neel Natu	113326a772	Fix a bug introduced in r252646 that causes a page with the PG_PTE_PAT bit set to be interpreted as a superpage. This is because PG_PTE_PAT is at the same bit position in PTE as PG_PS is in a PDE. This caused a number of regressions on amd64 systems: panic when starting X applications, freeze during shutdown etc. Pointy hat to: me Tested by: gperez@entel.upc.edu, joel, dumbbell Reviewed by: kib	2013-07-23 22:17:00 +00:00
Konstantin Belousov	0f6bcda4cd	MFi386: add ddb "show sysregs" command. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-07-15 06:30:57 +00:00
Konstantin Belousov	0cdd261571	Clear m->object for the page taken from the delayed free list for reuse as the pv chunk page in reclaim_pv_chunk(). Having non-NULL m->object is wrong for page not owned by an object and confuses both vm_page_free_toq() and vm_page_remove() when the page is freed later. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-07-10 09:24:03 +00:00
Xin LI	1fdeb1651c	Import HighPoint DC Series Data Center HBA (DC7280 and R750) driver. This driver works for FreeBSD/i386 and FreeBSD/amd64 platforms. Many thanks to HighPoint for providing this driver. MFC after: 1 day	2013-07-06 07:49:41 +00:00
Neel Natu	be28275d00	If a superpage mapping is being removed then we need to ignore the PG_PDE_PAT bit when looking up the vm_page associated with the superpage's physical address. If the caching attribute for the mapping is write combining or write protected then the PG_PDE_PAT bit will be set and thus cause an 'off-by-one' error when looking up the vm_page. Fix this by using the PG_PS_FRAME mask to compute the physical address for a superpage mapping instead of PG_FRAME. This is a theoretical issue at this point since non-writeback attributes are currently used only for fictitious mappings and fictitious mappings are not subject to promotion. Discussed with: alc, kib MFC after: 2 weeks	2013-07-03 23:21:25 +00:00
Neel Natu	de16308c48	Verify that all bytes in the instruction buffer are consumed during decoding. Suggested by: grehan	2013-07-03 23:05:17 +00:00
Peter Grehan	e60f5d779e	Ignore guest PAT settings by default in EPT mappings. From experimentation, other hypervisors also do this. Diagnosed by: tycho nightingale at pluribusnetworks com Reviewed by: neel	2013-07-01 20:05:43 +00:00
Konstantin Belousov	70a7dd5d5b	Fix issues with zeroing and fetching the counters, on x86 and ppc64. Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use atomic read of 64bit value, otherwise carry from the increment on other CPU could be lost for the given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP and it is enough to disable preemption around read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override just zeroed per-cpu slot. The effect would be a counter going off by arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if the cmpxchg8b is not available, machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes made to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using the 64bit loads and stores atomic WRT preemption (assuming the architectures also converted from using critical sections to proper asm). If architecture does not provide the facility, using global (spin) mutex would be non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation	2013-07-01 02:48:27 +00:00
Peter Grehan	560d5eda2c	Make sure all CPUID values are handled, instead of exiting the bhyve process when an unhandled one is encountered. Hide some additional capabilities from the guest (e.g. debug store). This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on AP spinup (where CPUID is used when sync'ing the TSCs) and the issue with the Java build where CPUIDs are issued from a guest userspace. Submitted by: tycho nightingale at pluribusnetworks com Reviewed by: neel Reported by: many	2013-06-28 06:05:33 +00:00
Jung-uk Kim	b1ddd13145	Move definitions required by userland applications out of acpica_machdep.h.	2013-06-27 00:22:40 +00:00
Konstantin Belousov	9dbb63fe03	Allow immediate operand. Sponsored by: The FreeBSD Foundation	2013-06-20 14:30:04 +00:00
Konstantin Belousov	c788f92509	Some clarifications and updates for the comments, mostly retrieved from Bruce Evans. Trim the trailing spaces. MFC after: 1 week	2013-06-19 05:05:16 +00:00
Sergey Kandaurov	1e2751ddeb	Fix a gcc warning uncovered after r251745. Reported by: Sergey V. Dyatko Reviewed by: neel	2013-06-18 23:31:09 +00:00
Justin T. Gibbs	a8f6ac0573	Upgrade Xen interface headers to Xen 4.2.1. Move FreeBSD from interface version 0x00030204 to 0x00030208. Updates are required to our grant table implementation before we can bump this further. sys/xen/hvm.h: Replace the implementation of hvm_get_parameter(), formerly located in sys/xen/interface/hvm/params.h. Linux has a similar file which primarily stores this function. sys/xen/xenstore/xenstore.c: Include new xen/hvm.h header file to get hvm_get_parameter(). sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: Correctly protect function definition and variables from being included into assembly files in xen-os.h Xen memory barriers are now prefixed with "xen_" to avoid conflicts with OS native primitives. Define Xen memory barriers in terms of the native FreeBSD primitives. Sponsored by: Spectra Logic Corporation Reviewed by: Roger Pau Monné Tested by: Roger Pau Monné Obtained from: Roger Pau Monné (bug fixes)	2013-06-14 23:43:44 +00:00
Sergey Kandaurov	82f2974a69	Replace cpusetffs_obj with CPU_FFS, missed in r251703. Reported by: bdrewery, O. Hartmann	2013-06-14 10:26:38 +00:00
Neel Natu	8f1664b724	Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT. Reviewed by: alc	2013-06-14 00:03:43 +00:00
Jeff Roberson	17a2737732	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
Konstantin Belousov	9138579845	Assert that interrupts are enabled in the trap handlers on x86 before calling generic code to deliver signals. Discussed with: bde Tested by: pho MFC after: 1 week	2013-06-03 17:40:05 +00:00
Konstantin Belousov	cb5bfd1240	Use slightly more idiomatic expression to get the address of array. Tested by: dim, pgj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:39:39 +00:00
Konstantin Belousov	87b94d9a92	The _MC_HASFPXSTATE and _MC_IA32_HASFPXSTATE flags have the same bit value on purpose, but the ia32 context handling code is logically more correct to use the _MC_IA32_HASFPXSTATE name for the flag. Tested by: dim, pgj Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-05-27 18:36:46 +00:00
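
Several of the amd64 pmap entries above (15e683837c, 113326a772, be28275d00) turn on page-table arithmetic that is easy to check by hand. The following is a minimal sketch of that arithmetic only, not code from any of these commits. The SKETCH_* names and the three helper functions are hypothetical; the mask and bit values are assumed to match the standard x86-64 4-level paging layout.

```c
#include <stdint.h>

/*
 * Illustrative constants only. The SKETCH_ prefix marks them as
 * assumptions made for this example, not kernel definitions.
 */
#define SKETCH_PG_FRAME     0x000ffffffffff000ULL /* 4KB frame mask */
#define SKETCH_PG_PS_FRAME  0x000fffffffe00000ULL /* 2MB frame mask */
#define SKETCH_PG_PS        0x0080ULL /* bit 7 of a PDE: superpage */
#define SKETCH_PG_PTE_PAT   0x0080ULL /* bit 7 of a PTE: PAT index */
#define SKETCH_PG_PDE_PAT   0x1000ULL /* bit 12 of a superpage PDE: PAT index */

#define SKETCH_GB           (1ULL << 30)
/* One PDP page holds 512 entries, each covering 1GB. */
#define SKETCH_PDP_SPAN     (512ULL * SKETCH_GB)

/* Number of PDP pages needed to cover a KVA range of the given size. */
static inline uint64_t
sketch_pdp_pages(uint64_t kva_bytes)
{
	return ((kva_bytes + SKETCH_PDP_SPAN - 1) / SKETCH_PDP_SPAN);
}

/* Masking a superpage PDE with the 4KB mask lets bit 12 (PAT) leak in. */
static inline uint64_t
sketch_superpage_pa_4k_mask(uint64_t pde)
{
	return (pde & SKETCH_PG_FRAME);
}

/* Masking with the 2MB mask drops bit 12 and yields the real base. */
static inline uint64_t
sketch_superpage_pa_2m_mask(uint64_t pde)
{
	return (pde & SKETCH_PG_PS_FRAME);
}
```

Under these assumptions, a 512GB KVA needs one PDP page and a 2TB KVA needs four, which is the case minidumpsys() did not handle. A superpage PDE with the PAT bit set comes out exactly one 4KB page too high when masked with the 4KB frame mask. Because bit 7 means "superpage" in a PDE but "PAT" in a PTE, testing that bit on the wrong level of entry misreads a PAT-attributed 4KB page as a superpage.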

6441 Commits